Lecture Notes in Networks and Systems Volume 608
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Sandeep Kumar · Harish Sharma · K. Balachandran · Joong Hoon Kim · Jagdish Chand Bansal Editors
Third Congress on Intelligent Systems Proceedings of CIS 2022, Volume 1
Editors Sandeep Kumar Department of Computer Science and Engineering CHRIST (Deemed to be University) Bengaluru, India
Harish Sharma Department of Computer Science and Engineering Rajasthan Technical University Kota, Rajasthan, India
K. Balachandran Department of Computer Science and Engineering CHRIST (Deemed to be University) Bengaluru, India
Joong Hoon Kim School of Civil, Environmental and Architectural Engineering Korea University Seoul, Korea (Republic of)
Jagdish Chand Bansal South Asian University New Delhi, India
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-19-9224-7 ISBN 978-981-19-9225-4 (eBook) https://doi.org/10.1007/978-981-19-9225-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This book contains outstanding research papers as the proceedings of the 3rd Congress on Intelligent Systems (CIS 2022), held on September 05–06, 2022, at CHRIST (Deemed to be University), Bengaluru, India, under the technical sponsorship of the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging ideas, concepts, and results of researchers from academia and industry, and for developing a comprehensive understanding of the challenges of advancing intelligence from computational viewpoints. This book will help in strengthening congenial networking between academia and industry. It presents novel contributions to intelligent systems and serves as reference material for advanced research. We have tried our best to enrich the quality of CIS 2022 through a stringent and careful peer-review process. CIS 2022 received many contributed technical articles from distinguished participants from home and abroad: 729 research submissions from 45 different countries, viz. Algeria, Australia, Bangladesh, Belgium, Brazil, Bulgaria, Colombia, Cote d'Ivoire, Czechia, Egypt, Ethiopia, Fiji, Finland, Germany, Greece, India, Indonesia, Iran, Iraq, Ireland, Italy, Japan, Kenya, Latvia, Malaysia, Mexico, Morocco, Nigeria, Oman, Peru, Philippines, Poland, Romania, Russia, Saudi Arabia, Serbia, Slovakia, South Africa, Spain, Turkmenistan, Ukraine, UK, USA, Uzbekistan, and Vietnam. After a very stringent peer-reviewing process, only 120 high-quality papers were finally accepted for presentation and the final proceedings. This first volume presents 60 research papers on data science and applications and serves as reference material for advanced research.

Bengaluru, India Sandeep Kumar
Kota, India Harish Sharma
Bengaluru, India K. Balachandran
Seoul, South Korea Joong Hoon Kim
New Delhi, India Jagdish Chand Bansal
Contents
Design and Analysis of Genetic Algorithm Optimization-Based ANFIS Controller for Interleaved DC-DC Converter-Fed PEMFC System . . . 1
CH Hussaian Basha, Shaik. Rafikiran, M. Narule, G. Devadasu, B. Srinivasa Varma, S. Naikawadi, A. Kambire, and H. B. Kolekar

A Digital Transformation (DT) Model for Intelligent Organizational Systems: Key Constructs for Successful DT . . . 13
Michael Brian George and Grant Royd Howard

Smart Sewage Monitoring Systems . . . 27
Sujatha Rajkumar, Shaik Janubhai Mahammad Abubakar, Venkata Nitin Voona, M. Lakshmi Vaishnavi, and K. Chinmayee

Explainable Stacking Machine Learning Ensemble for Predicting Airline Customer Satisfaction . . . 41
R. Pranav and H. S. Gururaja

A Water Cycle Algorithm for Optimal Design of IIR Filters . . . 57
Teena Mittal

Comparative Evaluation of Machine Learning Algorithms for Credit Card Fraud Detection . . . 69
Kiran Jot Singh, Khushal Thakur, Divneet Singh Kapoor, Anshul Sharma, Sakshi Bajpai, Neeraj Sirawag, Riya Mehta, Chitransh Chaudhary, and Utkarsh Singh

Future Commercial Prospects of Unmanned Aerial Vehicles (UAVs) . . . 79
Divneet Singh Kapoor, Kiran Jot Singh, Richa Bansal, Khushal Thakur, and Anshul Sharma

Experimental Analysis of Deep Learning Algorithms Used in Brain Tumor Classification . . . 91
Kapil Mundada, Toufiq Rahatwilkar, and Jayant Kulkarni
Optimized GrabCut Algorithm in Medical Image Analyses . . . . . . . . . . . . 101 Mária Ždímalová and Kristína Boratková Improvement of Speech Emotion Recognition by Deep Convolutional Neural Network and Speech Features . . . . . . . . . . . . . . . . . . 117 Aniruddha Mohanty, Ravindranath C. Cherukuri, and Alok Ranjan Prusty Genetic Artificial Bee Colony for Mapping onto Network on Chip “GABC” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Maamar Bougherara and Messaoudi Djihad A Study of Machine Translation Models for Kannada-Tulu . . . . . . . . . . . . 145 Asha Hegde, Hosahalli Lakshmaiah Shashirekha, Anand Kumar Madasamy, and Bharathi Raja Chakravarthi Multi-technology Gateway Management for IoT: Review Analysis and Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Sonal and Suman Deswal BGR Images-Based Human Fall Detection Using ResNet-50 and LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 Divya Singh, Meenu Gupta, and Rakesh Kumar Presaging Cancer Stage Classification by Extracting Influential Features from Breast/Lung/Prostate Cancer Clinical Datasets Based on TNM Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Sweta Manna and Sujoy Mistry Diagnosis of Parkinson’s Disease Using Machine Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 Khushal Thakur, Divneet Singh Kapoor, Kiran Jot Singh, Anshul Sharma, and Janvi Malhotra Stock Market Prediction Techniques Using Artificial Intelligence: A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 Chandravesh Chaudhari and Geetanjali Purswani Swarm Intelligence-Based Clustering and Routing Using AISFOA-NGWO for WSN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 M. Vasim Babu, M. Madhusudhan Reddy, C. N. S. Vinoth Kumar, R. Ramasamy, and B. Aishwarya Sentiment Analysis Using an Improved LSTM Deep Learning Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 Dhaval Bhoi, Amit Thakkar, and Ritesh Patel Comparative Analysis on Deep Learning Models for Detection of Anomalies and Leaf Disease Prediction in Cotton Plant Data . . . . . . . . 263 Nenavath Chander and M. Upendra Kumar
A Model for Prediction of Understandability and Modifiability of Object-Oriented Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275 Sumit Babu and Raghuraj Singh Agricultural Insect Pest’s Recognition System Using Deep Learning Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287 Sapna Dewari, Meenu Gupta, and Rakesh Kumar Seq2Code: Transformer-Based Encoder-Decoder Model for Python Source Code Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301 Naveen Kumar Laskari, K. Adi Narayana Reddy, and M. Indrasena Reddy Security Using Blockchain in IoT-Based System . . . . . . . . . . . . . . . . . . . . . . 311 Suman Machine Learning, Wearable, and Smartphones for Student’s Mental Health Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 Deivanai Gurusamy, Prasun Chakrabarti, Midhunchakkaravarthy, Tulika Chakrabarti, and Xue-bo Jin Improving K-means by an Agglomerative Method and Density Peaks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 Libero Nigro and Franco Cicirelli Assessing the Best-Fit Regression Models for Predicting the Marine Water Quality Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361 Karuppanan Komathy E-commerce Product’s Trust Prediction Based on Customer Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375 Hrutuja Kargirwar, Praveen Bhagavatula, Shrutika Konde, Paresh Chaudhari, Vipul Dhamde, Gopal Sakarkar, and Juan C. Correa Improving Amharic Handwritten Word Recognition Using Auxiliary Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385 Mesay Samuel Gondere, Lars Schmidt-Thieme, Durga Prasad Sharma, and Abiot Sinamo Boltena Computational Drug Discovery Using Minimal Inhibitory Concentration Analysis with Bacterial DNA Snippets . . . . . . . . . . . . . . . . . 397 K. P. Sabari Priya, J. Hemadharshini, S. Sona, R. Suganya, and Seyed M. Buhari Optimized CNN Model with Deep Convolutional GAN for Brain Tumor Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409 Mure Vamsi Kalyan Reddy, Prithvi K. Murjani, Sujatha Rajkumar, Thomas Chen, and V. S. Ajay Chandrasekar
Named Entity Recognition: A Review for Key Information Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427 P. Nandini and Bhat Geetalaxmi Jairam A Mathematical Model to Explore the Details in an Image with Local Binary Pattern Distribution (LBP) . . . . . . . . . . . . . . . . . . . . . . . . 439 Denny Dominic, Krishnan Balachandran, and C. Xavier Performance Evaluation of Energy Detection for Cognitive Radio in OFDM System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453 Rania Mahmoud, Wael A. E. Ali, and Nour Ismail Modified Iterative Shrinkage Thresholding Algorithm for Image De-blurring in Medical Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463 Himanshu Choudhary, Kartik Sahoo, and Arishi Orra A Comprehensive Review on Crop Disease Prediction Based on Machine Learning and Deep Learning Techniques . . . . . . . . . . . . . . . . . 481 Manoj A. Patil and M. Manohar Sentiment Analysis Through Fourier Transform Techniques in NLP . . . 505 Anuraj Singh and Kaustubh Pathak Gesture Analysis Using Image Processing: For Detection of Suspicious Human Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515 Prachi Bhagat and Anjali. S. Bhalchandra Reliability Analysis of a Mechanical System with 3 Out of 5 Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531 B. Yamuna, Radha Gupta, Kokila Ramesh, and N. K. Geetha Disaster Analysis Through Tweets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543 Anshul Sharma, Khushal Thakur, Divneet Singh Kapoor, Kiran Jot Singh, Tarun Saroch, and Raj Kumar Design of an Aqua Drone for Automated Trash Collection from Swimming Pools Using a Deep Learning Framework . . . . . . . . . . . . 555 Kiran Mungekar, Bijith Marakarkandy, Sandeep Kelkar, and Prashant Gupta Design of a 3-DOF Robotic Arm and Implementation of D-H Forward Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569 Denis Manolescu and Emanuele Lindo Secco Impact of Electric Vehicle Charging Station in Distribution System Using V2G Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585 Golla Naresh Kumar, Suresh Kumar Sudabattula, Abhijit Maji, Chowtakuri Jagath Vardhan Reddy, and Bandi Kanti Chaitanya
Ontology-Based Querying from Heterogeneous Sensor Data for Heart Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597 Diksha Hooda and Rinkle Rani CoSSC—Comparative Study of Stellar Classification Using DR-16 and DR-17 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609 R. Bhuvaneshwari, M. S. Karthika Devi, R. Vishal, E. Sarvesh, and T. M. Sanjeevaditya Dimensional Emotion Recognition Using EEG Signals via 1D Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627 Sukhpreet Kaur and Nilima Kulkarni Optimal Shortest Path Routing over Wireless Sensor Networks Using Constrained Genetic Firefly Optimization Algorithm . . . . . . . . . . . 643 Sujatha Arun Kokatnoor, Vandana Reddy, and Balachandran Krishnan Global Approach of Shape and Texture Features Fusion in Convolutional Neural Network for Automatic Classification of Plant Species Based on Leaves Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655 Armand Kodjo Atiampo, Kouassi Adelphe Christian N’Goran, and Zacrada Françoise Odile Trey The Analysis of Countries’ Investment Attractiveness Indicators Using Neural Networks Trained on the Adam and WCO Methods . . . . . . 675 Eugene Fedorov, Liubov Kibalnyk, Maryna Leshchenko, Olga Nechyporenko, and Hanna Danylchuk OCT DEEPNET 1—A Deep Learning Approach for Retinal OCT Image Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689 Ranjitha Rajan and S. N. Kumar Feature Selection in High Dimensional Data: A Review . . . . . . . . . . . . . . . 703 Sarita Silaich and Suneet Gupta Flipping the Switch on Local Exploration: Genetic Algorithms with Reversals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719 Ankit Grover, Vaishali Yadav, and Bradly Alicea Evaluation of E-teaching Implementation in Iraqi Universities . . . . . . . . . 735 Kadum Ali Ahmed, Muneer S. G. Mansoor, Naseer Al-Imareen, and Ibrahim Alameri Avoiding Obstacles with Geometric Constraints on LiDAR Data for Autonomous Robots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749 Meenakshi Sarkar, Manav Prabhakar, and Debasish Ghose
A Sensitivity Study of Machine Learning Techniques Based on Multiprocessing for the Load Forecasting in an Electric Power Distribution System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763 Ajay Singh, Kapil Joshi, Konda Hari Krishna, Rajesh Kumar, Neha Rastogi, and Harishchander Anandaram Self-adaptive Butterfly Optimization for Simultaneous Optimal Integration of Electric Vehicle Fleets and Renewable Distribution Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 777 Thandava Krishna Sai Pandraju and Varaprasad Janamala Arrhythmia detection—An Enhanced Method Using Gramian Angular Matrix for Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785 Keerthana Krishnan, R. Gandhiraj, and Manoj Kumar Panda Development and Analysis of a Novel Hybrid HBFA Using Firefly and Black Hole Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799 Jaspreet Kaur and Ashok Pal Hunter Prey Optimization for Optimal Allocation of Photovoltaic Units in Radial Distribution System for Real Power Loss and Voltage Stability Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817 Pappu Soundarya Lahari and Varaprasad Janamala Transfer Learning-Based Convolution Neural Network Model for Hand Gesture Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827 Niranjali Kumari, Garima Joshi, Satwinder Kaur, and Renu Vig Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 841
Editors and Contributors
About the Editors Dr. Sandeep Kumar is currently a professor at CHRIST (Deemed to be University), Bangalore. Before joining CHRIST, he worked with ACEIT Jaipur, Jagannath University, Jaipur, and Amity University, Rajasthan. He is an associate editor for Springer’s Human-centric Computing and Information Sciences (HCIS) journal. He has published more than eighty research papers in various international journals/conferences and attended several national and international conferences and workshops. He has authored/edited six books in the area of computer science. Also, he has been serving as the general chair of the International Conference on Communication and Computational Technologies (ICCCT 2021, 22, and 23) and the Congress on Intelligent Systems (CIS 2022). His research interests include nature-inspired algorithms, swarm intelligence, soft computing, and computational intelligence. Dr. Harish Sharma is an associate professor at Rajasthan Technical University, Kota, in the Computer Science and Engineering Department. He has worked at Vardhaman Mahaveer Open University, Kota, and Government Engineering College, Jhalawar. He received his B.Tech. and M.Tech. degrees in Computer Engineering from Government Engineering College, Kota, and Rajasthan Technical University, Kota, in 2003 and 2009, respectively. He obtained his Ph.D. from ABV—Indian Institute of Information Technology and Management Gwalior, India. He is a secretary and one of the founder members of the Soft Computing Research Society of India. He is a lifetime member of the Cryptology Research Society of India, ISI, Kolkata. He is an associate editor of The International Journal of Swarm Intelligence (IJSI) published by Inderscience. He has also edited special issues of the many reputed journals like Memetic Computing, Journal of Experimental and Theoretical Artificial Intelligence, Evolutionary Intelligence, etc. His primary area of interest is nature-inspired optimization techniques. He has contributed to more than 105 papers published in various international journals and conferences.
Dr. K. Balachandran is currently a professor and the head of CSE at CHRIST (Deemed to be University), Bengaluru, India. He has 38 years of experience in research, academia, and industry. He served as a senior scientific officer in the Research and Development Unit of the Department of Atomic Energy for 20 years. His research interest includes data mining, artificial neural networks, soft computing, and artificial intelligence. He has published more than fifty articles in well-known SCI-/SCOPUS-indexed international journals and conferences and attended several national and international conferences and workshops. He has authored/edited four books in the area of computer science. Professor Joong Hoon Kim is a faculty of Korea University in the School of Civil, Environmental and Architectural Engineering, obtained his Ph.D. from the University of Texas at Austin in 1992 with the thesis “Optimal replacement/rehabilitation model for water distribution systems.” His major areas of interest include optimal design and management of water distribution systems, application of optimization techniques to various engineering problems, and development and application of evolutionary algorithms. His publication includes “A New Heuristic Optimization Algorithm: Harmony Search,” Simulation, February 2001, Vol. 76, pp 60–68, which has been cited over 6700 times by other journals of diverse research areas. His keynote speeches include “Optimization Algorithms as Tools for Hydrological Science” in the Annual Meeting of Asia Oceania Geosciences Society held in Brisbane, Australia, in June of 2013, “Recent Advances in Harmony Search Algorithm” in the 4th Global Congress on Intelligent Systems (GCIS 2013) held in Hong Kong, China, in December of 2013, and “Improving the convergence of Harmony Search Algorithm and its variants” in the 4th International Conference on Soft Computing For Problem Solving (SOCPROS 2014) held in Silchar, India, in December of 2014. He hosted the first, second, and sixth Conference of International Conference on Harmony Search Algorithm (ICHSA) in 2013, 2014, and 2022. He also hosted the 12th International Conference on Hydroinformatics (HIC 2016). Also, he has been serving as an honorary chair of Congress on Intelligent Systems (CIS 2020, 2021, and 2022). Dr. Jagdish Chand Bansal is an associate professor at South Asian University, New Delhi, and a visiting faculty at Maths and Computer Science, Liverpool Hope University, UK. He obtained his Ph.D. in Mathematics from IIT Roorkee. Before joining SAU, New Delhi, he worked as an assistant professor at ABV—Indian Institute of Information Technology and Management Gwalior and BITS Pilani. His primary area of interest is swarm intelligence and nature-inspired optimization techniques. Recently, he proposed a fission-fusion social structure-based optimization algorithm, spider monkey optimization (SMO), which is being applied to various problems from the engineering domain. He has published more than 70 research papers in various international journals/conferences. He is the editor-in-chief of the journal MethodsX published by Elsevier. He is the series editor of the book series Algorithms for Intelligent Systems (AIS) and Studies in Autonomic, Data-driven, and Industrial Computing (SADIC), published by Springer. He is the editor-in-chief
of the International Journal of Swarm Intelligence (IJSI) published by Inderscience. He is also the associate editor of Engineering Applications of Artificial Intelligence (EAAI) and ARRAY, published by Elsevier. He is the general secretary of the Soft Computing Research Society (SCRS). He has also received gold medals at UG and PG levels.
Contributors Shaik Janubhai Mahammad Abubakar School of Electronics Engineering, Vellore Institute of Technology, Vellore, India Kadum Ali Ahmed University of Kerbala, Karbala, Iraq B. Aishwarya SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India Naseer Al-Imareen Széchenyi István University, Gy˝or, Hungary; Al-Qadisiyah University, AL Diwaniyah, Iraq Ibrahim Alameri Faculty of Economics and Administration, University of Pardubice, Pardubice, Czech Republic; Nottingham Trent University, Nottingham, UK; Jabir Ibn Hayyan Medical University, Najaf, Iraq Wael A. E. Ali Electronics and Communications Engineering Department, Arab Academy for Science, Technology and Maritime Transport, Alex, Egypt Bradly Alicea Orthogonal Research and Education Laboratory, ChampaignUrbana, IL, USA; OpenWorm Foundation, Boston, MA, USA Harishchander Anandaram Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu, India Sujatha Arun Kokatnoor Department of Computer Science and Engineering, School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, Karnataka, India Armand Kodjo Atiampo Unité de Recherche et d’Expertise du Numérique (UREN), Université Virtuelle de Côte d’Ivoire (UVCI), Abidjan, Côte d’Ivoire Sumit Babu Harcourt Butler Technical University, Kanpur, India Sakshi Bajpai Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab, India Krishnan Balachandran Department of Computer Science and Engineering, Christ (Deemed to be University), Bengaluru, India
Richa Bansal Electronics and Communication Engineering Department, Chandigarh University, Mohali, Punjab, India Prachi Bhagat Prof. Department of Electronics Engineering, Government College of Engineering, Yavatmal, India Praveen Bhagavatula G. H. Raisoni College of Engineering, Nagpur, India Anjali. S. Bhalchandra Department of Electronics Engineering, Government College of Engineering, Aurangabad, India Dhaval Bhoi U & P U. Patel Department of Computer Engineering, Faculty of Technology and Engineering (FTE), Chandubhai S. Patel Institute of Technology (CSPIT), Charotar University of Science and Technology (CHARUSAT), Anand, Gujarat, India R. Bhuvaneshwari Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, India Abiot Sinamo Boltena Ministry of Innovation and Technology, Addis Ababa, Ethiopia Kristína Boratková Slovak University of Technology in Bratislava, Bratislava, Slovakia Maamar Bougherara LIMPAF Laboratory, Bouira University, Bouira, Algeria; Department of Computer Science, High Normale School of Kouba, Algiers, Algeria Seyed M. Buhari School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, TamilNadu, India Bandi Kanti Chaitanya B.V Raju Institute of Technology, Tuljaraopet, Telangana, India Prasun Chakrabarti Lincoln University College, Petaling Jaya, Malaysia; ITM SLS Baroda University, Vadodara, India Tulika Chakrabarti Sir Padampat Singhania University, Udaipur, Rajasthan, India Bharathi Raja Chakravarthi Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Irelan, Galway, Ireland Nenavath Chander Department of Computer Science and Engineering, Osmania University, Hyderabad, Telangana, India V. S. Ajay Chandrasekar Department of Surgical Oncology, Saveetha Medical College, Thandalam, Tamil Nadu, India Chandravesh Chaudhari Department of Commerce, CHRIST (Deemed to be University), Bangalore, Karnataka, India Paresh Chaudhari G. H. Raisoni College of Engineering, Nagpur, India
Chitransh Chaudhary Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab, India Thomas Chen School of Science and Technology, Department of Engineering, University of London, Northampton Square London, UK Ravindranath C. Cherukuri CHRIST (Deemed to be University), Bangalore, Karnataka, India K. Chinmayee School of Electronics Engineering, Vellore Institute of Technology, Vellore, India Himanshu Choudhary Indian Institute of Technology, Mandi, India Franco Cicirelli CNR—National Research Council of Italy—Institute for High Performance Computing and Networking (ICAR), Rende, Italy Juan C. Correa Colegio de Estudios Superiores de Administracion, Bogota, Colombia Hanna Danylchuk Cherkasy Bohdan Khmelnytsky National University, Cherkasy, Ukraine Suman Deswal Department of Computer Science and Engineering, DCRUST, Murthal, Sonipat, India G. Devadasu CMR College of Engineering and Technology (Autonomous), Hyderabad, India Sapna Dewari Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India Vipul Dhamde G. H. Raisoni College of Engineering, Nagpur, India Messaoudi Djihad Department of Computer Science, High Normale School of Kouba, Algiers, Algeria Denny Dominic Department of Computer Science and Engineering, Christ (Deemed to be University), Bengaluru, India Eugene Fedorov Cherkasy State Technological University, Cherkasy, Ukraine R. Gandhiraj Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India N. K. Geetha Department of Mathematics, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India Michael Brian George University of South Africa (Unisa), Roodepoort, South Africa Debasish Ghose Indian Institute of Science, Bengaluru, India
Mesay Samuel Gondere Faculty of Computing and Software Engineering, Arba Minch University, Arba Minch, Ethiopia Ankit Grover Orthogonal Research and Education Laboratory, ChampaignUrbana, IL, USA; Manipal University Jaipur, Jaipur, Rajasthan, India Meenu Gupta Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India Prashant Gupta Prin. L. N. Welingkar Institute of Management Development and Research (WeSchool), Mumbai, India Radha Gupta Department of Mathematics, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India Suneet Gupta CSE Department, Mody University Laxmangarh, Sikar, India H. S. Gururaja Department of ISE, B.M.S. College of Engineering, Bengaluru, India Deivanai Gurusamy Lincoln University College, Petaling Jaya, Malaysia Asha Hegde Department of Computer Science, Mangalore University, Mangalore, Karnataka, India J. Hemadharshini School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, TamilNadu, India Diksha Hooda CSED, Thapar Institute of Engineering and Technology, Patiala, India Grant Royd Howard University of South Africa (Unisa), Roodepoort, South Africa CH Hussaian Basha Nitte Meenakshi Institute of Technology, Karnataka, India M. Indrasena Reddy BVRIT HYDERABAD College of Engineering for Women, Bachupally, Hyderabad, India Nour Ismail Electrical Engineering Department, Alexandria University, Alex, Egypt Bhat Geetalaxmi Jairam Department of Information Science and Engineering, The National Institute of Engineering, Mysore, India Varaprasad Janamala Department of Electrical and Electronics Engineering, School of Engineering and Technology, Christ (Deemed to be University), Bengaluru, Karnataka, India Xue-bo Jin School of Artificial Intelligence, Beijing Technology and Business University, Beijing, China Garima Joshi Panjab University, Chandigarh, India
Kapil Joshi Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, India A. Kambire Department of EE, Nanasaheb Mahadik College of Engineering, Peth, India Divneet Singh Kapoor Electronics and Communication Engineering Department, Chandigarh University, Mohali, Punjab, India Hrutuja Kargirwar G. H. Raisoni College of Engineering, Nagpur, India M. S. Karthika Devi Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, India Jaspreet Kaur Chandigarh University, Mohali, Punjab, India Satwinder Kaur Panjab University, Chandigarh, India Sukhpreet Kaur MIT Art, Design and Technology University, Pune, India Sandeep Kelkar Prin. L. N. Welingkar Institute of Management Development and Research (WeSchool), Mumbai, India Liubov Kibalnyk Cherkasy Bohdan Khmelnytsky National University, Cherkasy, Ukraine H. B. Kolekar Department of EE, Nanasaheb Mahadik College of Engineering, Peth, India Karuppanan Komathy Academy of Maritime Education and Training, Chennai, India Shrutika Konde G. H. Raisoni College of Engineering, Nagpur, India Konda Hari Krishna Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India Balachandran Krishnan Department of Computer Science and Engineering, School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, Karnataka, India Keerthana Krishnan Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Jayant Kulkarni Department of Instrumentation Engineering, Vishwakarma Institute of Technology, Pune, India Nilima Kulkarni MIT Art, Design and Technology University, Pune, India Golla Naresh Kumar B.V Raju Institute of Technology, Tuljaraopet, Telangana, India; Lovely Professional University, Phagwara, Punjab, India
Raj Kumar Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab, India Rajesh Kumar Meerut Institute of Technology, Meerut, India Rakesh Kumar Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India S. N. Kumar Department of EEE, Amal Jyothi College of Engineering, Kottayam, Kerala, India Niranjali Kumari Panjab University, Chandigarh, India Pappu Soundarya Lahari Department of Electrical and Electronics Engineering, School of Engineering and Technology, Christ (Deemed to be University), Bangalore, Karnataka, India M. Lakshmi Vaishnavi School of Electronics Engineering, Vellore Institute of Technology, Vellore, India Naveen Kumar Laskari BVRIT HYDERABAD College of Engineering for Women, Bachupally, Hyderabad, India Maryna Leshchenko Cherkasy State Technological University, Cherkasy, Ukraine Anand Kumar Madasamy National Institute of Technology Karnataka, Suratkal, Karnataka, India M. Madhusudhan Reddy K.S.R.M College of Engineering, Kadapa, Andhra Pradesh, India Rania Mahmoud Electronics and Communications Engineering Department, Alexandria Higher Institute of Engineering and Technology, Alex, Egypt Abhijit Maji B.V Raju Institute of Technology, Tuljaraopet, Telangana, India Janvi Malhotra Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab, India Sweta Manna Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Haringhata, WB, India M. Manohar Department of Computer Science and Engineering, School of Engineering and Technology, Christ (Deemed to be University), Bangalore, Karnataka, India Denis Manolescu Robotics Lab, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK Muneer S. G. Mansoor College of Engineering, University of Information Technology and Communications (UOITC), Baghdad, Iraq Bijith Marakarkandy Prin. L. N. Welingkar Institute of Management Development and Research (WeSchool), Mumbai, India
Riya Mehta Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab, India Midhunchakkaravarthy Lincoln University College, Petaling Jaya, Malaysia Sujoy Mistry Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Haringhata, WB, India Teena Mittal Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India Aniruddha Mohanty CHRIST (Deemed to be University), Bangalore, Karnataka, India Kapil Mundada Department of Instrumentation Engineering, Vishwakarma Institute of Technology, Pune, India Kiran Mungekar ThinkGestalt.Tech, Mumbai, India Prithvi K. Murjani School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India S. Naikawadi Department of EE, Nanasaheb Mahadik College of Engineering, Peth, India P. Nandini Department of Computer Science and Engineering, The National Institute of Engineering, Mysore, India M. Narule Department of EE, Nanasaheb Mahadik College of Engineering, Peth, India Olga Nechyporenko Cherkasy State Technological University, Cherkasy, Ukraine Libero Nigro University of Calabria, DIMES, Rende, Italy Kouassi Adelphe Christian N’Goran Laboratoire de Recherche en Informatique et Télécommunication (LARIT), INPHB, Abidjan, Côte d’Ivoire Arishi Orra Indian Institute of Technology, Mandi, India Ashok Pal Chandigarh University, Mohali, Punjab, India Manoj Kumar Panda Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India Thandava Krishna Sai Pandraju Department of Electrical and Electronics Engineering, Dhanekula Institute of Engineering and Technology, Vijayawada, Andhra Pradesh, India Ritesh Patel U & P U. Patel Department of Computer Engineering, Faculty of Technology and Engineering (FTE), Chandubhai S. Patel Institute of Technology (CSPIT), Charotar University of Science and Technology (CHARUSAT), Anand, Gujarat, India
Kaustubh Pathak ABV—Indian Institute of Information Technology and Management, Gwalior, India Manoj A. Patil Department of Computer Science and Engineering, School of Engineering and Technology, Christ (Deemed to be University), Bangalore, Karnataka, India; Department of Information Technology, Vasavi College of Engineering, Hyderabad, Telangana, India Manav Prabhakar Indian Institute of Science, Bengaluru, India R. Pranav Department of EEE, B.M.S. College of Engineering, Bengaluru, India K. P. Sabari Priya School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, TamilNadu, India Alok Ranjan Prusty DGT, RDSDE, NSTI(W), Kolkata, West Bengal, India Geetanjali Purswani Department of Commerce, CHRIST (Deemed to be University), Bangalore, Karnataka, India Shaik. Rafikiran Sri Venkateswara College of Engineering (Autonomous), Tirupati, India Toufiq Rahatwilkar Department of Instrumentation Engineering, Vishwakarma Institute of Technology, Pune, India Ranjitha Rajan Lincoln University College, Kota Bharu, Malaysia Sujatha Rajkumar School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India R. Ramasamy Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Tamil Nadu, India Kokila Ramesh Department of Mathematics, University), Bangalore, Karnataka, India
Kokila Ramesh Department of Mathematics, FET, Jain (Deemed-to-be-University), Bangalore, Karnataka, India
Rinkle Rani CSED, Thapar Institute of Engineering and Technology, Patiala, India Neha Rastogi Uttaranchal Institute of Management, Uttaranchal University, Dehradun, India Chowtakuri Jagath Vardhan Reddy B.V Raju Institute of Technology, Tuljaraopet, Telangana, India
K. Adi Narayana Reddy BVRIT HYDERABAD College of Engineering for Women, Bachupally, Hyderabad, India Mure Vamsi Kalyan Reddy School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
Vandana Reddy Department of Computer Science and Engineering, School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, Karnataka, India Kartik Sahoo Indian Institute of Technology, Mandi, India Gopal Sakarkar D. Y. Patil Institute of Master of Computer Applications and Management, Pune, India T. M. Sanjeevaditya Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, India Meenakshi Sarkar Indian Institute of Science, Bengaluru, India Tarun Saroch Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab, India E. Sarvesh Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, India Lars Schmidt-Thieme Information Systems and Machine Learning Lab, Hildesheim, Germany Emanuele Lindo Secco Robotics Lab, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK Anshul Sharma Electronics and Communication Engineering Department, Chandigarh University, Mohali, Punjab, India Durga Prasad Sharma AMUIT MOEFDRE under UNDP and MAISM under RTU, Kota, India Hosahalli Lakshmaiah Shashirekha Department of Computer Science, Mangalore University, Mangalore, Karnataka, India Sarita Silaich Department of Computer Science and Engineering, Government Polytechnic College Jhunjhunu, Jhunjhunu, India Ajay Singh UCALS, Uttaranchal University, Dehradun, India Anuraj Singh ABV—Indian Institute of Information Technology and Management, Gwalior, India Divya Singh Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India Kiran Jot Singh Electronics and Communication Engineering Department, Chandigarh University, Mohali, Punjab, India Raghuraj Singh Harcourt Butler Technical University, Kanpur, India Utkarsh Singh Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab, India
Neeraj Sirawag Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab, India S. Sona School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, TamilNadu, India Sonal Department of Computer Science and Engineering, DCRUST, Murthal, Sonipat, India B. Srinivasa Varma Department of EE, Nanasaheb Mahadik College of Engineering, Peth, India Suresh Kumar Sudabattula Lovely Professional University, Phagwara, Punjab, India R. Suganya School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, TamilNadu, India Suman Department of Computer Science and Engineering, UIET, MDU, Rohtak, Haryana, India Amit Thakkar Department of Computer Science and Engineering, Faculty of Technology and Engineering (FTE), Chandubhai S. Patel Institute of Technology (CSPIT), Charotar University of Science and Technology (CHARUSAT), Anand, Gujarat, India Khushal Thakur Electronics and Communication Engineering Department, Chandigarh University, Mohali, Punjab, India Zacrada Françoise Odile Trey Laboratoire de Recherche en Informatique et Télécommunication (LARIT), INPHB, Abidjan, Côte d’Ivoire M. Upendra Kumar Department of Computer Science and Engineering, Muffakham Jah College of Engineering and Technology, Affiliated to OU, Hyderabad, Telangana, India M. Vasim Babu KPR Institute of Engineering and Technology, Arasur, Coimbatore, Tamil Nadu, India Renu Vig Panjab University, Chandigarh, India C. N. S. Vinoth Kumar SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India R. Vishal Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, India Venkata Nitin Voona School of Electronics Engineering, Vellore Institute of Technology, Vellore, India C. Xavier Department of Computer Science and Engineering, Christ (Deemed to be University), Bengaluru, India
Vaishali Yadav Manipal University Jaipur, Jaipur, Rajasthan, India B. Yamuna Department of Mathematics, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India Mária Ždímalová Slovak University of Technology in Bratislava, Bratislava, Slovakia
Design and Analysis of Genetic Algorithm Optimization-Based ANFIS Controller for Interleaved DC-DC Converter-Fed PEMFC System

CH Hussaian Basha, Shaik. Rafikiran, M. Narule, G. Devadasu, B. Srinivasa Varma, S. Naikawadi, A. Kambire, and H. B. Kolekar

CH Hussaian Basha (corresponding author), Nitte Meenakshi Institute of Technology, Karnataka 560064, India, e-mail: [email protected]
Shaik. Rafikiran, Sri Venkateswara College of Engineering (Autonomous), Tirupati, India
M. Narule, B. Srinivasa Varma, S. Naikawadi, A. Kambire, and H. B. Kolekar, Department of EE, Nanasaheb Mahadik College of Engineering, Peth, India, e-mail: [email protected]
G. Devadasu, CMR College of Engineering and Technology (Autonomous), Hyderabad, India

Abstract  Maximum Power Point Tracking (MPPT) controllers are used for enhancing the working performance of proton exchange membrane fuel cell (PEMFC)-based power generation systems. In this article, a genetic optimization-based Artificial Neuro-Fuzzy Inference System (ANFIS) concept is applied to an interleaved non-isolated boost converter-interfaced fuel cell stack system in order to improve the operating efficiency of a transformerless DC-DC converter. The proposed MPPT controller is compared with a conventional adaptive Perturb & Observe controller in terms of the settling time of the peak power point, the oscillations in the fuel cell output voltage, and the MPP tracking time. The second objective of this work is the design of an interleaved DC-DC converter for improving the output voltage profile of the fuel stack. The characteristics of this converter are wide output operation, low voltage stress, and high voltage gain.

Keywords  Duty ratio · Genetic algorithm · High MPP tracking speed · Interleaved DC-DC converter · Low oscillations of MPP
1 Introduction

Nowadays, most global industries as well as individual consumers are focusing on automotive systems because of their fast growth, low maintenance cost, reduced dependence on fossil fuels, and high profits [1]. In addition, the availability of fossil fuels is
reducing slowly. Hence, the power generation cost of fossil fuel-dependent systems is very high. From the literature review, the conventional or nonrenewable power production systems are classified as thermal, nuclear, oil, and natural gas [2]. The main disadvantage of conventional energy systems is global warming [3]. All over the world, coal is the most inexpensive and efficient way to produce electricity. The features of thermal power plants are cheap coal use and a small installation space [4]. The demerits of coal power production systems are the high cost of starting the boilers and the difficulty of ash handling [5]. Some of these demerits are avoided by using oil power stations. The merits of oil power generation systems are high energy density, support for the world economy, and moderate reliability [6]. However, the drawbacks of oil power systems are high air pollution and ozone depletion. In article [7], nuclear power systems are utilized for converting the energy released by nuclear fission into useful active power. The merits of nuclear power plants are a low-carbon energy source, a small carbon footprint, low operating cost, and high energy density [8]. The disadvantages of nuclear power production systems are the high initial cost and the high impact on human life [9]. The many disadvantages of conventional energy systems are avoided by using nonconventional energy systems, which include solar, wind, tidal, and fuel stack sources. The features of wind power generation systems are cost-effectiveness, clean energy, low operating cost, efficient utilization of land space, and economic benefits for local communities [10]. The drawbacks of wind energy systems are unpredictability, an interruptible power supply, low flexibility, and a high impact on wildlife [11]. Solar energy is a natural source which is freely available in nature, and the working nature of a solar cell is similar to a normal P–N junction diode [12]. The features of solar PV are low maintenance cost and environmentally friendly energy; its only disadvantage is the lack of continuity in supply. So, in this article, a PEMFC is used for the automotive system in order to run the vehicle without interruptions [13]. The yearly fuel stack demand is shown in Fig. 1. The major concern with fuel cells is their nonlinear behavior, which is handled by using a maximum power point tracking controller. At various temperature conditions, the fuel cell gives different peak power points on the I-V curve. In article [14],
Fig. 1 Yearly fuel stack production strategy in terms of units
a P&O controller is utilized for continuous tracking of the MPP. Here, the operating point is perturbed and the resulting power is compared with the previously stored value. If the comparison is positive, the perturbation continues in the same direction; otherwise, the direction is reversed. The disadvantages of the P&O method are larger oscillations in the fuel cell output voltage and the need for more sensors to measure the fuel stack output variables [15]. For this reason, an incremental conductance methodology is used in article [16] to quickly improve the performance of the entire hybrid system. The incremental conductance concept gives lower power oscillations and a higher MPP tracking speed when compared with the conventional controller; its only drawback is a higher implementation cost. However, the disadvantages of the above MPPT techniques are compensated by using an adaptive variable step size P&O controller [17]. In this adaptive controller, a large step value is used first to obtain fast convergence toward the fuel cell MPP. After that, the step length is reduced and optimized to determine the accurate position of the MPP. This MPPT method is, however, most suitable for a uniform working temperature of the fuel stack. For dynamically varying fuel cell temperatures, an ANFIS is used to optimize the overall system.
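To make the P&O logic above concrete, the sketch below shows one possible implementation of an adaptive-step perturb and observe loop. It is only an illustration of the idea described in [14, 17]; the step-size bounds, scaling factor, and variable names are assumptions, not details from the paper.

```python
# Minimal sketch of adaptive-step perturb-and-observe (P&O) MPPT.
# All numeric settings here are illustrative assumptions.

def perturb_and_observe(v_fc, i_fc, state, d_max=0.02, d_min=0.002):
    """Return an updated duty-ratio command from one voltage/current sample.

    state carries the previous power, the current duty ratio, and the step sign.
    """
    p_fc = v_fc * i_fc
    dp = p_fc - state["p_prev"]

    # Keep perturbing in the same direction if power increased,
    # otherwise reverse the perturbation direction.
    if dp < 0:
        state["direction"] *= -1

    # Adaptive step: large far from the MPP for fast convergence,
    # shrinking near the MPP to reduce steady-state oscillation.
    step = max(d_min, min(d_max, abs(dp) * 1e-4))

    state["duty"] = min(0.9, max(0.1, state["duty"] + state["direction"] * step))
    state["p_prev"] = p_fc
    return state["duty"]

# Example call with an arbitrary operating point:
state = {"p_prev": 0.0, "duty": 0.5, "direction": 1}
duty = perturb_and_observe(v_fc=24.0, i_fc=52.0, state=state)
```

A controller of this kind needs only the measured stack voltage and current, which is why the sensing requirement is mentioned above as a drawback.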
2 Operation of Fuel Cell Power System

At present, there are different types of fuel stack modules available in the market. The major types of fuel cell modules are the alkaline fuel module, the direct methanol fuel module, the PEMFC, and the solid oxide-based fuel module [18, 19]. In article [20], the authors used the phosphoric acid-based fuel cell module for running a battery-related electric vehicle system. The attractive features of this fuel cell are its commercial availability, long lifetime, and low market cost when compared with the other fuel cell topologies. The molten carbonate fuel cell is used in article [21] for enhancing the efficiency of internal fuel processing. This type of fuel cell works at high temperature conditions, but its electrolyte is unstable and it produces a high carbon dioxide content. So, in this work, the PEMFC is considered for the electric vehicle application. The features of the PEMFC are a long lifetime, low power losses, and good suitability for automotive systems. The operation of the selected fuel cell is explained using Fig. 2. From Fig. 2, hydrogen fuel is converted to electricity across the proton exchange membrane by a redox chemical reaction. Here, each fuel module provides uninterrupted energy as long as the hydrogen input is supplied, whereas in batteries the fuel is built in and the operation depends entirely on the chemical reaction of the oxidant materials. The applications of fuel cell technology include residential, commercial, and industry-related systems. The fuel cell design constants and its power curves are shown in Table 1 and Fig. 3. Based on Fig. 2, the chemical oxidation of the fuel cell is obtained as

H2 → 2H+ + 2e−    (1)
H2 + (1/2)O2 → H2O + Energy    (2)
2H+ + 2e− + (1/2)O2 → H2O    (3)
V0 = N ∗ VFC    (4)
VOh = EOt − VFC − VAc − VCo    (5)
VFC = EOt − VOh − VAc − VCo    (6)
As related to Eq. (1), it can be seen that the input supply of a fuel cell is hydrogen, which is split into hydrogen ions and electrons. The separated electrons flow through the external circuit and deliver electrical energy. After that, the hydrogen ions are transferred through the gas diffusion layer to the cathode. In the cathode layer, oxygen combines with the incoming ions and electrons to produce water. Each individual fuel cell produces only a small amount of power
Fig. 2 Working strategy of proton exchange membrane fuel cell system
Table 1 Complete design parameters of fuel module at various working temperatures
Variables | Specifications
Peak working current of stack (IMPP) | 52 A
Peak working voltage of PEM fuel cell (VMPP) | 24.0 V
Utilized open-circuit potential of PEMFC (VOC) | 42.00 V
Maximum generated power of PEMFC | 1260 W
Pressure of oxygen on PEMFC | 1.00 bar
Flow rate of air in fuel stack (Inpm) | 2.4 × 10^3
Hydrogen-related partial pressure on fuel stack | 1.5 bar
Basic air utilizing quantity in PEMFC (Ipm) | 4.615 × 10^3
Total cells utilized (n) | 42
Fig. 3 PEM fuel cell output: a voltage versus current characteristics, and b power versus current characteristics
which is not useful for industrial applications. Therefore, multiple cells are connected together to improve the power supply capability of the fuel-dependent electric vehicle system. The total number of cells forming a module is given in Eq. (4), where the variable 'N' is the total number of cells in the fuel stack and V0 is the total stack output voltage. Similarly, the variable VFC is the single-cell potential, which is obtained by using Eq. (6). The detailed working parameters of the fuel system are shown in Table 1. The ohmic, activation, and concentration polarization losses are defined as VOh, VAc, and VCo, respectively.
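As a quick illustration of Eqs. (4)–(6), the snippet below evaluates the single-cell and stack voltages. Only the cell count N = 42 is taken from Table 1; the open-circuit potential and the activation, ohmic, and concentration losses are placeholder values, not data from the paper.

```python
# Sketch of Eqs. (4)-(6): cell voltage = open-circuit potential minus the
# activation, ohmic, and concentration polarization losses; the stack output
# is N series-connected cells. Loss values below are assumed placeholders.

def cell_voltage(e_ot, v_ac, v_oh, v_co):
    """Eq. (6): V_FC = E_Ot - V_Oh - V_Ac - V_Co."""
    return e_ot - v_oh - v_ac - v_co

def stack_voltage(n_cells, v_fc):
    """Eq. (4): V_0 = N * V_FC."""
    return n_cells * v_fc

v_fc = cell_voltage(e_ot=1.0, v_ac=0.25, v_oh=0.12, v_co=0.06)
v_0 = stack_voltage(n_cells=42, v_fc=v_fc)
print(f"cell voltage ~ {v_fc:.2f} V, stack voltage ~ {v_0:.1f} V")
```

With these assumed losses the stack voltage comes out near 24 V, which is consistent in order of magnitude with the MPP voltage listed in Table 1.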
3 Analysis of Transformerless Interleaved DC-DC Converter

From previously published articles, power converters can be classified as isolated (transformer-based) and non-isolated converters. The isolated converters require an extra rectifier circuit for protecting the semiconductor switches from the high input power supply. In article [22], the authors used an isolated zero-voltage-switching power converter for a hybrid fuel cell grid-connected system in order to keep the grid voltage constant. The features of this topology are high voltage gain, more flexibility, low manufacturing cost, and wide output operation, but it gives more power conduction losses. So, to compensate for the disadvantages of conventional power converters, in this article an interleaved methodology is used for designing the transformerless boost converter.
The proposed non-isolated converter consists of two switches, defined as S_x and S_y (refer to Fig. 4 and Table 2). Here, metal oxide semiconductor field effect transistors are selected as the switches. The features of power MOSFETs are high voltage withstand capability, low on-resistance, compact size, low power absorption, and high drain resistance; their disadvantages are a shorter lifetime, slow operation, and high heat generation. In the proposed converter, the variables L_x, L_y, and L_z work as the input and output inductors that filter out the voltage oscillations and current ripples. The corresponding inductor voltages and currents are V_Lx, I_Lx, V_Ly, I_Ly, V_Lz, and I_Lz, respectively. Similarly, the voltage across a switch is V_DS and the current flowing through it is I_DS. The voltages of diodes D_x, D_y, and D_z are represented as V_Dx, V_Dy, and V_Dz. The voltage conversion ratio of the interleaved converter is derived as
$V_{Cx} = \dfrac{V_{FC}}{1-D}, \quad V_{Cy} = V_{Cx} + \dfrac{V_{FC}}{1-D}, \quad V_{Cz} = \dfrac{V_{FC}}{1-D}$  (7)

$V_{out} = V_{Cx} + V_{Cy} - V_{FC}$  (8)

$\dfrac{V_0}{V_{FC}} = \dfrac{2+D}{1-D}$  (9)

$L_x = L_y = L_z = L_{eq} = \dfrac{D \cdot V_{FC}}{I \cdot f_s}$  (10)

$C_x = \dfrac{V_{out}}{R \cdot f_s \cdot V_{Cx}}$  (11)

$C_y = C_z = C_{eq} = \dfrac{D \cdot V_{out}}{R \cdot f_s \cdot V}$  (12)

Fig. 4 Proposed high step-up non-isolated power converter for fuel stack system
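As a rough numerical check of these design relations, the sketch below evaluates the voltage gain of Eq. (9) and the inductance expression of Eq. (10). The duty cycle, switching frequency, and ripple current are assumptions made for illustration, since the paper does not state its operating point explicitly; only V_FC = 24 V comes from Table 1, and the symbol I in Eq. (10) is read here as the allowed current ripple.

```python
# Illustrative evaluation of the interleaved converter relations, Eqs. (9)-(10).
# The duty cycle, switching frequency and ripple current below are assumed
# operating values; only V_FC = 24 V is taken from Table 1.

def voltage_gain(d):
    """Eq. (9): V_0 / V_FC = (2 + D) / (1 - D)."""
    return (2 + d) / (1 - d)

def equivalent_inductance(d, v_fc, i_ripple, f_s):
    """Eq. (10): L_eq = D * V_FC / (I * f_s), reading I as the current ripple."""
    return d * v_fc / (i_ripple * f_s)

if __name__ == "__main__":
    d, f_s = 0.6, 20e3        # duty cycle and switching frequency in Hz (assumed)
    v_fc = 24.0               # fuel cell MPP voltage from Table 1, V
    print(f"Voltage gain (2 + D)/(1 - D) = {voltage_gain(d):.2f}")
    print(f"Estimated output voltage     = {voltage_gain(d) * v_fc:.1f} V")
    print(f"L_eq for 0.3 A ripple        = {equivalent_inductance(d, v_fc, 0.3, f_s) * 1e3:.2f} mH")
```

Under these assumptions the equivalent inductance works out to a few millihenry, which is of the same order as the inductor values reported later in the simulation section.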
Table 2 Working strategy of proposed non-isolated converter

Qq      | Qr      | Dp      | Dq      | Dr
Working | Working | Stopped | Stopped | Stopped
Working | Stopped | Working | Stopped | Working
Stopped | Working | Stopped | Working | Stopped
4 Genetic Algorithm Optimization-Based ANFIS Controller

In article [23], the GA is used for improving the output power of a hybrid fuel cell-dependent micro-grid-connected network. However, the drawbacks of this technique are high implementation complexity, greater human resource requirements, high manufacturing cost, and excessive size; hence, the GA-based proportional controller does not give accurate results for maximum power point evaluation. Currently, most research scholars are implementing artificial neural network-related power controllers for standalone as well as grid-integrated systems. In article [24], the authors used a radial basis function-related power point tracking controller for fuel cell-based permanent magnet motors in order to run automotive systems; here, the radial function works as the activation function. The merits of radial function neural networks are time series prediction, function approximation, solving classification-related issues, and system optimization, while their problems are very slow classification and high computational time. So, the authors used multilayer perceptron networks for achieving an accurate response. The features of multilayer neural networks are strong tolerance to supply-side noise, the ability to learn online, good generalization, and fast response, but they require a lot of space for designing the network. However, the drawbacks of neural controllers are compensated by utilizing a fuzzy logic controller. Fuzzy logic controllers are a very popular and widely utilized soft computing technique for solving highly complex nonlinear problems. In article [16], the researchers proposed a type-2 fuzzy controller for solid oxide fuel cell systems. The features of this controller are wide operating control, more flexibility, high customizability, and greater efficiency, but the demerit of fuzzy systems is the need for skilled human resources to implement the controllers. So, in this work, a GA-tuned ANFIS technique is utilized to handle fast variations in fuel cell temperature and partial pressure conditions. ANFIS has been applied with merit in pattern recognition, subway control, washing machines, power transmission, and vacuum cleaners. The fuel cell temperature and partial pressure signals are given to the proposed controller as shown in Fig. 5.
Fig. 5 Genetic algorithm-dependent ANFIS power point finding technique
If $p$ is $a_1$ and $q$ is $b_1$, then $r = A_1 p + B_1 q + r_1$  (13)

If $p$ is $a_2$ and $q$ is $b_2$, then $r = A_2 p + B_2 q + r_2$  (14)

$r_{1,t} = \mu_{a_1}(p) + \mu_{a_2}(p); \quad x = 1, 2, \ldots, n$  (15)

$r_{1,t} = \mu_{b_1}(q) + \mu_{b_2}(q); \quad x = 1, 2, \ldots, n$  (16)
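Equations (13)-(16) describe a two-rule, first-order Sugeno rule base of the kind an ANFIS evaluates. The hedged sketch below shows one common way such rules can be fired and combined (product of Gaussian memberships followed by a weighted average); all membership centers, widths, and consequent parameters are placeholders standing in for values that, in the proposed scheme, the genetic algorithm would tune, and the exact membership and aggregation choices of the paper are not specified here.

```python
import math

# Minimal first-order Sugeno (ANFIS-style) evaluation in the spirit of
# Eqs. (13)-(16). Here p could be the stack temperature and q the hydrogen
# partial pressure; every numeric parameter is an assumed placeholder.

def gauss(x, center, width):
    """Gaussian membership function."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def anfis_output(p, q, rules):
    """Weighted average of rule consequents r_i = A_i * p + B_i * q + r_i0."""
    weights, consequents = [], []
    for (c_p, s_p, c_q, s_q, a, b, r0) in rules:
        w = gauss(p, c_p, s_p) * gauss(q, c_q, s_q)   # rule firing strength
        weights.append(w)
        consequents.append(a * p + b * q + r0)
    total = sum(weights) or 1e-12                      # avoid division by zero
    return sum(w * r for w, r in zip(weights, consequents)) / total

if __name__ == "__main__":
    # Two rules: (center_p, width_p, center_q, width_q, A, B, r0) - GA-tunable.
    rules = [(320.0, 15.0, 1.0, 0.4, 0.0010, 0.05, 0.20),
             (350.0, 15.0, 1.5, 0.4, 0.0005, 0.04, 0.35)]
    duty_ref = anfis_output(p=332.0, q=1.5, rules=rules)  # temperature (K), pressure (bar)
    print(f"Controller output (e.g., duty cycle reference): {duty_ref:.3f}")
```

In a GA-tuned setup, the tuple of rule parameters would form the chromosome and the tracking error or harvested power would serve as the fitness function.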
5 Simulation Results of Proposed System

Here, the fuel cell-supplied interleaved boost converter system is designed by utilizing the MATLAB-Simulink tool. The selected fuel cell supply power is 1260 W, and the selected converter input and output inductor values are 2.02 mH, 1.96 mH, and 1.98 mH, respectively. Similarly, the selected converter capacitor values are 465 µF, 271 µF, and 285 µF, respectively. The converter input-side parameters are helpful for stabilizing the fuel cell supply voltage.
Fig. 6 Variation of input supply temperature to the PEMFC system
In addition, the output-side converter capacitors are helpful for balancing the loads. Here, the adjustable-step-value P&O controller and the GA-dependent ANFIS controller are compared in terms of fuel cell supply voltage and power. From Fig. 6, the supply temperature is constant up to 0.2 s. After that, the supply temperature suddenly drops to 312 K at 0.4 s, as given in Fig. 6, and at 0.6 s it is raised to 352 K. Based on Fig. 7a-c, the fuel cell output parameters with the conventional MPPT technique at 332 K are 910.8 W, 33 A, and 27.6 V, respectively. The ANFIS-related MPPT controller gives more output power, current, and voltage than the conventional controller: the fuel cell output parameters obtained by applying ANFIS are 1208 W, 43 A, and 28.11 V, respectively. The achieved maximum converter output parameters are given in Fig. 7d-f.
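Relating these reported operating points to the 1260 W rating in Table 1 gives an approximate tracking efficiency for each controller; the short sketch below only performs that arithmetic and is not part of the authors' simulation, so the percentages are indicative rather than exact figures at 332 K.

```python
# Approximate MPPT tracking efficiency from the reported operating points.
P_RATED = 1260.0  # maximum PEMFC power from Table 1, W

for name, p_out in [("Adaptive-step P&O", 910.8), ("GA-based ANFIS", 1208.0)]:
    print(f"{name}: {p_out:.1f} W -> tracking efficiency ~ {100 * p_out / P_RATED:.1f} %")
```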
6 Conclusion

The genetic algorithm-based ANFIS controller is designed successfully using the MATLAB-Simulink tool. Based on the simulation results, it is determined that the ANFIS controller gives optimum switching pulses to the incorporated transformerless converter. The merits of the proposed hybrid MPPT method are high tracking efficiency, higher accuracy, fewer steady-state oscillations, and lower power conduction losses. The fuel cell output voltage is stepped up by utilizing the interleaved three-phase converter. The attractive feature of this DC-DC converter is its high voltage gain; in addition, it offers a wide input as well as output operating range. Based on the simulation results, the converter places less power stress on the semiconductor devices.
Fig. 7 Fuel cell fed output voltage, current, and power waveforms by applying various power point finding controllers
References 1. Song W, Liu XY, Hu CC, Chen GY, Liu XJ, Walters WW, Liu CQ (2021) Important contributions of non-fossil fuel nitrogen oxides emissions. Nat Commun 12(1):1–7 2. Bouyghrissi S, Murshed M, Jindal A, Berjaoui A, Mahmood H, Khanniba M (2022) The importance of facilitating renewable energy transition for abating CO2 emissions in Morocco. Environ Sci Pollut Res 29(14):20752–20767 3. Basha CH, Rani C (2020) Different conventional and soft computing MPPT techniques for solar PV systems with high step-up boost converters: A comprehensive analysis. Energies 13(2):371 4. Basha CH, Rani C, Odofin S (2017) A review on non-isolated inductor coupled DC-DC converter for photovoltaic grid-connected applications. Int J Renew Energy Res (IJRER) 7(4):1570–1585 5. Ding H, Huang H, Tang O (2018) Sustainable supply chain collaboration with outsourcing pollutant-reduction service in power industry. J Clean Prod 186:215–228 6. Blanco JM, Pena F (2008) Increase in the boiler’s performance in terms of the acid dew point temperature: environmental advantages of replacing fuels. Appl Therm Eng 28(7):777–784
7. Park J, Kim T, Seong S, Koo S (2022) Control automation in the heat-up mode of a nuclear power plant using reinforcement learning. Prog Nucl Energy 145:104107 8. Kaygusuz K, Avci AC (2021) Nuclear power in Turkey for low carbon economy and energy security: a socioeconomic analysis. J Eng Res Appl Sci 10(2):1881–1889 9. Hussaian Basha CH, Rani C (2020) Performance analysis of MPPT techniques for dynamic irradiation conditions of solar PV. Int J Fuzzy Syst 22(8):2577–2598 10. Panwar NL, Kaushik SC, Kothari S (2011) Role of renewable energy sources in environmental protection: a review. Renew Sustain Energy Rev 15(3):1513–1524 11. Wang S, Wang S (2015) Impacts of wind energy on environment: a review. Renew Sustain Energy Rev 49:437–443 12. Pandey S, Kesari B (2018) Consumer purchase behaviour of solar equipments: paradigm shift towards the ecological motivation among rural working consumers in developing countries. J Adv Res Dyn Control Syst 10:363–375 13. Curtin DE, Lousenberg RD, Henry TJ, Tangeman PC, Tisack ME (2004) Advanced materials for improved PEMFC performance and life. J Power Sources 131(1–2):41–48 14. Chen T, Liu S, Zhang J, Tang M (2019) Study on the characteristics of GDL with different PTFE content and its effect on the performance of PEMFC. Int J Heat Mass Transf 128:1168–1174 15. Basha CH, Rani C (2020) Design and analysis of transformerless, high step-up, boost DC-DC converter with an improved VSS-RBFA based MPPT controller. Int Trans Electr Energy Syst 30(12):e12633 16. Basha CH, Rani C (2022) A new single switch DC-DC converter for PEM fuel cell-based electric vehicle system with an improved beta-fuzzy logic MPPT controller. Soft Comput 1–20 17. Hussaian Basha CH, Naresh T, Amaresh K, Preethi Raj PM, Akram P (2022) Design and performance analysis of common duty ratio controlled zeta converter with an adaptive P&O MPPT controller. In: Proceedings of international conference on data science and applications. Springer, Singapore, pp 657–671 18. Yamamoto O (2000) Solid oxide fuel cells: fundamental aspects and prospects. Electrochim Acta 45(15–16):2423–2435 19. Gülzow E (1996) Alkaline fuel cells: a critical view. J Power Sources 61(1–2):99–104 20. Kiran SR, Basha CH, Kumbhar A, Patil N (2022) A new design of single switch DC-DC converter for PEM fuel cell based EV system with variable step size RBFN controller. S¯adhan¯a 47(3):1–14 21. Dicks AL (2004) Molten carbonate fuel cells. Curr Opin Solid State Mater Sci 8(5):379–383 22. Rafikiran S, Basha CH, Murali M, Fathima F (2022) Design and performance evaluation of solid oxide-based fuel cell stack for electric vehicle system with modified marine predator optimized fuzzy controller. Mater Today: Proc 60:1898–1904 23. Gabbar HA, Abdelsalam AA (2014) Microgrid energy management in grid-connected and islanding modes based on SVC. Energy Convers Manag 86:964–972 24. Kiran SR, Basha CH, Singh VP, Dhanamjayulu C, Prusty BR, Khan B (2022) Reduced simulative performance analysis of variable step size ANN based MPPT techniques for partially shaded solar PV systems. IEEE Access 10:48875–48889
A Digital Transformation (DT) Model for Intelligent Organizational Systems: Key Constructs for Successful DT Michael Brian George and Grant Royd Howard
Abstract Intelligent organizational systems or intelligent organizations are being created through processes of digital transformation (DT) that typically involves intelligent technologies such as robotic process automation (RPA), the Internet of Things (IoT), artificial intelligence (AI), machine learning (ML), blockchain, drones, virtual reality (VR), 3D printing, big data and big data analytics (BDA). However, there is the significant problem of high DT failure rates with severe financial losses. Thus, understanding the key constructs for DT success in terms of organizational performance is an essential knowledge requirement and was the research problem and knowledge gap addressed by the study. The study followed an integrative literature review method since the purpose was analysis and synthesis for research model development. The study exposed the key constructs for successful DT in terms of organizational performance, namely organizational performance, sustainability, information technology (IT)/digital capability, change management, dynamic capabilities and risk management. This knowledge enables DT management to effectively manage DT toward success. For DT and related scientists, the study establishes the foundations for measuring DT and the associated organizational performance and facilitates DT theory progression and new knowledge generation. Keywords Change management · Digital transformation · Dynamic capabilities · Information technology/digital capabilities · Integrative literature review · Intelligent organizational systems · Organizational performance · Risk management · Sustainability
M. B. George · G. R. Howard (B) University of South Africa (Unisa), 28 Pioneer Ave, Florida, Roodepoort 1709, South Africa e-mail: [email protected] M. B. George e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_2
1 Introduction 1.1 Background and Context A system is an organized and collective entity with purpose, comprising interrelated and interdependent elements. An intelligent system is a system that can interact with its environment and other systems, demonstrate action control, perception and purposeful reasoning, behave according to social norms and rationality and adapt itself by learning [1]. Organizations are significant social systems and can be described as complex, organic, alive and dynamic, with system features including open and closed exchange, interdependence, homeostasis and nonsummativity [2]. Furthermore, an organization can be regarded as an intelligent organizational system or an intelligent organization if it can sense its environment and learn and adapt itself to improve its performance in relation to its effectiveness and efficiency [3]. Importantly, intelligent organizations today are being developed through processes of digital transformation (DT) that typically involves intelligent technologies such as robotic process automation (RPA), the Internet of Things (IoT), artificial intelligence (AI), machine learning (ML), blockchain, drones, virtual reality (VR), 3D printing, big data and big data analytics (BDA) [4], and these are often supported by cloud computing. These intelligent technologies are reshaping business practices and processes across industries and in both public and private organizations [5]. Nevertheless, globally, many organizations are having difficulty with DT since it is reported that most organizations fail in their DT initiatives [6]. Regardless, the trend of executives and organizations either considering DT or implementing DT continues [7].
1.2 Research Problem, Questions and Objective While there have been many DT successes, there is the significant problem of high DT failure rates [8], and the significance is demonstrated by DT failure costs estimated at over $900 billion in 2018 [9]. Furthermore, in a study of over a thousand global organizations that underwent DT initiatives, only 5% were able to meet or surpass their business objectives [10]. In another study of 289 global organizations that underwent DT initiatives, only 8% were able to meet their business objectives [10]. Also, in a survey of 2380 senior executives across seventy-six countries, only 5% realized positive results from their DT initiatives [10]. Other reports indicate that 84% of companies fail in their DT initiatives [11], over 70% of companies fail in their DT initiatives [6], and 60% or more fail in their DT initiatives [6]. This exposes a significant global real-world problem relevant to all countries, economies, organizations and the billions of individuals to which they provide products and services.
To understand the extent to which prior research has addressed this real-world problem, an initial literature survey was conducted in July and August 2021, and of the articles analyzed, 40% focused on DT business/organizational implications, 36% on DT drivers, factors and concepts, 18% on DT leadership and skills and 6% on DT governance. Additional searches returned DT literature focusing on DT benefits [12], DT successes [13] and DT failures [6]. From the literature analysis, it was evident that the literature had not focused on the important constructs for DT success in terms of organizational performance, the absence of which precludes vital knowledge for effective management, relevant insights into key aspects and their interrelationships, for investigating, answering questions, identifying areas needing improvement and for guiding DT initiatives to address the problem of low DT success rates. Against the background of extensive DT failure reports and the huge costs resulting from DT failures, understanding the constructs for DT success in terms of organizational performance is an essential knowledge requirement and was the research problem and knowledge gap that the study addressed. In addition, this gap was emphasized as an important research opportunity and agenda in recent DT literature [14]. Addressing the research problem is the study’s original contribution to the field and provided scientific evidence and original knowledge about the important constructs for DT success in terms of organizational performance, which is valuable to industry and organizations trying to leverage digital and intelligent technologies to create efficiencies, develop new customer value, establish novel digital business opportunities and build intelligent organizational systems. The study’s research objective was to explain the key constructs for successful DT in terms of organizational performance by analyzing and synthesizing relevant literature. Corresponding to the research problem and research objective, the study aimed to answer the following research question, what are the key constructs for successful DT in terms of organizational performance? There are five sections in the paper with Sect. 1 introducing and contextualizing the study and presenting the research problem, objective and question. Section 2 is the literature analysis, and Sect. 3 explains the study’s methodology. Section 4 provides the developed research model to answer the research question and address the research problem, and Sect. 5 concludes with the study’s contribution, limitations and future research opportunities.
2 Literature Analysis and Synthesis This section presents each of the emerging significant themes and constructs in the literature that develops the study’s research model in Sect. 4.
2.1 Digital Transformation (DT) Digital and intelligent technologies have led to the phenomenon known as digital transformation (DT), which is a disruption of the traditional status quo and a radical redesign of business models, services, products, cultures, processes and organizational structures [15]. DT attempts to optimize business performance, enable agility, enhance customer experiences, promote competitiveness [5] and develop intelligent organizations [4]. At the core of DT are digital technologies and organizational change, which is a consequential process of change to increase business efficiency and effectiveness [7]. DT is not simply about utilizing digital technologies effectively; DT requires the reorganization of multiple organizational aspects, including the financial, human, physical and intellectual aspects of an organization. DT is digital innovation and the alignment of an organization to the novel opportunities created by digital technologies [16]. Successful DT is reported to depend on the alignment of four key dimensions, namely technologies, change in value creation, structural changes and financial aspects [17]. The technologies dimension indicates the importance of an organization’s choice of technologies, including its approach to developing new digital technologies and the digital capabilities to exploit them [18]. The changes in value creation dimension relate to the effect of DT on an organization’s value creation activities. The structural changes dimension relates to an organization’s change processes, structures and capabilities involving novel technologies. The financial aspects dimension considers the ability of an organization to finance DT and responds to the contending issues of the core business [19]. In addition, DT involves the implementation of innovative technologies, products, services, processes and business models, requiring an organization to manage the transformation in a sustainable manner [8], indicating that sustainability is an important consideration for DT. However, DT can be challenging. The continuous and accelerated development in digital technologies, any organizational deficiencies and uncertain market conditions can complicate DT and present challenges which may require considerable effort to overcome [20]. Some of the challenges include the lack of digital knowledge and leadership, rigid culture, change resistance, ambiguous objectives and vision and inadequate alignment and collaboration [21].
2.2 Organizational Performance An organization’s success depends on its performance since this is what determines its continued existence [22]. Importantly, DT impacts organizational performance [23], and during any DT, an organization should identify and measure organizational performance for successful management of the DT process [14]. Ultimately, for DT to be successful, DT would have to positively contribute to organizational performance.
Organizations generally measure their performance using financial measures, but studies in business strategy recommend that multiple measures of organizational performance be employed as the perception of DT may vary among different stakeholders and multiple measures would prevent executives from being misled and from making inappropriate strategic decisions [24]. There is no universally accepted single measure of organizational performance, but its importance is universally accepted [25], and organizational performance can refer to both financial and non-financial aspects of an organization [26]. Nevertheless, the choice of organizational performance measures should be based on which measures are most appropriate for the research objective [25]. With specific reference to DT, it is reported that organizational performance measures should include organizational agility, operations flexibility, digital resources, transformation management, customer focused and value proposition measures [27]. In addition, a growing concern about environmental, social and economic sustainability has contributed to the demand for sustainability performance measures in relation to DT [28].
2.3 Sustainability Building a sustainable future is a priority for governments around the world and private and public organizations. Sustainability is defined as the ability of a system to sustain itself or to persist over time [29]. Sustainability involves the use of natural resources, human resources and financial resources in a way that ensures long-term continuity [30]. To gain a sustainable competitive advantage, organizations should create sustainable products and services and employ sustainable methods. As a result of competitive pressures, social pressures and institutional pressures, organizations are seeking ways to improve and address their impact on society and the environment, as well as transform themselves digitally. DT has accelerated and necessitated sustainability performance in terms of social, economic and environmental sustainability [31]. Sustainability for organizations and governments is becoming dependent on the development of intelligent systems to integrate human and technological resources that respond to the changing environment [32]. Digital technologies are not just a means for organizations and governments to benefit economically; they are also a means for organizations and governments to enhance and support sustainable business values. Increasingly, digital technologies are supporting innovation and revenue creation and are encouraging organizations to develop and establish sustainability practices. DT has become a critical part of organizational sustainability initiatives from strategic planning to the adoption of new products and services [33] and impacts many aspects of sustainable development [34]. Thus, DT can influence all three sustainability dimensions, namely the economic, social and environmental dimensions.
2.4 Information Technology (IT)/Digital Capabilities Information technology (IT)/digital technologies are a key and central aspect in DT, with IT/digital capabilities reported to positively affect organizational performance through the mediating effect of DT [23] and IT/digital capabilities enable innovation [35]. An organization’s ability to obtain, integrate and deploy IT/digital resources to meet an organization’s business objectives and exploit business opportunities is referred to as IT/digital capabilities. IT/digital capabilities are valued by organizations as an asset and provide a response to and enable the creation of disruptive innovations [36]. IT/digital capabilities support innovation and the adoption and implementation of novel digital technologies and DT [36]. IT/digital resources include physical IT/digital infrastructure, human IT/digital resources and intangible IT/digital resources [37]. In addition, DT success depends on an organization’s ability to manage its IT/digital capabilities comprising IT/digital infrastructure capabilities, IT/digital business spanning capabilities and IT/digital proactive stance [35, 38]. In particular, IT/digital proactive stance refers to the exploration of new IT/digital innovations to produce profitable business opportunities [23] and is viewed in the study as a synonym for digital innovation capabilities.
2.5 Change Management Since DT involves digital technologies and organizational change [7], another key and central aspect of DT is the management of the corresponding organizational change or change management [39, 40]. Change management has been identified as one of the DT success determinants and is a strategic priority that enables focus on the human resource elements in addition to the required focus on the digital technology elements and facilitates participation, empowerment, innovative thinking and knowledge sharing [41]. In addition, an important part of successful DT change management is considered to be innovation management, where digital innovation exploration is encouraged and can be safely isolated in projects with the option of termination [42]. There are many different change management processes proposed in the literature, and the most appropriate would have to be evaluated based on the specific context of each DT. For instance, one proposal for AI-based DT involves both bottom-up and top-down change management processes, including the introduction of a set of AI key performance indicators (KPIs), integration of AI components into the top-level strategy, introduction of an AI education system for all employees, facilitation of frequent communication throughout the organization, initiation of measurable short-term AI projects and the usage of AI experts to collaborate throughout and lead the AI DT [43].
2.6 Dynamic Capabilities Dynamic capabilities are emphasized as an important construct for successful DT [44]. Dynamic capabilities are an organization’s potential to effectively adapt itself to a fast-changing external environment, which is a central aim of DT. In addition, dynamic capabilities can be explained in terms of an organization’s ability to sense threats and opportunities, to make timely and market-oriented decisions and to change its resource base. Furthermore, the literature exposes the relationship between dynamic capabilities and organizational learning, where organizational learning explains processes of dynamic capabilities [45]. Thus, dynamic capabilities involve sensing, learning, integrating and coordinating, and include the detection of silent signals of potential problems or undiscovered opportunities [46]. Furthermore, routines for dynamic capabilities relevant for DT include inside-out digital infrastructure sensing, cross-industrial digital sensing, formulation of organizational boundaries, digital strategy creation, decomposition of DT into projects and the development of a unified digital infrastructure [47].
2.7 Risk Management DT is based on non-traditional digital and intelligent technologies that have resulted in new organizational risks that require specific risk management for successful DT [48]. Risk management refers to the identification, evaluation and control of potential losses, and in the context of the study, losses resulting directly or indirectly from DT. The risks resulting from DT include strategic, technology, operations, third party, regulatory, forensic, cyber, resilience, data leakage and privacy risks [49]. Strategic risks refer to DT forcing changes to successful strategies resulting in losses; technology risks include scalability, compatibility and accuracy problems; operations risks include process and control failures; third party risks involve data sharing, operations dependencies and technology integration risks; regulatory risks refer to general, industry and technology laws; forensic risks relate to data evidence capturing risks; cyber-risks include unauthorized access, confidentiality and integrity of data risks; resilience risks involve service unavailability and disruption of operations; data leakage risks refer to data in use, in transit and at rest risks, and privacy risks relate to the inappropriate use of personal data.
3 Methodology The study’s methodology is a literature review, which is a relevant and valuable method contributing to knowledge production by integrating many perspectives and findings within and across research domains, results in studies typically broader than
any single study and is a critical method for developing research models [50]. Specifically, the study followed an integrative literature review method since the purpose was analysis and synthesis for research model development instead of surveying every article published about the subject [50]. The method is founded on an interpretivist philosophy which refers to an epistemology where knowledge is subjectively and socially constructed through language and context [51]. Subsequently, the literature was interpreted, analyzed and synthesized by way of a qualitative thematic approach [52]. To determine which articles to include and exclude, inclusion and exclusion criteria were developed. Inclusion and exclusion criteria are important to facilitate objectivity. The study proceeded by doing literature searches using keywords based on the research problem and then used appropriate derivatives of these [53]. The searches were conducted primarily on Google Scholar and any included article had to be accessible from within the University of South Africa’s (Unisa’s) e-library to avoid predatory or fake articles. For each search conducted, inclusion and exclusion criteria were applied to the corresponding search results. Initially, at the identification stage, the following exclusion criteria were applied; all duplicates were excluded (DP); all non-English articles were excluded (LC); all articles whose full text could not be accessed were excluded (NF); all articles without evidence of peer-review or not accessible from within the Unisa e-library were excluded (NP). In addition, all articles were excluded that were not related to the research problem (NR1); all search results were excluded from the first irrelevant article onward where the search results were sorted by relevance and irrelevance means that an article’s main contribution is not related to the research problem (NR2), and all articles were excluded that only briefly or loosely used terms or synonyms related to the research problem (CA). The next stage was the screening stage where only those eligible articles that satisfied the inclusion criteria (PR) or (CR1) or (CR2) were included in the study. (PR) refers to partially related where an article focuses on specific aspects, constructs or uses different terminology, but directly relates to the research problem, (CR1) refers to closely related where an article focuses on the constructs that are relevant to the research problem, and (CR2) refers to an article that is a high-quality industry publication where the authors or publishers have an excellent reputation, and the publication refers to relevant aspects of the research problem. Figure 1 demonstrates the literature review process implemented based on the PRISMA guidelines [54].
4 Findings—Research Model The literature review guided the development of the research model by exposing significant themes and constructs in the domain. The literature emphasized the importance of organizational performance in relation to organizational survival and DT, such that where DT does not positively contribute to organizational performance DT
Fig. 1 Literature review process implemented based on the PRISMA guidelines
may be regarded as a failure. Thus, organizational performance was a key construct for managing DT. In terms of sustainability, economic sustainability typically relates directly to the financial aspects of organizational performance, so economic sustainability was excluded to prevent duplication. Also, there was overlap between social sustainability and the non-financial aspects of organizational performance, for instance employee development and organizational culture. Hence, social sustainability was included without restriction, but the non-financial aspects of organizational performance were limited to only those aspects not already covered by social sustainability. The literature also provided convincing arguments for the inclusion of IT/digital capabilities, change management, dynamic capabilities and risk management as vital constructs for successful DT. The resultant research model is presented in Fig. 2.
5 Conclusion The study addressed the research problem by developing a research model to guide organizational management as they steer their organizations through the processes of DT. The study answered the research question by exposing the key constructs for successful DT in terms of organizational performance. This knowledge enables DT management to understand the key concepts involved in DT and their interrelationships, which is vital for effectively managing DT. In addition, the study is
Fig. 2 Research model explaining the key constructs for successful DT
significant, indirectly, to the many millions of organizational stakeholders, including customers, suppliers and other stakeholders, that rely on organizations for daily goods and services. When the organizations fail or suffer severe losses due to failed DT, these stakeholders are negatively impacted. For DT and related scientists, the study establishes the foundations for measuring DT and related organizational performance using a positivistic epistemology for theory development and testing to gain precise and objective scientific knowledge. These foundations facilitate DT theory progression and new knowledge generation. Nevertheless, the main limitation of the study was its lack of empirical data for testing the research model in organizational settings. However, this limitation presents research opportunities for scientists to test the research model across industries and organizational types for new theory and knowledge development.
References 1. Molina M (2022) What is an intelligent system? https://www.researchgate.net/publication/344 334868_What_is_an_intelligent_system. Accessed 23 June 2022 2. Lawson RB, Anderson ED, Rudiger LP (2016) Introduction to organizations and systems. In: Psychology and systems at work. Routledge, Abingdon, pp 3–19 3. Adamczewski P (2018) Knowledge management of intelligent organizations in turbulent environment. In: Omazic MA, Roska V, Grobelna A (eds) 28th International scientific conference on economic and social development (ESD). Varazdin Development and Entrepreneurship Agency, Paris, pp 413–422 4. Maheshwari A (2019) Digital transformation: building intelligent enterprises. Wiley, Hoboken
5. Appio FP, Frattini F, Petruzzelli AM, Neirotti P (2021) Digital transformation and innovation management: a synthesis of existing research and an agenda for future studies. J Prod Innov Manag 38:4–20. https://doi.org/10.1111/jpim.12562 6. Straub F, Weking J, Kowalkiewicz M, Krcmar H (2021) Understanding digital transformation from a holistic perspective. In: 25th Pacific Asia conference on information systems (PACIS). Association for Information Systems, Dubai, pp 1–8 7. Hanelt A, Bohnsack R, Marz D, Antunes Marante C (2021) A systematic review of the literature on digital transformation: insights and implications for strategy and organizational change. J Manag Stud 58:1159–1197. https://doi.org/10.1111/joms.12639 8. Vial G (2019) Understanding digital transformation: a review and a research agenda. J Strateg Inf Syst 28:118–144. https://doi.org/10.1016/j.jsis.2019.01.003 9. Zobell S (2018) Why digital transformations fail: closing the $900 billion hole in enterprise strategy. https://www.forbes.com/sites/forbestechcouncil/2018/03/13/why-digital-transf ormations-fail-closing-the-900-billion-hole-in-enterprise-strategy/?sh=6446ce187b8b 10. Wade M, Shan J (2020) Covid-19 has accelerated digital transformation, but may have made it harder not easier. MIS Q Executive 19:213–220. https://doi.org/10.17705/2msqe.00034 11. Sainger G (2018) Leadership in digital age: A study on the role of leader in this era of digital transformation. Int J Leadersh 6:1–6 12. Tan FTC, Ondrus J, Tan B, Oh J (2020) Digital transformation of business ecosystems: evidence from the Korean pop industry. Inf Syst J 30:866–898. https://doi.org/10.1111/isj.12285 13. Gurbaxani V, Dunkle D (2019) Gearing up for successful digital transformation. MIS Q Executive 18:209–220. https://doi.org/10.17705/2msqe.00017 14. Verhoef PC, Broekhuizen T, Bart Y, Bhattacharya A, Qi Dong J, Fabian N, Haenlein M (2021) Digital transformation: a multidisciplinary reflection and research agenda. J Bus Res 122:889– 901. https://doi.org/10.1016/j.jbusres.2019.09.022 15. Gong C, Ribiere V (2021) Developing a unified definition of digital transformation. Technovation 102:1–17. https://doi.org/10.1016/j.technovation.2020.102217 16. Jewapatarakul D, Ueasangkomsate P (2022) Digital transformation: the challenges for manufacturing and service sectors. In: ECTI DAMT and NCON. IEEE, Chiang Rai, pp 19–23. https://doi.org/10.1109/ECTIDAMTNCON53731.2022.9720411 17. Matt C, Hess T, Benlian A (2015) Digital transformation strategies. Bus Inf Syst Eng 57:339– 343. https://doi.org/10.1007/s12599-015-0401-5 18. Jones MD, Hutcheson S, Camba JD (2021) Past, present, and future barriers to digital transformation in manufacturing: a review. J Manuf Syst 60:936–948. https://doi.org/10.1016/j.jmsy. 2021.03.006 19. Hess T, Matt C, Benlian A, Wiesböck F (2016) Options for formulating a digital transformation strategy. MIS Q Exec 15:123–139 20. Kutnjak A (2021) Covid-19 accelerates digital transformation in industries: challenges, issues, barriers and problems in transformation. IEEE Access. 9:79373–79388. https://doi.org/10. 1109/ACCESS.2021.3084801 21. Bouarar AC, Mouloudj S, Mouloudj K (2022) Digital transformation: opportunities and challenges. In: Mansour N, Ben Salem S (eds) Covid-19’s impact on the cryptocurrency market and the digital economy. IGI Global, pp 33–52. https://doi.org/10.4018/978-1-7998-9117-8. ch003 22. Muthuveloo R, Shanmugam N, Teoh AP (2017) The impact of tacit knowledge management on organizational performance: evidence from Malaysia. 
Asia Pac Manag Rev 22:192–201. https://doi.org/10.1016/j.apmrv.2017.07.010 23. Nwankpa JK, Roumani Y (2016) IT capability and digital transformation: a firm performance perspective. In: 37th International conference on information systems (ICIS). Association for Information Systems, Dublin, pp 1–16 24. Chen Y-YK, Jaw Y-L, Wu B-L (2016) Effect of digital transformation on organisational performance of SMEs. Internet Res 26:186–212. https://doi.org/10.1108/IntR-12-2013-0265 25. Richard PJ, Devinney TM, Yip GS, Johnson G (2009) Measuring organizational performance: towards methodological best practice. J Manag 35:718–804. https://doi.org/10.1177/014920 6308330560
26. Singh S, Darwish TK, Potoˇcnik K (2016) Measuring organizational performance: a case for subjective measures. Br J Manag 27:214–224. https://doi.org/10.1111/1467-8551.12126 27. Ahmad A, Alshurideh M, al Kurdi B, Aburayya A, Hamadneh S (2021) Digital transformation metrics: a conceptual view. J Manag Inf Decis Sci 24:1–18 28. Forcadell FJ, Aracil E, Úbeda F (2020) The impact of corporate sustainability and digitalization on international banks’ performance. Global Pol 11:18–27. https://doi.org/10.1111/1758-5899. 12761 29. Costanza R, Patten BC (1995) Defining and predicting sustainability. Ecol Econ 15:193–196. https://doi.org/10.1016/0921-8009(95)00048-8 30. Todorov V, Marinova D (2011) Modelling sustainability. Math Comput Simul 81:1397–1408. https://doi.org/10.1016/j.matcom.2010.05.022 31. George G, Schillebeeckx SJD (2022) Digital transformation, sustainability, and purpose in the multinational enterprise. J World Bus 57:101326. https://doi.org/10.1016/j.jwb.2022.101326 32. Weichhart G, Molina A, Chen D, Whitman LE, Vernadat F (2016) Challenges and current developments for sensing, smart and sustainable enterprise systems. Comput Ind 79:34–46. https://doi.org/10.1016/j.compind.2015.07.002 33. Gomez-Trujillo AM, Gonzalez-Perez MA (2021) Digital transformation as a strategy to reach sustainability. Smart Sustain Built Environ 1–31. https://doi.org/10.1108/SASBE-01-20210011 34. Renn O, Beier G, Schweizer P-J (2021) The opportunities and risks of digitalisation for sustainable development: a systemic perspective. GAIA Ecol Perspect Sci Soc 30:23–28. https://doi. org/10.14512/gaia.30.1.6 35. Ravesteijn P, Ongena G (2019) The role of e-leadership in relation to IT capabilities and digital transformation. In: Proceedings of the 12th IADIS international conference information systems 2019. IADIS Press, Utrecht, pp 171–179. https://doi.org/10.33965/is2019_201905 L022 36. Gong Y, Janssen M, Weerakkody V (2019) Current and expected roles and capabilities of CIOs for the innovation and adoption of new technology. In: 20th annual international conference on digital government research. ACM, Dubai, pp 462–467. https://doi.org/10.1145/3325112. 3325214 37. Bharadwaj AS (2000) A resource-based perspective on information technology capability and firm performance: an empirical investigation. MIS Q 24:169–196. https://doi.org/10.2307/325 0983 38. Ghosh S, Hughes M, Hodgkinson I, Hughes P (2022) Digital transformation of industrial businesses: a dynamic capability approach. Technovation 113:102414. https://doi.org/10.1016/ j.technovation.2021.102414 39. Bellantuono N, Nuzzi A, Pontrandolfo P, Scozzi B (2021) Digital transformation models for the I4.0 transition: lessons from the change management literature. Sustainability 13:12941. https://doi.org/10.3390/su132312941 40. Hartl E (2019) A characterization of culture change in the context of digital transformation. In: 25th Americas conference on information systems (AMCIS). Association for Information Systems, Cancun, pp 1–10 41. Ghobakhloo M, Iranmanesh M (2021) Digital transformation success under Industry 4.0: a strategic guideline for manufacturing SMEs. J Manuf Technol Manag 32:1533–1556. https:// doi.org/10.1108/JMTM-11-2020-0455 42. Glaser J, Shaw S (2022) Digital transformation success: what can health care providers learn from other industries? NEJM Catal Innov Care Delivery. 3:1–11. https://doi.org/10.1056/CAT. 21.0434 43. 
Valtiner D, Reidl C (2021) On change management in the age of artificial intelligence: a sustainable approach to overcome problems in adapting to a disruptive, technological transformation. J Adv Manag Sci 9:53–58. https://doi.org/10.18178/joams.9.3.53-58 44. Kraus S, Durst S, Ferreira JJ, Veiga P, Kailer N, Weinmann A (2022) Digital transformation in business and management research: an overview of the current status quo. Int J Inf Manag 63:102466. https://doi.org/10.1016/j.ijinfomgt.2021.102466
45. Souza CPDS, Takahashi ARW (2019) Dynamic capabilities, organizational learning and ambidexterity in a higher education institution. Learn Organ 26:397–411. https://doi.org/10. 1108/TLO-03-2018-0047 46. Vartiainen K (2020) In search of the “how” of dynamic capabilities in digital transformation: contradictions as a source of understanding. In: 26th Americas conference on information systems (AMCIS). Association for Information Systems, Salt Lake City, pp 1–5 47. Ellström D, Holtström J, Berg E, Josefsson C (2022) Dynamic capabilities for digital transformation. J Strateg Manag 15:272–286. https://doi.org/10.1108/JSMA-04-2021-0089 48. Chouaibi S, Festa G, Quaglia R, Rossi M (2022) The risky impact of digital transformation on organizational performance—evidence from Tunisia. Technol Forecast Soc Chang 178:121571. https://doi.org/10.1016/j.techfore.2022.121571 49. Deloitte Risk Advisory (2018) Managing risk in digital transformation. https://www2.deloitte. com/content/dam/Deloitte/za/Documents/risk/za_managing_risk_in_digital_transformation_ 112018.pdf. Accessed 29 June 2022 50. Snyder H (2019) Literature review as a research methodology: an overview and guidelines. J Bus Res 104:333–339. https://doi.org/10.1016/j.jbusres.2019.07.039 51. Myers MD (2013) Qualitative research in business and management. Sage, London 52. Schryen G, Wagner G, Benlian A, Paré G (2020) A knowledge development perspective on literature reviews: validation of a new typology in the IS field. Commun Assoc Inf Syst 46:134– 186. https://doi.org/10.17705/1CAIS.04607 53. Rowley J, Slack F (2004) Conducting a literature review. Manag Res News 27:31–39. https:// doi.org/10.1108/01409170410784185 54. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hróbjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, Moher D (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Br Med J (BMJ) 372:n71. https://doi.org/10.1136/ bmj.n71
Smart Sewage Monitoring Systems Sujatha Rajkumar, Shaik Janubhai Mahammad Abubakar, Venkata Nitin Voona, M. Lakshmi Vaishnavi, and K. Chinmayee
Abstract Sewage management is one of the most common and important domains, especially in big cities where the population density is high. Monitoring these sewage systems is therefore an important and necessary task for the municipal authorities of the city, as leakage of sewage water has many adverse effects on the health of living beings, such as the spread of communicable and non-communicable diseases and breathing problems caused by poisonous gasses. These problems can be addressed by using smart sewage monitoring systems based on IoT technology. Here, first, we will calculate the distance between the sensor and the water level in the sewer; then, because the ultrasonic sensor's accuracy depends on the surrounding air humidity and temperature, we will correct the initially calculated distance using the temperature and humidity values found with the DHT11 sensor. After calculating the actual distance, we will use the RGB LED as an indicator: for different ranges of the distance/water level, different colors will be emitted by the LED. If the water surface is very near the ultrasonic sensor, i.e., closer than some threshold value, then the LED glows in red; if the distance is greater than the threshold value, then it glows in blue; and if the water surface is far away from the sensor, then it glows in green. With the help of the MQ-9 gas sensor, we will calculate the concentration of the poisonous or toxic gasses present in the sewer.
S. Rajkumar · S. J. M. Abubakar (B) · V. N. Voona · M. Lakshmi Vaishnavi · K. Chinmayee
School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
e-mail: [email protected]
S. Rajkumar e-mail: [email protected]
V. N. Voona e-mail: [email protected]
M. Lakshmi Vaishnavi e-mail: [email protected]
K. Chinmayee e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_3
With the use of a GPS module, we will track the location of any sewer where overflow or blockages have been detected. The entire setup is enclosed in a box that is deployed/attached to the sewage lid for real-time monitoring. We have also used a PWM-controlled servo motor to open and close the setup box under normal and overflow conditions, respectively, in order to protect the electronic components from the sewer water. An alert notification is sent through the IFTTT app if any abnormal condition occurs, such as the accumulation of high concentrations of poisonous gasses or overflow of the sewage water. Storage of the incoming sensor data in Firebase is done through Node-RED.

Keywords Internet of things · Prototype · Node-RED · Sewage system · Sensor data · Poisonous gases · Data storage · Visualization
1 Introduction

Leakages in sewage systems are a global issue, with hundreds of cases of sewage leakage reported every year. Even a single sewer leakage can affect an entire city, as harmful and deadly diseases may spread and cause severe health issues for people. This problem therefore needs to be solved to ensure a healthy and sustainable life for all living beings. Water levels in the sewage system must be monitored on a daily basis to prevent sewage blockages and overflow of the sewage water. The sewage system also contains poisonous gasses such as carbon monoxide and hydrogen sulfide, which can cause death if inhaled in high concentrations. So, sewage workers need to be informed about the concentration levels of these poisonous gasses through alert notifications to ensure their safety. The municipal authorities of the city should have details of the condition of the city's sewage system for daily monitoring, which can be provided by visualizing the sensor data collected in the sewage system through a Web or mobile application; alert notifications must also be sent to them regarding any abnormal conditions of a sewage system. If the location coordinates of a particular sewer are known, then the necessary actions can be taken immediately, without further damage to the surrounding environment and the living things in that area.
2 Literature Survey

There have been a few previous research works on sewage monitoring. Ganesh et al. [1] presented an enhanced microcontroller system that detects hazardous, combustible gasses and warns users. A closed-loop system is created by real-time health monitoring equipment using a canal (as a safety precaution), and a gas detector uses real-time data made available on the Internet to identify harmful gasses. Kumar et al. [2] proposed a gadget that is utilized to monitor a sewage worker's
pulse rate using a pulse oximetry sensor and to warn the worker when the methane and atmospheric oxygen content deviate from acceptable levels. These metrics quickly detect harmful gasses before any threat arises. Kumar et al. [3] detailed a strategy to preserve sewage employees' safety while they work on sewer pipes. In this system, a NodeMCU is connected to a number of sensors that are used to assess the concentration of methane, the air quality index of the sewer environment, and the location of sewage employees. The data gathered by the sensors is transmitted to the Blynk cloud for analysis, which then transmits the output. Jeffery et al. [4] proposed a smart sewerage system based on cloud computing and LoRaWAN technologies. The proposed solution makes use of IoT devices as well as a system for workforce ticketing, which lowers CAPEX and OPEX costs while increasing productivity, data flexibility, and data dependability. Hisham et al. [5] proposed a smart waste collection monitoring and alert system based on the Internet of things (IoT) to keep track of the waste at a chosen garbage collection site; to monitor the level of garbage in the waste bins, the system uses an ultrasonic sensor connected to an Arduino UNO. Mantoro and Istiono [6] proposed a technique to improve an existing water-saving device by suggesting a device that automatically opens and closes the water tap in order to reduce water waste. When the tub is filled, the smart home automatically shuts off the water supply; the tap is turned off before the tub overflows when an ultrasonic sensor detects that the water surface is close to it, and the user is warned of the higher water level. Here, a fuzzy logic algorithm and an Arduino Uno are employed, with fuzzy logic applied to read the water-level indicator and stop the water at the appropriate time. Jaikumar et al. [7] suggested a design for waste segregation and monitoring utilizing message queuing telemetry transport (MQTT) in order to organize the collection of waste. Wastes are divided into dry and wet waste during segregation, and the bins for dry and wet garbage contain sensors that can be used to monitor and analyze the bin level using an IoT platform. Singh and Borschbach [8] detailed the extrinsic factors that may have an impact on the distance measurement accuracy of an ultrasonic sensor. Under all possible experimental circumstances, the distance between the object and the ultrasonic sensor is compared to determine the measured distance error (DE). Wang et al. [9] proposed a methodology for automating data collection and enabling remote monitoring, and the paper contends that this will enable predictive maintenance based on big data analytics. Real-time data can be gathered from more than 500 sensors that have been partially deployed, and statistical data models for groundwater, sewage water flows, and rainwater can be designed for data forecasting and anomaly detection. Asthana and Bahl [10] proposed a technique for detecting and monitoring the live sewage level. Every time a threshold is crossed, an alert is delivered to the observer, who monitors the system remotely.
This data is sent together with various gas ppm values that indicate whether or not it is safe to work in that environment. Rahman et al. [11] detailed the development of Bangladesh's sewer infrastructure and the implementation of a prototype hazard monitoring system with which the researchers aim to forecast sewer system overflow. The calculated risk variables
for this include the quantity of hazardous gasses present in the underground tunnels. Umapathi et al. [12] aim to identify potentially dangerous sewage gasses. The created system alerts sewage employees using an Arduino Mega, gas sensors, and a GSM module: when the gasses reach the predetermined threshold value, the Arduino sends an alert message to the registered number, and for the benefit of those in the immediate area, the values are presented on an LCD screen. Granlund and Brännström [13] developed a sensor-based system for tracking sewer floods. For early testing, the presented system is installed in a number of strategically located sewers. The device should be able to interact with common alarm systems in order to monitor the sewer's natural changes, and the system is constrained by the communication path and mounting location. Prabavathi et al. [14] developed a prototype by integrating microcontrollers and sensors into a Web application that shows the sewage water levels in the treatment plant, with data protected in cloud Firebase; the article offered an efficient method for fully automating the sewage treatment plant, and the designed system enhances the predictability of outflow water quality. Latif et al. [15] detailed a method in which the sewage system is depicted as a graph, with a junction taken as a node and the water flow represented as a directed edge. A formal model in the Vienna Development Method-Specification Language (VDM-SL) is then created from this graphical model.
3 Proposed Smart Sewage Monitoring System In our proposed work, we have tried to solve the problems of overflow and blockages in a sewage system, of constant monitoring of the sewage water levels, and of detecting the presence of poisonous gases in higher concentrations. We have designed the prototype with the smart-city domain in mind. The prototype uses an HC-SR04 ultrasonic sensor, a temperature and humidity sensor, a water-level sensor, an MQ-9 gas sensor, a GPS module, and a servo motor, along with an RGB LED as a distance indicator. All the components and the entire setup are placed in a box whose lid is movable, and a water tub is used here as a stand-in for a sewer. The main idea is to deploy the prototype in sewage manholes in specific regions of a city so that the sewage water levels in those regions can be monitored; the box is used to ensure component safety and a compact design. Constant monitoring of sewer water levels is done with the ultrasonic sensor, which calculates the distance between the setup, attached under the sewer lid, and the sewer water level. This distance is then corrected using the temperature and humidity values measured by the DHT11 sensor, because the ultrasonic sensor depends on the surrounding temperature and humidity. The RGB LED emits different colors for various distance ranges, indicating the sewage water level of a particular sewer. The water-level and MQ-9 gas sensors are used to detect the presence of water and of poisonous and flammable gases, respectively. The GPS module has also been used to fetch the location coordinates of the sewer, so as to know its location
beforehand in case of abnormal conditions in the sewer. The box is designed so that the LED indicator and the GPS module are kept outside it, though they are connected to the microcontroller board (Arduino) placed inside; this is done so that the LED can serve as a water-level indicator and the GPS module can fetch the location coordinates faster. The collected sensor data is then sent to a Web application, Node-RED, via a USB cable. This sensor data is visualized and stored in Firebase using Node-RED, and the fetched location coordinates (latitude and longitude) are plotted on Node-RED's world map. In case of abnormal conditions such as sewage water overflow or detection of high concentrations of poisonous gases, the box attached under the sewer lid is closed automatically with the help of the servo motor to protect the components, and alert notifications are sent to the concerned authorities in the city through email, by integrating an email service through the Node-RED and IFTTT Web applications. Figure 1 shows images of the built prototype, which is to be deployed in the sewer, attached to the sewer lid.
4 Technical Specifications Our proposed work is divided mainly into four parts – hardware, Web app, email service, and data storage.
4.1 Hardware The hardware consists of an ultrasonic sensor, an MQ-9 gas sensor, a DHT11 sensor module, an RGB LED, an LCD, an Arduino Uno, a breadboard, a USB cable, and jumper wires. The ultrasonic sensor, normally used to measure the distance to obstacles or objects in its proximity, is used here to calculate the distance between the sensor and the sewage water level. Because its accuracy depends on the surrounding air temperature and humidity, the temperature and humidity readings are used to correct the measurement and compute the actual distance/water level. If the water level has to be measured while the water is in motion, as in a sewage system, a guided medium such as a plastic or steel pipe must be used between the water and the sensor to obtain correct readings. The MQ-9 gas sensor must be preheated, i.e., sufficient power supply must be provided to the sensor initially for it to function properly; it detects the presence of any poisonous or
Fig. 1 a–c Prototype images of the proposed model
flammable gases such as methane and CO. The RGB LED is used as an indicator for the various sewage water levels as discussed; it is controlled by PWM signals from the Arduino Uno and glows red, blue, or green when the water level is high, moderate, or low, respectively. An LCD with an I2C module displays the calculated actual distance and the air temperature. The GPS module provides the location coordinates of the sewer so that immediate action can be taken wherever overflow, blockages, or poisonous gases are detected, and the servo motor is used to ensure component safety.
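The following is a minimal, illustrative Python sketch of the distance-correction and LED-mapping logic described above (in the prototype this logic runs as an Arduino sketch). It assumes a commonly used approximation for the speed of sound as a function of temperature and relative humidity; the level thresholds for the LED colors are hypothetical values chosen only for illustration.

```python
def corrected_distance_cm(echo_time_s, temp_c, rel_humidity):
    """Distance from an ultrasonic echo, corrected for air temperature/humidity.

    Uses the common approximation c ~= 331.4 + 0.606*T + 0.0124*RH (m/s).
    """
    speed_of_sound = 331.4 + 0.606 * temp_c + 0.0124 * rel_humidity  # m/s
    return (speed_of_sound * echo_time_s / 2.0) * 100.0  # one-way distance in cm


def led_color(distance_cm, high=10.0, moderate=25.0):
    """Map distance below the lid to an indicator color (thresholds are assumed)."""
    if distance_cm <= high:        # water close to the lid -> high level
        return "red"
    if distance_cm <= moderate:    # moderate level
        return "blue"
    return "green"                 # low level


# Example: a 1.85 ms round-trip echo measured at 30.8 degC and 70% RH
d = corrected_distance_cm(1.85e-3, 30.8, 70.0)
print(round(d, 2), led_color(d))
```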
4.2 Web App We will send the data collected by the sensors in JSON string format to Node-RED for visualization of the sensor data. A USB cable is used to send the data. We have used different visualization methods for the various types of the data collected.
4.3 Data Storage The data is stored in Firebase through Node-RED by using the Firebase nodes. Incoming sensor data is pushed in parallel to the corresponding location of the Firebase real-time database by providing the HTTP link of the database in the Firebase nodes.
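For reference, the same push can be expressed outside Node-RED with the Firebase Realtime Database REST API, which accepts writes at `<database-url>/<path>.json`. This is only a hedged illustration; the database URL, path, and field names are placeholders, not the project's actual endpoints.

```python
import requests

DB_URL = "https://example-project-default-rtdb.firebaseio.com"  # placeholder URL

def push_reading(reading: dict) -> str:
    """POST one sensor reading under /sensor_readings; returns the generated key."""
    resp = requests.post(f"{DB_URL}/sensor_readings.json", json=reading, timeout=10)
    resp.raise_for_status()
    return resp.json()["name"]  # Firebase returns {"name": "<push-id>"}

push_reading({"actual_distance_cm": 31.48, "temperature_c": 30.8,
              "humidity": 70.0, "water": False, "gas": False})
```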
4.4 Email Service We integrate Node-RED with the IFTTT Web application in order to send alert notifications to the concerned authorities in the city in case of sewage water overflow or detection of poisonous gases in higher concentrations. Figure 2 describes the algorithm of the built prototype along with brief functionality of the software and hardware components used. Table 1 shows a few sets of sensor values obtained from the serial monitor of the Arduino IDE. It can be seen that the initial distance calculated using the ultrasonic sensor alone differs from the actual distance between the ultrasonic sensor and the water level. Explanation of Node-RED Flow The sensor data, in JSON string format, is sent by serial communication over a USB cable to the Node-RED application, whose flow is shown in Fig. 3. The data is received in Node-RED by a "serial in" node and parsed by a "JSON" node, which converts the JSON string into a JavaScript object; in parallel, the data is stored in Firebase using a "firebase modify" node. The extracted location coordinates (latitude and longitude) are used in the "world map" node to display the location on the Node-RED world map. The data sent by each sensor is extracted using function nodes (as the data is a JavaScript object), and the data of each sensor is then visualized using "gauge" or "chart" nodes. The sensor data is also displayed in the Node-RED debug window using debug nodes. "Switch" nodes are used at the output of the function nodes that extract the water and gas sensor data, to check the conditions for water overflow and gas detection; at the end of these switch nodes, a "change" node converts the sensor data from string format (yes/no) into a Boolean (0/1) value for visualization. Also, "http request" nodes are used at the output of the switch
Fig. 2 System flow diagram
nodes; these are responsible for sending email notifications through the Webhooks email service on the IFTTT Website in case of any abnormal conditions.
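As a rough, hedged sketch of the same pipeline implemented outside Node-RED: the script below reads JSON lines from the Arduino over serial (using pyserial), checks the overflow/gas flags, and fires an IFTTT Webhooks event. The serial port, event name, key, and field names are placeholders mirroring Table 1, not verified parts of the prototype.

```python
import json
import requests
import serial  # pyserial

PORT = "/dev/ttyUSB0"                                      # placeholder serial port
IFTTT_URL = "https://maker.ifttt.com/trigger/{event}/with/key/{key}"
EVENT, KEY = "sewer_alert", "YOUR_IFTTT_KEY"               # placeholders

def alert(message: str) -> None:
    """Trigger the IFTTT Webhooks event that sends the alert email."""
    requests.post(IFTTT_URL.format(event=EVENT, key=KEY),
                  json={"value1": message}, timeout=10)

with serial.Serial(PORT, 9600, timeout=2) as link:
    while True:
        line = link.readline().decode(errors="ignore").strip()
        if not line:
            continue
        try:
            reading = json.loads(line)                     # one JSON object per line
        except ValueError:
            continue                                       # skip partial/garbled lines
        if reading.get("water") == "yes":
            alert("Sewage water overflow detected at "
                  f"{reading.get('lat')}, {reading.get('lon')}")
        if reading.get("gas") == "yes":
            alert("Poisonous gas detected in higher concentration")
```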
5 Parameters Measured from the Sewage System As mentioned earlier, the sensor data is visualized in the Node-RED user interface using a set of dashboard nodes. In Fig. 4, the data sent by the various sensors is visualized through charts and gauges.
Table 1 Sensor data collected from the Arduino IDE's serial monitor

Calculated distance (cm) | Temperature (°C) | Relative humidity (%) | Actual distance (cm) | Water detected (yes/no) | Gas detected (yes/no) | Location coordinates (latitude, longitude)
29.72 | 30.8 | 70.0 | 31.48 | No | No | (12.9692° N, 79.1559° E)
30.14 | 30.8 | 70.0 | 31.92 | No | No | (12.9692° N, 79.1559° E)
30.12 | 30.8 | 70.0 | 31.90 | No | No | (12.9692° N, 79.1559° E)
29.29 | 30.8 | 70.0 | 31.02 | No | No | (12.9692° N, 79.1559° E)
29.33 | 30.8 | 70.0 | 31.06 | No | No | (12.9692° N, 79.1559° E)
29.28 | 30.8 | 70.0 | 31.00 | No | No | (12.9692° N, 79.1559° E)
The sensor data is simultaneously stored in the Firebase database as shown in Fig. 5, and the location of the sewer is obtained from the fetched coordinates as shown in Fig. 6. The email notifications sent from IFTTT in case of sewage water overflow or detection of poisonous gases in higher concentrations are shown in Fig. 7.
6 Conclusion As many leakages of sewage systems are reported every year, it becomes necessary to address this issue by monitoring the sewage systems. Our designed prototype has been able to address the issues related to a sewage system, including monitoring of sewage water levels (avoiding sewage water overflows) and sewer blockages and detection of poisonous gases in the sewage system, and it sends alert notifications to the concerned authorities in the city through email in the above situations. The incoming sensor data is properly visualized and stored successfully in the Firebase cloud, which may be helpful for analytics in the future. The RGB LED emits different colors for various sewage water levels, which helps in knowing the status (water level) of a particular sewage system. The location of the sewer is also pointed out accurately on the world map using the coordinates fetched by the GPS module. The automatic closing of the box lid using the servo motor ensures component safety whenever there is an overflow in the sewer. We have used the HC-SR04 ultrasonic sensor for sewer water-level monitoring, which can measure distances up to four meters; this is sufficient, as the average depth of a sewer manhole is less than three meters. If the sewer/sewer manhole depth is more than four meters,
Fig. 3 Node-RED flow visualization of the proposed system
then long-range ultrasonic/distance sensors can be used. This prototype can be used to monitor sewage water levels on a small scale, such as in a village or town, as well as on a large scale, such as a metropolitan city, provided that these places have a proper sewage system. While deploying on a large scale to collect the information from multiple prototypes, a USB hub can be used to connect the USB cables of all the prototypes to a single computer, where the information from every prototype can be visualized. Also, devices like a Bluetooth or ESP8266 Wi-Fi module can be used to send the data to Node-RED wirelessly instead of using a USB cable, and a mobile app can be used for data visualization and for sending alert notifications, which would make it easy for the municipal authorities, and even for the people of the city, to know the status of a particular sewage system.
Fig. 4 a–c Visualization of the incoming sensor data in Node-RED
Fig. 5 Google firebase for storing sensor data
Fig. 6 Location mapping using Node-RED world map
Fig. 7 Email notifications from IFTTT, a when the sewage water overflows, b when poisonous gasses are detected in higher concentrations
References
1. Senthil Ganesh R, Mahaboob M, Janarthanan AN, Lakshman C, Poonthamilan S, Kumar KK (2021) Smart system for hazardous gases detection and alert system using internet of things. In: 5th International conference on electronics, communication and aerospace technology (ICECA), pp 511–515. https://doi.org/10.1109/ICECA52323.2021
2. Kumar S, Kumar S, Tiwari PM, Viral R (2019) Smart safety monitoring system for sewage workers with two way communication. In: 6th International conference on signal processing and integrated networks (SPIN), pp 617–622. https://doi.org/10.1109/SPIN.2019.8711628
3. Kumar A, Gupta SK, Rai M (2021) Real-time communication based IoT enabled smart sewage workers safety monitoring system. In: 5th International conference on information systems and computer networks (ISCON), pp 1–4. https://doi.org/10.1109/ISCON52037.2021.9702405
4. Jeffery DNSBPM, Newaz SHS, Shams S, Guillou N, Nafi NS (2021) LoRaWAN-based smart sewerage monitoring system. In: International conference on electronics, communications and information technology (ICECIT), pp 1–6. https://doi.org/10.1109/ICECIT54077.2021.9641263
5. Hisham Che Soh Z, Azeer Al-Hami Husa M, Afzal Che Abdullah S, Affandi Shafie M (2019) Smart waste collection monitoring and alert system via IoT. In: IEEE 9th symposium on computer applications and industrial electronics (ISCAIE), pp 50–54. https://doi.org/10.1109/ISCAIE.2019.8743746
6. Mantoro T, Istiono W (2017) Saving water with water level detection in a smart home bathtub using ultrasonic sensor and fuzzy logic. In: Second international conference on informatics and computing (ICIC), pp 1–5. https://doi.org/10.1109/IAC.2017.8280602
7. Jaikumar K, Brindha T, Deepalakshmi TK, Gomathi S (2020) IOT assisted MQTT for segregation and monitoring of waste for smart cities. In: 6th International conference on advanced computing and communication systems (ICACCS), pp 887–891. https://doi.org/10.1109/ICACCS48705.2020.9074399
8. Singh NA, Borschbach M (2017) Effect of external factors on accuracy of distance measurement using ultrasonic sensors. In: International conference on signals and systems (ICSigSys), pp 266–271. https://doi.org/10.1109/ICSIGSYS.2017.7967054
9. Wang Q, Westlund V, Johansson J, Lindgren M (2021) Smart sewage water management and data forecast. In: Swedish artificial intelligence society workshop (SAIS), pp 1–4. https://doi.org/10.1109/SAIS53221.2021.9484017
10. Asthana N, Bahl R (2019) IoT device for sewage gas monitoring and alert system. In: 1st International conference on innovations in information and communication technology (ICIICT), pp 1–7. https://doi.org/10.1109/ICIICT1.2019.8741423
11. Rahman MM, Abul Kashem M, Mohiuddin M, Hossain MA, Nessa Moon N (2020) Future city of Bangladesh: IoT based autonomous smart sewerage and hazard condition sharing system. In: IEEE international women in engineering (WIE) conference on electrical and computer engineering (WIECON-ECE), pp 126–130. https://doi.org/10.1109/WIECON-ECE52138.2020.9397950
12. Umapathi N, Teja S, Roshini, Kiran S (2020) Design and implementation of prevent gas poisoning from sewage workers using Arduino. In: IEEE international symposium on sustainable energy, signal processing and cyber security (iSSSC), pp 1–4. https://doi.org/10.1109/iSSSC50941.2020.9358841
13. Granlund D, Brännström R (2012) Smart city: the smart sewerage. In: 37th Annual IEEE conference on local computer networks—workshops, pp 856–859. https://doi.org/10.1109/LCNW.2012.6424074
14. Prabavathi R, Duela JS, Chelliah BJ, Saranya SM, Sheela A (2021) An intelligent stabilized smart sewage treatment plant (STP). In: 4th International conference on computing and communications technologies (ICCCT), pp 381–385. https://doi.org/10.1109/ICCCT53315.2021.9711758
15. Latif S, Afzaal H, Zafar NA (2017) Modeling of sewerage system using internet of things for smart city. In: International conference on frontiers of information technology (FIT), pp 46–51. https://doi.org/10.1109/FIT.2017.00016
Explainable Stacking Machine Learning Ensemble for Predicting Airline Customer Satisfaction R. Pranav and H. S. Gururaja
Abstract Customer satisfaction is a significant element of any business. This paper proposes a machine learning approach to analyze and improve customers' experience with a particular airline. The dataset utilized contains information provided by a real airline; the actual name of the organization is withheld for privacy reasons. A stacking ensemble model, with logistic regression, random forest, and decision tree classifiers in Layer 1 and an XGBoost classifier as the combiner classifier in Layer 2, was used to predict whether a future client would be satisfied with the service, given the values of the other attributes. The idea behind developing ensemble models is to mitigate the risk of overfitting and to enhance the performance of ML models. Stacking ensembles are beneficial because they harness the capabilities of a set of classifiers (called base classifiers/learners) in a single classifier, resulting in a robust model. The stacking classifier gave an accuracy of 96.26% on the test dataset, outperforming the best base learner, random forest, by a 2.6% margin. ML techniques, regardless of their enormous success, suffer from the 'black-box' problem, which refers to situations in which the data analyst cannot explain why an ML method arrives at a specific decision. This issue has energized interest in explainable artificial intelligence (XAI), which refers to methods that can easily be interpreted by people. This study therefore also tackles the black-box problem with the help of the DALEX XAI package, to help airlines know which parts of the services they offer must be emphasized to produce more satisfied clients. Keywords Ensemble learning · Predictive analysis · Explainable AI · DALEX
R. Pranav (B) Department of EEE, B.M.S. College of Engineering, Bengaluru, India e-mail: [email protected] H. S. Gururaja Department of ISE, B.M.S. College of Engineering, Bengaluru, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_4
1 Introduction Assessing air transport service quality is of great importance for the airline industry. Offering excellent services that satisfy travelers and vacationers is also a core competitive advantage that supports economic development in a country [1]. Companies use customer satisfaction surveys and questionnaires to learn customers' opinions of a product, service, brand, or organization. In principle, if a customer is happy with the service, he or she will remain loyal to it [2, 3]. The COVID-19 pandemic severely impacted the air transportation system worldwide, which makes it crucial for airlines to evaluate the factors that affect customer satisfaction [4]. Effective analysis of customer satisfaction data represents a challenge and provides excellent research opportunities in several areas, such as data mining, AI, and ML. Machine learning methods are extremely powerful and can predict customer satisfaction when a service is provided [3]. However, most users have little insight into how AI systems arrive at a particular decision in the various fields where AI and machine learning are being applied. The outcomes of many of these algorithms cannot be understood or explained in terms of how and why a particular decision was taken [5, 6]. This is the 'black-box' nature of the model. The objective of this study is not only to develop an accurate machine learning ensemble model to predict airline customer satisfaction but also to tackle the black-box nature of the model with explainable AI.
2 Related Work Different approaches have been applied to measuring airline customer satisfaction. An et al. [7] focused on the effect of inflight service quality on airline customer satisfaction and loyalty. Machine learning is a very active research topic at the moment. Kumar et al. [8], Baydogan and Alatas [9], and Tan [10] predicted customer satisfaction using various ML and DL models applied to customers' tweets, an approach along the lines of customer sentiment analysis. Tan [10] further extended the research by performing a comparative analysis of ML frameworks and implementing a model-specific approach for extracting important features. A different study by Shu [11] used machine learning models to analyze flight delays and cancelations, and the results were applied to improve customer satisfaction. In recent studies, Jiang et al. [12] proposed a feature selection model based on RF-RFE-LR to extract the features influencing the satisfaction of airline clients. Hong et al. [13] presented a comparative analysis of ML and DL models on a dataset similar to the one used in this study to predict airline customer satisfaction.
Machine learning models are subject to high variance and bias, as they are mostly developed and trained by humans. Overfitting can also occur when there are many training features, which increases model complexity. Developing ensemble models helps to reduce variance and bias while improving predictions. The stacking ensemble approach used here has been applied widely to various classification tasks. Berliana and Bustamam [14] proposed a stacking ensemble with support vector machine, random forest, and K-nearest neighbors as base classifiers and a support vector machine as meta-classifier for classifying COVID-19 using two image datasets, namely lung X-rays and CT scans. Zhou et al. [15] used four different tree-based classifiers as primary weak classifiers and an XGBoost classifier as meta-classifier for purchase prediction. Liu et al. [16] proposed early diabetes prediction using gradient boosting decision tree, AdaBoost, and random forest as primary learners and logistic regression as the secondary learner. Other applications include student achievement prediction [17] and mobile traffic prediction [18]. This brings us to the conclusion that there is no single right or wrong stacking ensemble model. As machine learning algorithms become more and more complex, interpretability of the model also becomes a challenge. This has paved the way for a new research domain called explainable AI. Several explainable methods exist; LIME and SHAP are two very popular ones at present [6, 19, 20]. A relatively unexplored alternative is DALEX XAI [21], which is used in this study.
3 Proposed Method An ensemble is usually built from several homogeneous or heterogeneous base learners whose predictions are combined. Stacking joins several base learners at the first level to a meta-classifier at the second level [14]. A stacking model is intended to achieve better results than a solitary model [15]. The base learners for the proposed ensemble model are discussed below.
3.1 Logistic Regression Logistic regression is a classification model that has been around for a while in statistical machine learning, and it is also a supervised learning technique. To arrive at logistic regression, consider the basic multivariate linear regression shown below [22]:

$$\hat{y}^{(i)} = \theta_0 + \theta_1 X_1^{(i)} + \theta_2 X_2^{(i)} + \cdots + \theta_n X_n^{(i)} \quad (1)$$
The above equation can also be rewritten as follows:

$$\hat{y}^{(i)} = X^{(i)} \cdot \theta \quad (2)$$
The above can be written in matrix form as follows:

$$\hat{y} = X_b \cdot \theta \quad (3)$$
Passing the multivariate linear regression through a compression function, i.e., the sigmoid function, smooths and linearizes it so that the resulting probability value can be used to classify discrete, non-linear data. The sigmoid function is written as:

$$\sigma(t) = \frac{1}{1 + e^{-t}} \quad (4)$$
Substituting (3) into (4), we arrive at the logistic regression model:

$$\hat{p} = \sigma(X_b \cdot \theta) = \frac{1}{1 + e^{-X_b \cdot \theta}} \quad (5)$$
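A minimal NumPy sketch of Eq. (5), assuming the feature matrix has already been augmented with a column of ones so that θ₀ acts as the intercept (all variable names are illustrative):

```python
import numpy as np

def predict_proba(X_b: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Eq. (5): class-1 probabilities from the augmented design matrix X_b."""
    return 1.0 / (1.0 + np.exp(-X_b @ theta))

X = np.array([[0.2, 1.5], [1.0, -0.3]])
X_b = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend the bias column
theta = np.array([0.1, 0.8, -0.5])               # [theta_0, theta_1, theta_2]
print(predict_proba(X_b, theta))
```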
3.2 Decision Trees Decision trees, as the name suggests, have a tree-like structure in which the leaves represent outcome labels, i.e., 0 for negative and 1 for positive, and the branches represent the conjunctions of input features that lead to those outcomes [23].
3.3 Random Forests Random forests are an ensemble machine learning model that grows multiple decision trees through bootstrap aggregation, also called bagging. It has an implicit feature selection mechanism and can therefore handle many input variables without having to discard some of them for dimensionality reduction [23]. For an ensemble of classifiers $h_1(x), h_2(x), \ldots, h_K(x)$ and a training set drawn at random from the distribution of the random vector $(Y, X)$, the margin function is defined as [24]:

$$mg(X, Y) = \mathrm{av}_k\, I\big(h_k(X) = Y\big) - \max_{j \neq Y} \mathrm{av}_k\, I\big(h_k(X) = j\big) \quad (6)$$

where $I(\cdot)$ is the indicator function. The generalization error is given by

$$PE^{*} = P_{X,Y}\big(mg(X, Y) < 0\big) \quad (7)$$
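The sketch below illustrates Eq. (6) on a toy vote matrix: given the class predicted by each tree for one sample, the margin is the fraction of votes for the true class minus the largest fraction of votes for any other class (this is a hypothetical helper for illustration, not code from the cited work):

```python
import numpy as np

def margin(tree_votes: np.ndarray, true_class: int, n_classes: int) -> float:
    """Eq. (6) for a single sample; tree_votes holds one predicted class per tree."""
    frac = np.bincount(tree_votes, minlength=n_classes) / len(tree_votes)
    others = np.delete(frac, true_class)
    return frac[true_class] - others.max()

votes = np.array([1, 1, 0, 1, 1, 0, 1])            # 7 trees voting on classes {0, 1}
print(margin(votes, true_class=1, n_classes=2))    # 5/7 - 2/7 ~= 0.43
```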
Fig. 1 Proposed stacking model
3.4 XGBoost XGBoost is an ensemble, tree-based, supervised learning algorithm. It is applied to numerous classification tasks in different fields [25], provides excellent research opportunities, and has won many Kaggle competitions in recent years. It minimizes a cost objective composed of a loss function $l$ and a regularization term $\Omega$:

$$C(\theta) = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i\big) + \sum_{k=1}^{K} \Omega(f_k) \quad (8)$$

where $y_i$ is the target value and $\hat{y}_i$ the prediction, $n$ is the number of cases in the training set, $K$ is the number of trees to be produced, and $f_k$ is a particular tree from the ensemble. The regularization term is characterized as follows:

$$\Omega(f_t) = \gamma T + \alpha \sum_{j=1}^{T} w_j + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} \quad (9)$$

where $\gamma$ is the loss reduction (complexity) factor, $\alpha$ and $\lambda$ are regularization coefficients, $T$ is the number of leaves, and $w_j$ is the weight of leaf $j$. Finally, the proposed stacking ensemble model is designed as shown in Fig. 1.
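A compact sketch of the stacking design in Fig. 1 using scikit-learn and the xgboost package. The layer structure (logistic regression, random forest, and decision tree as base learners; XGBoost as the Layer-2 combiner) follows the paper, but all hyperparameters shown are assumptions, not the authors' settings.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ("dt", DecisionTreeClassifier(random_state=42)),
]

# Layer 2: XGBoost combines the base learners' out-of-fold predictions
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=XGBClassifier(eval_metric="logloss", random_state=42),
    cv=5,
)

# Usage: stack.fit(X_train, y_train); accuracy = stack.score(X_test, y_test)
```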
3.5 DALEX XAI Nowadays, there are increasing concerns regarding the explainability and fairness of ML predictive models in research and business. To support the responsible development of ML models, this paper uses DALEX, a Python package
whose interface implements a model-agnostic approach to interactive explanation and fairness [26]. The proposed combination of stacking ensemble model and explainer aims to join the best-performing metrics with a reasonable rationale behind the predictions. Experimentally, it was found that the ensemble model outperformed the individual 'weak' models.
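A hedged sketch of how such an explainer can be attached to the fitted stacking model with the dalex package; the label, the use of the test split as explanation data, and the exact feature-column names are assumptions.

```python
import dalex as dx

# Wrap the fitted model together with the data it should be explained on
explainer = dx.Explainer(stack, X_test, y_test, label="stacking ensemble")

# Permutation-based variable importance (as in Sect. 4.3, Figs. 3-6)
vi = explainer.model_parts()
vi.plot()

# Partial dependence profiles for selected features (as in Figs. 7-10);
# the variable names below are placeholders for the dataset's column names
pdp = explainer.model_profile(variables=["Inflight wifi service", "Type of Travel"])
pdp.plot()
```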
4 Result Analysis 4.1 Data Preprocessing and Analysis This study was conducted using a Lenovo ThinkPad T470 laptop powered by an Intel Core i5 vPro processor. Google Colaboratory with the default CPU runtime was used to write the program and generate the results. Python, which is used extensively because of its ML ecosystem, was used to write the program. The dataset used for this study was taken from Kaggle. It consists of 129,880 samples with 23 features. The dataset was preprocessed by checking for missing values, which were filled with the median of the respective column. The dataset was divided into categorical and numerical variables, and the 'get_dummies' method was used to convert the categorical variables into numerical ones. The dataset was split for training and testing in the proportion 7:3 (70% for training and 30% for testing). Figure 2 shows a fairly balanced distribution of the target values in the dataset, with 56.6% of the customers dissatisfied or indifferent and 43.4% satisfied. Fig. 2 Distribution of target values
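A minimal pandas/scikit-learn sketch of the preprocessing steps described above; the file name, target column name, and label value are placeholders for the Kaggle dataset, not verified identifiers.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("airline_passenger_satisfaction.csv")   # placeholder file name

# Fill missing numeric values with the column median
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Binary target (placeholder column/label names); one-hot encode the rest
y = (df["satisfaction"] == "satisfied").astype(int)
X = pd.get_dummies(df.drop(columns=["satisfaction"]), drop_first=True)

# 70/30 train-test split as described in the paper
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
```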
Table 1 Performance metrics of the base learners and the stacking ensemble on the test set

Models              | Precision | Recall | F1-score | Accuracy
Logistic regression | 0.8689    | 0.8376 | 0.8529   | 0.8741
Decision trees      | 0.8934    | 0.9211 | 0.9070   | 0.9177
Random forest       | 0.9332    | 0.9196 | 0.9263   | 0.9362
Stacking ensemble   | 0.9764    | 0.9369 | 0.9562   | 0.9626
4.2 Model Evaluation Table 1 shows that the stacking ensemble outperformed the best individual model (random forest) by around 2.6% in accuracy. This difference can be attributed to XGBoost being the meta-classifier. Though random forest is the best base learner, an interesting observation is that decision trees gave a better recall than random forest, which suggests they were more sensitive to the relevant data and captured a larger number of true positives. Logistic regression is the worst performer on every metric, suggesting that it is not an ideal model for predicting customer satisfaction. Study [13] proposed RF as the best performing model on a similar dataset; however, the stacking model proposed in this paper has shown robustness by outperforming the RF model and is therefore the better choice for predicting customer satisfaction.
4.3 Model Explainability Model explainability plays a crucial role in overcoming the black-box nature of machine learning models. In this paper, a model-agnostic explainable method, DALEX, is used, and the following outputs are discussed. Variable Importance. This provides an insight into how much each variable/feature contributed to the prediction of the model; the most and least important variables can be identified from it. In this paper, 20 variables are considered. Features that do not appear are of no use to the particular model. The longer the bar, the more important the variable; the variables with the shortest bars are of least importance and are better off being removed. Stacking Ensemble. It can be seen that the 'inflight Wi-Fi service' feature contributes most to the stacking model. This comes as no surprise because we live in the age of the Internet, and air travel means being disconnected from the outside world courtesy of 'airplane mode'. Inflight Wi-Fi helps engage customers and makes the journey less boring, especially for long-distance travel.
Fig. 3 Variable importance for stacking ensemble
The inflight Wi-Fi feature is followed by type of travel and customer type which contribute majorly toward the model prediction. Business travelers tend to be more satisfied than leisure travelers because they mostly choose business class, which is once again a variable that is contributing to the customer satisfaction. Business class offers more amenities than economy adding to the satisfaction of customers. Features like age, gender, flight distance, gate location, departure/arrival time convenient, and arrival/departure delay contribute the least to the stacking model and can be removed. Figure 3 shows variable importance for stacking classifier. Decision Trees. The inflight Wi-Fi service contributed majorly for the customer satisfaction according to decision trees. This is followed by a type of travel. As mentioned previously, one would believe that business travelers tend to be more satisfied than leisure travelers. Interestingly, online boarding and business class contribute more to decision trees model which is in contrast to the stacking ensemble which performed better with type of customer feature than the above mentioned. Besides Wi-Fi, the inflight entertainment system is also an important feature for this model. Age, gender, economy class, food and drinks served, and flight distance are of no importance to the model. Figure 4 shows variable importance for decision tree classifier. Random Forests. As the best performing base learner, it is of no surprise that the inflight Wi-Fi service was the most important variable for the random forest algorithm. This is followed by the type of travel feature. The model gives a decent amount of importance to the check in feature as well. Inflight entertainment, ease of online booking, and business class are also fairly important to the model. Seat comfort is given more importance in this model similar to the stacking ensemble. But this was not as important to the decision trees.
Fig. 4 Variable importance for decision trees
It can be observed that legroom service is fairly important to random forests. This was not seen in the above two models. Once again, age, gender, gate location, flight distance, arrival delay, and departure/arrival time convenience contribute least to nothing to the model prediction and can, hence, be removed. Figure 5 displays the variable importance for random forest classifier. Logistic Regression. This was the worst performing model among the base learners. The type of a customer’s travel has contributed the most for the model prediction. It
Fig. 5 Variable importance for random forest
Fig. 6 Variable importance for logistic regression
can be observed in Fig. 6 that, according to logistic regression, the feature gender is considered important. Clearly, this was not the case in the previously discussed models. Inflight Wi-Fi service is not the most important variable according to this model; in fact, it gives more importance to economy class than to inflight Wi-Fi service, which is counterintuitive. It is clear that this model has missed the important features/variables of the dataset. On-board service and check-in service also contribute to the model prediction, whereas baggage handling, food and drinks, and inflight service make no impact on the model and hence are of least importance to it. Partial Dependence Plots (PDP). Partial dependence plots show the marginal impact of the variables on the predictions of the model. The following sets of parameters are considered. With respect to inflight Wi-Fi service, the three models except logistic regression agree with each other in trend. Customers who gave good ratings for the Wi-Fi service seem to be more satisfied with the airline, indicating that the Wi-Fi feature is an important variable, perhaps the most important. It can be seen that the decision tree uses more variables than random forest and the stacking ensemble. Logistic regression, however, shows a linearly increasing prediction but deviates from the trend of the other three models, indicating that the Wi-Fi feature is not important to it. Inflight service, leg room service, and seat comfort have little to no effect on the decision tree model, while logistic regression shows a linearly increasing trend. The stacking ensemble and random forest agree on a similar trend, with the former using more variables than the latter. The low predictions indicate that these features are not very important to the models, as even good ratings such as 4 or
Fig. 7 (Clockwise from top) PDP for inflight Wi-Fi service, inflight service, seat comfort, leg room service
5 stars for these features did not completely determine whether the passenger was satisfied with the airline. This is shown in Fig. 7. Random forests, decision trees, and the stacking ensemble behave similarly when it comes to ease of online booking and online boarding. The predictions of the three models show an increasing trend, especially for online boarding. This is because online boarding is hassle-free: a customer does not have to worry about losing a physical boarding pass, as the QR code is simply on their phone. Customers who gave good ratings for the online boarding facilities seem to be more satisfied, indicating that the models depend fairly strongly on this feature to predict satisfaction. An interesting interpretation is that of logistic regression, which shows a linearly decreasing trend for the ease of online booking feature, suggesting that customers who rated this service well did not seem to be satisfied with the overall services of the airline. On-board service and baggage handling do not affect the decision tree, while random forest and the stacking ensemble agree on the trend with each other. Logistic regression also agrees on a similar trend, but it is linear. This is shown in Fig. 8.
Fig. 8 (Clockwise from top) PDP for ease of online booking, online boarding, baggage handling, on-board service
It is important to note that type of travel and customer type are rather important features for the models. As discussed earlier, business travelers (class 1) seem to be satisfied more than leisure travelers (class 0). It can be seen that the stacking model uses more variables than random forests or decision trees and hence leads to higher prediction. Knowing the customer type is helpful for airlines to predict satisfaction. The stacking model, random forests, and decision trees agree on a similar trend with the stacking model providing better prediction followed by random forest and decision trees. Loyal customers (class 1) are more satisfied than disloyal customers (class 0). Logistic regression too agrees on the trend, but it is linearly increasing. Travelers who gave a rating of 4 or more stars to cleanliness and 3.5 or more for inflight entertainment seem to be satisfied with the airline according to stacked model, RF, and DTs. Logistic regression once again shows a linearly increasing trend. This is shown in Fig. 9. Business class, economy, and economy plus classes give a rather different interpretation of the models. Business class customers seem to be increasingly satisfied with the airline according to decision tree, random forest, and the stacking ensemble,
Fig. 9 (Clockwise from top) PDP for type of travel, customer type, inflight entertainment, cleanliness
whereas logistic regression shows a downward trend which makes no logical sense. This can be attributed to logistic regression not comprehending the essential features of the dataset well. Economy and economy plus seem to make no impact on decision trees. Random forest and the stacking ensemble seem to complement each other by showing lower satisfaction progressively. This can be because a ‘better’ economy or economy plus class seem to only increase the fare with no definable upgrade in amenities. This is seen in Fig. 10. This paper also brings explainable AI to the table. Study [10] has proposed a ‘model-specific’ approach for implementing feature importance. However, DALEX XAI, proposed in this paper, implements a ‘model-agnostic’ approach, for interpreting the model predictions of the base learners and the stacking ensemble, thereby providing flexibility while comparing different models with the same metrics.
Fig. 10 (Clockwise from top) PDP for business class, economy class, and economy plus class
5 Conclusion and Future Work In this paper, a stacking ensemble model was developed using a combination of three machine learning models as base learners to determine the satisfaction of airline customers. Departing from convention, an XGBoost classifier was used as the meta-learner, which may be the reason the stacked ensemble outperformed the individual learners by a large margin. The stacked model outperformed the best base learner (random forest) by around 2.6%, decision trees by 4.5%, and logistic regression by a whopping 8.85% in terms of accuracy. It was also established that the stacking classifier performed better than existing ML models such as SVM and RF proposed in a recent study. The black-box nature of the model was addressed with the help of the DALEX explainable AI package. This study can be taken further by developing a better-tuned model. A study can be conducted on how using ensemble models as meta-classifiers affects the overall stacking model compared with using classical models. Further, a comparative analysis of different explainable methods could be carried out, which could help the
airline industry further understand and work on the factors that customers find most appealing. The field of explainable AI, though increasingly popular in the past few years, is still in its infancy and provides excellent research opportunities in various fields.
References 1. Bellizzi MG, Eboli L, Mazzulla G (2020) Air transport service quality factors: a systematic literature review. Transp Res Procedia 45:218–225 2. Tegar K, Lestari A, Pratiwi SW (2017) An analysis of airlines customer satisfaction by improving customer service performance. In: Global research on sustainable transport (GROST 2017). Atlantis Press, pp 619–628 3. García V (2019) Predicting airline customer satisfaction using k-nn ensemble regression models. Instituto de Ingeniería y Tecnología 4. Monmousseau P, Marzuoli A, Feron E, Delahaye D (2020) Impact of Covid-19 on passengers and airlines from passenger measurements: managing customer satisfaction while putting the US Air transportation system to sleep. Transp Res Interdisc Perspect 7:100179 5. Daudt F, Cinalli D, Garcia ACB (2021) Research on explainable artificial intelligence techniques: an user perspective. In: 2021 IEEE 24th international conference on computer supported cooperative work in design (CSCWD). IEEE, pp 144–149 6. Carta SM, Consoli S, Piras L, Podda AS, Recupero DR (2021) Explainable machine learning exploiting news and domain-specific lexicon for stock market forecasting. IEEE Access 9:30193–30205 7. An M, Noh Y (2009) Airline customer satisfaction and loyalty: impact of in-flight service quality. Serv Bus 3(3):293–307 8. Kumar S, Zymbler M (2019) A machine learning approach to analyze customer satisfaction from airline tweets. J Big Data 6(1):1–16 9. Baydogan C, Alatas B (2019) Detection of customer satisfaction on unbalanced and multi-class data using machine learning algorithms. In: 2019 1st International Informatics and Software Engineering Conference (UBMYK). IEEE, pp 1–5 10. Tan C (2021) Bidirectional LSTM model in predicting satisfaction level of passengers on airline service. In: 2021 2nd International conference on artificial intelligence and computer engineering (ICAICE). IEEE, pp 525–531 11. Shu Z (2021) Analysis of flight delay and cancellation prediction based on machine learning models. In: 2021 3rd International conference on machine learning, big data and business intelligence (MLBDBI). IEEE, pp 260–267 12. Jiang X, Zhang Y, Li Y, Zhang B (2022) Forecast and analysis of aircraft passenger satisfaction based on RF-RFE-LR model. Sci Rep 12(1):1–15 13. Hong SH, Kim B, Jung YG (2020) Correlation analysis of airline customer satisfaction using random forest with deep neural network and support vector machine model. Int J Internet Broadcast Commun 12(4):26–32 14. Berliana AU, Bustamam A (2020) Implementation of stacking ensemble learning for classification of COVID-19 using image dataset CT scan and lung X-Ray. In: 2020 3rd International conference on information and communications technology (ICOIACT). IEEE, pp 148–152 15. Zhou A, Ren K, Li X, Zhang W (2019) MMSE: a multi-model stacking ensemble learning algorithm for purchase prediction. In: 2019 IEEE 8th joint international information technology and artificial intelligence conference (ITAIC). IEEE, pp 96–102 16. Liu J, Fan L, Jia Q, Wen L, Shi C (2021) Early diabetes prediction based on stacking ensemble learning model. In: 2021 33rd Chinese control and decision conference (CCDC). IEEE, pp 2687–2692
17. Fang T, Huang S, Zhou Y, Zhang H (2021) Multi-model stacking ensemble learning for student achievement prediction. In: 2021 12th International symposium on parallel architectures, algorithms and programming (PAAP). IEEE, pp 136–140 18. Li Z, Cai D, Wang J, Fu J, Qin L, Fu D (2020) A stacking ensemble learning model for mobile traffic prediction. In: 2020 IEEE/CIC international conference on communications in China (ICCC). IEEE, pp 542–547 19. Sahay S, Omare N, Shukla KK (2021) An approach to identify captioning keywords in an image using LIME. In: 2021 International conference on computing, communication, and intelligent systems (ICCCIS). IEEE, pp 648–651 20. Zou L, Goh HL, Liew CJY, Quah JL, Gu GT, Chew JJ, Ta A et al. (2022) Ensemble image explainable AI (XAI) algorithm for severe community-acquired pneumonia and COVID-19 respiratory infections. IEEE Trans Artif Intell 21. Baniecki H, Kretowicz W, Piatyszek P, Wisniewski J, Biecek P (2021) Dalex: responsible machine learning with interactive explainability and fairness in python. J Mach Learn Res 22(1):9759–9765 22. Yang Z, Li D (2019) Application of logistic regression with filter in data classification. In: 2019 Chinese control conference (CCC). IEEE, pp 3755–3759 23. Shaikhina T, Lowe D, Daga S, Briggs D, Higgins R, Khovanova N (2019) Decision tree and random forest models for outcome prediction in antibody incompatible kidney transplantation. Biomed Signal Process Control 52:456–462 24. Breiman L (2001) Random forests. Mach Learn 45(1):5–32 25. Cherif IL, Kortebi A (2019) On using extreme gradient boosting (XGBoost) machine learning algorithm for home network traffic classification. In: 2019 Wireless days (WD). IEEE, pp 1–6 26. Srinath T, Gururaja HS (2022) Explainable machine learning in identifying credit card defaulters. Global Transitions Proc
A Water Cycle Algorithm for Optimal Design of IIR Filters Teena Mittal
Abstract In this work, a water cycle algorithm (WCA) is applied to the optimal design of infinite impulse response (IIR) low pass and high pass filters. The WCA is based on the evaporation and raining processes, which help to sustain an equilibrium between the diversification and intensification abilities of the algorithm. In order to ensure the stability of the IIR filter, a lattice equivalent approach has been used. For experimentation, 8th order low pass and high pass IIR filters have been undertaken. For the low pass IIR filter, the objective function value obtained by WCA is 0.9966, and the passband error, stopband error, and squared error values are very small, of the order of 1E−05. Further, it is found that WCA is able to achieve the maximum stopband attenuation, i.e., 28.4681 dB. The obtained results are compared with those of other state-of-the-art techniques, and it is concluded that WCA is capable of finding the best solution. Keywords Water cycle algorithm · Low pass IIR filter · High pass IIR filter
T. Mittal (B) Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_5
1 Introduction An infinite impulse response (IIR) filter is widely used for different applications in the field of digital signal processing due to its superior performance, minimal computational and memory requirements, and fewer coefficients compared to an FIR filter. However, in comparison with FIR filters, designing a digital IIR filter is more complex. Researchers have applied two approaches to design digital IIR filters: the first is the transformation approach, and the second is the optimization approach. In the first method, a digital filter is converted into an equivalent analog filter and then designed by various approaches, e.g., Chebyshev type-I, type-II, and Butterworth. However, the performance of these filters is not satisfactory [1].
1.1 Literature Review In this work, the literature review focuses on application of various global optimization techniques for optimal design of digital IIR filters. Zou et al. [2] have applied improved PSO technique to resolve IIR filter identification problem. For the digital IIR filter system, Mohammadi and Zahiri [3, 4] used inclined planes system optimization. The optimal parameters of the IIR filter system are searched by Lèvy flight technique [5]. Dhaliwal and Dhillon [1] have proposed an integrated optimization technique to search optimal coefficients of IIR filters. The Firefly algorithm has been applied for IIR filter design problems [6]. Oppositional artificial bee colony algorithm has been applied for optimization of IIR filters [7]. The cat swarm optimization technique is explored for identification of IIR system [8]. To find the best IIR system design coefficients, an integrated moth-flame and variable-neighborhood search technique is investigated [9]. Liang et al. [10] have applied a slime mould algorithm for designing digital IIR filters. Gotmare et al. [11] have presented a review paper on evolutionary algorithms for optimal filter design. Recently, another survey paper was published on the design of digital IIR filters by Agrawal et al. [12]. Ghibeche et al. [13] have designed an optimal IIR filter by applying a crow search algorithm. Two variants of the ant colony optimization algorithm are applied to deal with the optimal design of IIR filters [14]. Pelusi et al. [15] have proposed a fuzzy gravitational search algorithm for the optimal design of 8th order IIR filters. In order to search optimal parameters of IIR filter, Singh et al. [16] have applied the dragonfly algorithm, and results are compared with cat swarm optimization, PSO, and bat algorithm. An improved global best guided cuckoo search algorithm has been proposed for multiplierless design of IIR filters by Dhabal and Venkateswaran [17]. An improved firefly algorithm has been proposed to search the best coefficients of signal blocking IIR filters by Dash et al. [18]. A hybrid algorithm based on integration of gravitational search algorithm with biogeography-based optimization has been utilized to design IIR filters [19]. Bisen et al. [20] have applied a multiverse optimization technique to design a 10th order IIR high pass filter. Ali et al.
[21] have proposed an approach based on weighted L1 -norm in combination with salp swarm optimization for the optimal design of IIR differentiators. Karthik et al. [22] have applied invasive weed optimization techniques for high pass FIR filters. Biogeography-based optimization techniques have been applied to design low pass IIR filters [23]. In the last three decades, researchers have been exploring various random search techniques to search for the optimal solution. In one of such attempts, Eskandar et al. [4] have presented an algorithm, namely “WCA”, which is based on the water flow process of rivers and streams moving toward sea. Sadollah et al. [24] have applied a mine blast algorithm and WCA for weight minimization of truss structures. Heidari et al. [25] have applied a chaotic WCA for optimization tasks, in which chaotic patterns have been incorporated into the stochastic process of WCA for better exploitation. The modified WCA algorithm is presented by Sadollah et al. [26], in which new concepts of evaporation rate for different rivers and streams are proposed for better search ability. Ravichandran et al. [27] have applied water droplets algorithm with mutation-based local search technique to search optimal values of numerical function. Barakat et al. [28] have applied WCA for load frequency control of multi-area power systems. Ma et al. [29] have established a calibration model of the viscous boundary’s adjustment coefficient based on WCA to improve the accuracy of dynamic response analysis. WCA is applied for environmentally sound short-term hydrothermal generation scheduling [30]. This research study intends to investigate the solution methodology for optimal design of low pass and high pass IIR filters using a water cycle algorithm. Recently, researchers have applied WCA to solve various complex optimization problems. However, as per the best of the author’s knowledge, it is a unique attempt in which the designing of IIR filters is carried out by applying WCA. In order to ensure the stability of the IIR filter, a lattice equivalent approach has been used. The obtained results are compared with other published results in terms of different attributes and found satisfactory; further, pole–zero is plotted to ensure the stability of an optimally designed filter. The remainder of the manuscript is divided into five sections. The problem formulation of IIR filter is presented in Sect. 2. The detailed discussion about the water cycle algorithm is given in Sect. 3. The result and discussion are elaborated in Sect. 4. Finally, conclusions of research work are presented in Sect. 5.
2 Problem Formulation of IIR Filter A frequency domain presentation of the IIR filter's transfer function has been presented as [31]:

$$H(z) = \frac{Y(z)}{X(z)} = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_m z^{-m}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_n z^{-n}} \quad (1)$$
where $X(z)$ and $Y(z)$ correspond to the IIR filter's input and output responses, respectively, in the z domain; $a_1, a_2, \ldots, a_n$ and $b_0, b_1, \ldots, b_m$ represent the filter coefficients. The primary goal of this effort is to design an optimal IIR filter by minimizing the magnitude response error, given as

$$J(\omega) = \sum_{\omega \in \omega_p} \Big[\big|\,|H(\omega)| - D(\omega)\,\big| - \delta_p\Big] + \sum_{\omega \in \omega_s} \Big[\big|\,|H(\omega)| - D(\omega)\,\big| - \delta_s\Big] \quad (2)$$
The desired magnitude response is given by Eq. (3):

$$D(\omega) = \begin{cases} 1, & \omega \in \text{passband} \\ 0, & \omega \in \text{stopband} \end{cases} \quad (3)$$
where $|H(\omega)|$ indicates the actual filter magnitude response, and $\delta_p$, $\delta_s$ denote the permissible passband and stopband ripples, respectively. For the stability of the filter, the denominator polynomial of the transfer function is changed into lattice form using the lattice equivalent approach, and the set of decision variables is given as [32]:

$$A = \big[\,b_0, b_1, b_2, \ldots, b_m, g_{m+1}, g_{m+2}, g_{m+3}, \ldots, g_{m+n}\,\big] \quad (4)$$
where $g_{m+1}, g_{m+2}, g_{m+3}, \ldots, g_{m+n}$ are the lattice equivalent coefficients. The filter coefficients are searched, and the searched coefficients are then retransformed into the original form by using Eqs. (5) and (6):

$$U_i(z) = U_{i-1}(z) + g_i z^{-1} V_{i-1}(z), \quad i = 1, 2, \ldots, n-1, \qquad \text{with } U_0(z) = V_0(z) = 1 \quad (5)$$

$$V_i(z) = z^{-m} \times U_i\big(z^{-1}\big) \quad (6)$$
The stopband attenuation, passband error, and stopband error are used to measure the performance of the optimally designed filters. These attributes are given by Eqs. (7)–(10):

$$A_s = -20 \log_{10}\big(|H(\omega)|\big) \quad (7)$$

$$e_p = \sum_{\omega \in \omega_p} \big[D(\omega) - H(\omega)\big]^2 \quad (8)$$

$$e_s = \sum_{\omega \in \omega_s} \big[D(\omega) - H(\omega)\big]^2 \quad (9)$$

$$\mathrm{SE} = e_p + e_s \quad (10)$$
where $A_s$, $e_p$, $e_s$, and $\mathrm{SE}$ indicate the stopband attenuation, the passband error, the stopband error, and the squared error, respectively.
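To make Eqs. (7)–(10) concrete, the following is a small illustrative Python sketch that evaluates a candidate coefficient set on a frequency grid using SciPy; the band edges and grid size are arbitrary assumptions, not the paper's design specification.

```python
import numpy as np
from scipy.signal import freqz

def filter_attributes(b, a, wp=0.25 * np.pi, ws=0.30 * np.pi, n_grid=512):
    """Passband/stopband errors, squared error, and stopband attenuation."""
    w, h = freqz(b, a, worN=n_grid)                    # H(e^{jw}) on [0, pi)
    mag = np.abs(h)
    passband = w <= wp                                  # D(w) = 1 here
    stopband = w >= ws                                  # D(w) = 0 here
    e_p = np.sum((1.0 - mag[passband]) ** 2)            # Eq. (8)
    e_s = np.sum((0.0 - mag[stopband]) ** 2)            # Eq. (9)
    se = e_p + e_s                                      # Eq. (10)
    A_s = -20.0 * np.log10(np.max(mag[stopband]))       # Eq. (7), worst stopband gain
    return e_p, e_s, se, A_s
```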
3 Water Cycle Algorithm The water cycle algorithm draws its inspiration from the way rivers and streams move through the water cycle, downhill and toward the sea. In WCA, the streams are taken as candidate solutions and, based on the objective function evaluation, a few of the better streams are chosen as rivers; the best performing river is designated as the sea. The total population can be represented as follows:

$$P = \big[\,\text{Sea}, \text{River}_1, \text{River}_2, \ldots, \text{River}_{nr}, \text{Stream}_{nr+1}, \text{Stream}_{nr+2}, \ldots, \text{Stream}_{nr+ns}\,\big]$$

For an $n$-dimensional decision variable, a candidate solution is represented as $S = [x_1, x_2, \ldots, x_n]$. The population matrix consists of $nr$ rivers, $ns$ streams, and one sea; hence, the size of the population matrix is $[(nr + ns + 1) \times n]$. The number of streams $n_k$ allocated to each river and to the sea is decided by the intensity of the flow and is given as follows:

$$n_k = \mathrm{round}\!\left(\frac{\mathrm{Fit}_k}{\sum_{i=1}^{nr+1} \mathrm{Fit}_i} \times ns\right), \quad k = 1, 2, \ldots, nr, nr+1 \quad (11)$$
where $\mathrm{Fit}_k$ is the fitness of the $k$th river or sea. The positions of the streams and rivers are updated by following the river and the sea, respectively, as follows:

$$X_{\mathrm{stream}}^{it+1} = X_{\mathrm{stream}}^{it} + \mathrm{rand}() \times C \times \big(X_{\mathrm{river}}^{it} - X_{\mathrm{stream}}^{it}\big) \quad (12)$$

$$X_{\mathrm{river}}^{it+1} = X_{\mathrm{river}}^{it} + \mathrm{rand}() \times C \times \big(X_{\mathrm{sea}}^{it} - X_{\mathrm{river}}^{it}\big) \quad (13)$$

where $X_{\mathrm{stream}}^{it}$, $X_{\mathrm{river}}^{it}$, and $X_{\mathrm{sea}}^{it}$ represent the positions of the stream, river, and sea, respectively, at the $it$th iteration, and $C$ is an algorithm constant whose value lies in $[0, 2]$. A stream and its river interchange positions if the updated stream has a better fitness value, and a similar interchange is also possible between a river and the sea.
Evaporation Process: In the water cycle, evaporation is also an important phenomenon. Water evaporates from water sources, the evaporated water forms clouds, and the clouds release the water back to the earth. Through this process, new streams are created, which flow toward rivers and further on to the sea. The evaporation and raining process is decided by the distance between a river and the sea, and it takes place only if the river is close to the sea. In WCA, this process is controlled by the algorithm parameter $d$ and is given as [33]:

$$\big\|X_{\mathrm{sea}} - X_{\mathrm{river}}^{i}\big\| < d, \quad i = 1, 2, \ldots, nr$$
In order to intensify the search process, the value of d decreases during the search and is given as follows:

d = d^{max} − (d^{max} − d^{min}) × it / it_{max}    (14)
where d^{min} and d^{max} represent the minimum and maximum values of the algorithm parameter d, and it_{max} represents the maximum number of iterations set for the algorithm.
Raining Process: After the evaporation process is finished, it begins to rain. In this process, new streams are formed and randomly spread as follows:
X_stream^{un} = X_stream^{l} + (X_stream^{u} − X_stream^{l}) × rand()    (15)
where X_stream^{l} and X_stream^{u} indicate the lower and upper limits of X_stream, respectively, and X_stream^{un} indicates the new stream generated using a uniformly distributed random number.
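A compact Python sketch of the search loop described in this section is given below, assuming minimization. It follows Eqs. (12)-(15) but simplifies two details for brevity: streams are attached to rivers in a round-robin manner instead of by the fitness-proportional rule of Eq. (11), and role exchanges are performed only against the current best solution. All parameter values and names are illustrative assumptions.

```python
import numpy as np

def wca(obj, lb, ub, n_pop=50, n_rivers=4, c=2.0, d_max=1e-2, d_min=1e-6, max_it=200):
    """Minimal water cycle algorithm sketch (minimization)."""
    dim = len(lb)
    pop = lb + np.random.rand(n_pop, dim) * (ub - lb)
    cost = np.apply_along_axis(obj, 1, pop)
    for it in range(max_it):
        order = np.argsort(cost)                       # best individual acts as the sea
        pop, cost = pop[order], cost[order]
        d = d_max - (d_max - d_min) * it / max_it      # Eq. (14): shrink the evaporation distance
        for i in range(1, n_pop):
            # rivers (indices 1..n_rivers) follow the sea, streams follow a river (Eqs. 12-13)
            guide = pop[0] if i <= n_rivers else pop[1 + (i % n_rivers)]
            cand = np.clip(pop[i] + np.random.rand(dim) * c * (guide - pop[i]), lb, ub)
            f = obj(cand)
            if f < cost[i]:
                pop[i], cost[i] = cand, f
            if cost[i] < cost[0]:                      # stream/river exchanges role with the sea
                pop[[0, i]], cost[[0, i]] = pop[[i, 0]], cost[[i, 0]]
        for i in range(1, n_rivers + 1):               # evaporation + raining (Eq. 15)
            if np.linalg.norm(pop[0] - pop[i]) < d:
                pop[i] = lb + np.random.rand(dim) * (ub - lb)
                cost[i] = obj(pop[i])
    return pop[0], cost[0]

# Example: minimize the sphere function in 5 dimensions.
best_x, best_f = wca(lambda x: float(np.sum(x ** 2)), lb=np.full(5, -10.0), ub=np.full(5, 10.0))
```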
4 Results and Discussion

The 8th-order low pass and high pass IIR filter design specifications have been taken from Ref. [32]. The various attributes (A_s, e_p, e_s, SE) are investigated to analyze the performance of WCA. The minimum value of the objective function achieved by the various optimization techniques is given in Table 1. The objective function value obtained by WCA is 0.9966, which is lower than the results obtained by the other optimization techniques, i.e., PSO [31], constant weight inertia PSO (CWI-PSO) [32], constriction factor inertia PSO (CFI-PSO) [32], linearly decaying inertia PSO (LDI-PSO) [32], modified dynamic inertia PSO (MDI-PSO) [32], time varying coefficients PSO (TVC-PSO) [32], artificial bee colony (ABC) [32], and PSO with constriction factor and inertia weight approach (PSO-CFIWA) [34]. It is observed from Table 1 that the passband error, stopband error, and squared error values are very small, of the order of 10^−5, which is
the minimum among the compared techniques. It is also observed from the results that WCA is able to achieve the maximum stopband attenuation, i.e., 28.4681 dB. Further, the average CPU time is also compared, and it is found that WCA requires less CPU time than the other optimization techniques. Hence, it is concluded that WCA is able to search for the best optimal solution, which satisfies the requirements of the desirable attributes. The optimal coefficients of the low pass filter (LPF) obtained by the WCA algorithm are given in Table 2. The pole–zero plot is shown in Fig. 1. It can be observed from the plot that the stability of the LPF is ensured by the fact that all the poles lie within the unit circle.
Table 1 Comparison of attribute values for LPF

| Technique | Min objective function | Passband error | Stopband error | Squared error | Max stopband attenuation (dB) | Average CPU time (sec) |
| PSO [31] | 2.131 | 26.8 × 10^−2 | 6.33 × 10^−2 | 33.13 × 10^−2 | 25.15 | – |
| PSO-CFIWA [34] | 2.7808 | 47.35 × 10^−2 | 6.28 × 10^−2 | 53.63 × 10^−2 | 25.25 | – |
| CWI-PSO [32] | 3.9165 | 13.22 × 10^−2 | 106.82 × 10^−2 | 120.04 × 10^−2 | 14.27 | 97.01 |
| CFI-PSO [32] | 1.0218 | 29.88 × 10^−2 | 46.96 × 10^−2 | 76.84 × 10^−2 | 15.15 | 94.19 |
| LDI-PSO [32] | 1.1365 | 24.18 × 10^−2 | 21.28 × 10^−2 | 45.46 × 10^−2 | 24.74 | 74.56 |
| MDI-PSO [32] | 1.3927 | 21.22 × 10^−2 | 37.24 × 10^−2 | 58.46 × 10^−2 | 18.11 | 72.77 |
| TVC-PSO [32] | 1.7584 | 23.95 × 10^−2 | 50.43 × 10^−2 | 74.38 × 10^−2 | 15.52 | 70.71 |
| ABC [32] | 1.3804 | 13.58 × 10^−2 | 27.22 × 10^−2 | 40.8 × 10^−2 | 14.82 | – |
| Water cycle algorithm | 0.9966 | 9.24 × 10^−5 | 1.91 × 10^−4 | 2.83 × 10^−4 | 28.46 | 35.90 |
Table 2 Optimal LPF coefficients

| b_i | −0.0471 | 0.0370 | 0.2614 | 0.2697 | −0.0405 | −0.3000 | −0.3000 | −0.1882 | −0.0734 |
| a_i | 1.0000 | −0.3000 | 0.0044 | 0.0166 | −0.2568 | −0.1636 | 0.1105 | −0.0270 | −0.0036 |
Fig. 1 Pole–zero location for LPF
For the high pass filter, a comparison of the different optimization techniques on the basis of the various attributes is presented in Table 3. The minimum value of the objective function obtained by WCA is 0.4218, which is better than the results of the other techniques reported in the literature, i.e., CWI-PSO [32], CFI-PSO [32], LDI-PSO [32], TVC-PSO [32], ABC [32], and MDI-PSO [32]. The filter optimally designed by WCA shows the minimum passband error, stopband error, and squared error along with the maximum stopband attenuation. In addition, the average CPU time required to achieve the optimal results by WCA is the least. Hence, it is concluded that WCA has been successfully applied to the optimal design of high pass IIR filters. The optimal coefficients of the HPF searched by WCA are given in Table 4. Figure 2 presents the pole–zero plot for the optimally designed filter, and it is observed that all poles lie within the unit circle, which ensures the stability of the system.

Table 3 Comparison of attribute values for HPF

| Technique | Min objective function | Passband error | Stopband error | Squared error | Max stopband attenuation (dB) | Average CPU time (sec) |
| CWI-PSO [32] | 2.8108 | 46.216 × 10^−2 | 50.29 × 10^−2 | 96.5 × 10^−2 | 19.89 | 75.69 |
| CFI-PSO [32] | 1.1852 | 11.17 × 10^−2 | 44.39 × 10^−2 | 55.56 × 10^−2 | 15.14 | 94.63 |
| LDI-PSO [32] | 1.4 | 17.98 × 10^−2 | 45.8 × 10^−2 | 63.78 × 10^−2 | 15.48 | 92.4 |
| MDI-PSO [32] | 2.5807 | 3.55 × 10^−2 | 110.33 × 10^−2 | 113.88 × 10^−2 | 8.21 | 97.64 |
| TVC-PSO [32] | 0.9987 | 18.85 × 10^−2 | 37.7 × 10^−2 | 56.55 × 10^−2 | 16.29 | 70.32 |
| ABC [32] | 1.0982 | 14.11 × 10^−2 | 19.4 × 10^−2 | 33.51 × 10^−2 | 17.22 | – |
| Hybrid 1 [32] | 3.5336 | 51.4 × 10^−2 | 91.21 × 10^−2 | 142.61 × 10^−2 | 13.72 | 76.12 |
| Hybrid 2 [32] | 1.8818 | 19.43 × 10^−2 | 52.4 × 10^−2 | 71.83 × 10^−2 | 18.98 | 79.38 |
| Hybrid 3 [32] | 1.1942 | 21.15 × 10^−2 | 36.28 × 10^−2 | 57.43 × 10^−2 | 16.93 | 52.97 |
| Water cycle algorithm | 0.4218 | 6.73 × 10^−6 | 0.37 × 10^−2 | 0.37 × 10^−2 | 24.74 | 28.90 |

Table 4 Optimal HPF coefficients

| b_i | −0.0611 | 0.1266 | 0.4090 | 0.1749 | −0.4160 | −0.1218 | −0.5000 | 0.0560 | 0.0659 |
| a_i | 1.0000 | −0.1613 | 0.5000 | 0.0419 | 0.4304 | −0.0239 | −0.0773 | 0.1687 | 0.1580 |

5 Conclusions

The water cycle algorithm has been effectively implemented for the optimal design of IIR LPF and HPF. In this work, the lattice equivalent approach is used: the denominator polynomial of the transfer function is first transformed into lattice form, the filter coefficients are then searched by WCA, and finally the searched coefficients are transformed back into the original form. In WCA, the search process is driven by the evaporation and raining mechanisms, which help the algorithm explore new search areas and avoid stagnation during the search process. The outcomes are
superior to those reported in the literature in terms of the various attributes of filter design. A further integration of WCA with other nature-inspired optimization techniques is possible and can be applied to search for the global best solution of practical optimization problems. In addition, a multi-objective version of WCA can also be proposed.
Fig. 2 Pole–zero location for HPF
References 1. Dhaliwal KK, Dhillon JS (2017) Integrated cat swarm optimization and differential evolution algorithm for optimal IIR filter design in multi-objective framework. Circ Syst Signal Process 36:270–296 2. Zou DX, Deb S, Wang GG (2018) Solving IIR system identification by a variant of particle swarm optimization. Neural Comput Appl 30:685–698 3. Mohammadi A, Zahiri SM (2018) Inclined planes system optimization algorithm for IIR system identification. Int J Mach Learn Cyber 9:541–558 4. Mohammadi A, Zahiri SH (2017) IIR model identification using a modified inclined planes system optimization algorithm. Artif Intell Rev 48:237–259 5. Kumar M, Rawat TK, Aggarwal A (2017) Adaptive infinite impulse response system identification using modified-interior search algorithm with Lèvy flight. ISA Trans 67:266–279 6. Upadhyay P, Kar R, Mandal D, Ghoshal SP (2016) A new design method based on firefly algorithm for IIR system identification problem. J King Saud Univ Eng Sci 28:174–198 7. Dhaliwal KK., Dhillon JS (2016) On the design and optimization of digital IIR filter using oppositional artificial bee colony algorithm. In: IEEE students’ conference on electronics and computer science, 978-1-4673-7918-2/16 8. Sarangi A, Sarangi SK, Panigrahi SP (2016) An approach to identification of unknown IIR system using crossover cat swarm optimization. Perspect Sci 8:301–303 9. Mittal T (2022) A hybrid moth flame optimization and variable neighbourhood search technique for optimal design of IIR filters. Neural Comput Appl 34:689–704 10. Liang X, Wu D, Liu Y, He M, Sun L (2021) An enhanced slime mould algorithm and its application for digital IIR filter design. Discret Dyn Nat Soc. https://doi.org/10.1155/2021/533 3278
11. Gotmare A, Bhattacharjee SS, Patidar R, George NV (2017) Swarm and evolutionary computing algorithms for system identification and filter design: a comprehensive review. Swarm Evol Comput 32:68–84 12. Agrawal N, Kumar A, Bajaj V, Singh GK (2021) Design of digital IIR filter: a research survey. Appl Acoust 172:107669 13. Ghibeche Y, Saadi S, Hafaifa A (2018) Optimal design of IIR filters based on least p-norm using a novel meta-heuristic algorithm. Int J Numer Model Electron Netw Devices Fields 32:1–18 14. Loubna K, Bachir B, Izeddine Z (2018) Optimal digital IIR filter design using ant colony optimization. In: 4th International conference on optimization and applications (ICOA), pp 1–5. https://doi.org/10.1109/ICOA.2018.8370500 15. Pelusi D, Mascella R, Tallini L (2018) A fuzzy gravitational search algorithm to design optimal IIR filters. Energies 11:736–754 16. Singh S, Ashok A, Kumar M, Garima, Rawat TK (2019) Optimal design of IIR filter using dragonfly algorithm. Adv Intell Syst Comput 698:211–223 17. Dhabal S, Venkateswaran P (2019) An improved global-best-guided cuckoo search algorithm for multiplierless design of two-dimensional IIR filters. Circ Syst Signal Process 38:805–826 18. Dash J, Dam B, Swain R (2020) Improved firefly algorithm based optimal design of special signal blocking IIR filters. Measurement 149:106986 19. Susmitha K, Karthik V, Saha SK, Kar R (2020) Optimal design of IIR band pass and band stop filters using GSA-BBO technique and their FPGA implementation. In: International conference on communication and signal processing, pp 1106–1110 20. Bisen M, Saha SK, Kar R (2021) MVO based optimal design of stable IIR HPF and its FPGA implementation. In: 3rd International conference on signal processing and communication (ICPSC), pp 202–206 21. Ali TAA, Xiao Z, Sun J, Mirjalili S, Havyarimana V, Jiang H (2019) Optimal design of IIR wideband digital differentiators and integrators using salp swarm algorithm. Knowl Based Syst 182:104834 22. Karthik V, Susmitha K, Saha SK, Kar R (2021) Invasive weed optimization-based optimally designed high-pass IIR filter and its FPGA implementation. In: Evolutionary computing and mobile sustainable networks. Lecture notes on data engineering and communications technologies, vol 53. Springer, Singapore. https://doi.org/10.1007/978-981-15-5258-8_24 23. Susmitha K, Karthik V, Saha SK, Kar R (2021) Biogeography-based optimization technique for optimal design of IIR low-pass filter and its FPGA Implementation. In: Evolutionary computing and mobile sustainable networks. Lecture notes on data engineering and communications technologies, vol 53. Springer, Singapore. https://doi.org/10.1007/978-981-15-5258-8_23 24. Sadollach A, Bahreinineja A, Eskandar H, Abd Shukor MH (2012) Mine blast algorithm for optimization of truss structures with discrete variables. Comput Struct 102:49–63 25. Heidari AA, Abbaspour RA, Jordehi AR (2017) An efficient chaotic water cycle algorithm for optimization tasks. Neural Comput Appl 28:57–85 26. Sadollah A, Eskandar H, Bahreininejad A, Kim JH (2015) Water cycle algorithm with evaporation rate for solving constrained and unconstrained optimization problems. Appl Soft Comput 30:58–71 27. Ravichandran SK, Sasi A, Vatambeti R (2022) Intelligent water drops algorithm hand calculation using a mathematical function. In: Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 114. Springer, Singapore. https://doi.org/ 10.1007/978-981-16-9416-5_8 28. 
Barakat M, Donkol A, Hamed HFA, Salama GM (2022) Controller parameters tuning of water cycle algorithm and its application to load frequency control of multi-area power systems using TD-TI cascade control. Evolving Syst 13:117–132 29. Ma C, Gao Z, Yang J, Cheng L, Zhao T (2022) Calibration of adjustment coefficient of the viscous boundary in particle discrete element method based on water cycle algorithm. Water 14:439
30. Kumar A, Dhillon JS (2022) Environmentally sound short-term hydrothermal generation scheduling using intensified water cycle approach. Appl Soft Comput 127:109327 31. Saha SK, Kar R, Mandal D, Ghoshal SP (2011) IIR filter design with craziness based particle swarm optimization technique. Int J Electron Comput Ener Electron Commun Eng 5(12):1810– 1817 32. Agrawal N, Kumar A, Bajaj V (2018) Design of digital IIR filter with low quantization error using hybrid optimization technique. Soft Comput 22:2953–2971 33. Eskandar H, Sadollah A, Bahreininejad A, Hamdi M (2012) Water cycle algorithm-A novel metaheuristic optimization method for solving constrained engineering optimization problems. Comput Struct 110–111:151–166 34. Saha SK, Kar R, Mandal D, Ghoshal SP (2012) Digital stable IIR low pass filter optimization using PSO-CFIWA. In: 1st International conference on recent advances in information technology, pp 196–201
Comparative Evaluation of Machine Learning Algorithms for Credit Card Fraud Detection Kiran Jot Singh, Khushal Thakur, Divneet Singh Kapoor, Anshul Sharma, Sakshi Bajpai, Neeraj Sirawag, Riya Mehta, Chitransh Chaudhary, and Utkarsh Singh
Abstract Banks are the backbone of the economy of any country, and hence it is essential that they function properly so as to achieve a booming and sustainable economy. The major barrier for banks is defaults on the credit they extend to customers. As banks profit mostly from the loans they give to customers, they require a solid and efficient model to minimize the losses they incur via credit defaults. This article assesses the performance of credit card default prediction. Earlier, credit card defaults were assessed using standard tools such as FICO scores, but with the development of machine learning it has become much easier to build highly effective risk prediction models. Credit risk prediction is essentially a binary classification problem. Thus, various machine learning models such as KNN, decision tree, random forest, and logistic regression have been applied. When we look into the credit risk of credit card clients, the results indicate that random forest best identifies the aspects to examine, with an accuracy of 82% and an area under the curve of roughly 77%.

Keywords Credit card fraud · Machine learning · Data science · Default model prediction · Credit cards · Risk prediction
1 Introduction

The credit card industry has existed for years and is a result of both changing consumer behavior and improved national income. According to the Reserve Bank of India's latest data, credit card spending in India increased by 57% year on year in September 2021 to Rs 80,000 crore [11]. According to a survey, average monthly
credit card spending increased to Rs 12,400 from Rs 10,700 in the previous six months [15]. Nevertheless, with the increase in credit card transactions, outstanding amounts and delinquency rates of credit card loans have also become problems that cannot be ignored. The rise in delinquencies will cause commercial banks to lose a large amount of money. Therefore, in order to prevent these losses, banks need an accurate risk prediction model which can help to sort the most common traits of people who are more likely to default on their debt into categories. Credit history and personal experience are currently used by loan officers when assessing whether or not to accept applicants. These criteria, however, are far from ideal. For starters, this method will not work because not every candidate has a credit history. Second, loan officers are often subjective; one officer's criteria may differ from another's, and personal concerns may therefore have a substantial impact on their decisions. Although most research studies focus on financial policies and banking regulation to see how these policies affect default likelihood, it is widely acknowledged that banks must anticipate credit card defaults. In September 2021, the Indian banking sector recorded approximately 1.1 million new credit card additions, bringing the total number of outstanding credit cards to nearly 1.3 million [1]. Hence, it is essential that banks minimize credit card defaults, and to do so they need to adopt new and efficient models; this is exactly what we are trying to achieve through this report. Our main objective in this correspondence is to equip banks with an accurate risk prediction model that can help define the traits most suggestive of persons who are more likely to default on credit, in order to avoid these losses. By doing so, we can help banks minimize huge losses on credit defaults.
2 Related Work

Credit default delinquency is a major problem for every bank. The increase in delinquencies results in a significant loss of money for commercial banks. Therefore, in order to prevent such huge losses, banks need new and accurate risk prediction models which can help them minimize defaults. Furthermore, advances in technologies like the Internet of Things, wireless sensor networks, and computer vision can be used to develop newer multi-domain solutions [5, 6, 10, 12–14]. Some of the currently used approaches to the detection of such fraud are logistic regression, random forest, decision tree, K-nearest neighbor, and the support vector classifier. KNN: K-nearest neighbor classification finds neighboring data points and then decides on a class based on the classes of those neighbors. Logistic Regression: Logistic regression predicts discrete values such as yes or no, true or false, and 0 or 1; it forecasts the likelihood of an event by fitting the data to a logistic function. Decision Tree: As the name implies, decision tree algorithms recursively split the data into branches to construct a tree in order to improve prediction accuracy. Random Forest: Random forest is used for both classification and regression; it combines multiple classifiers so that we can solve complex
problems. It thus increases the efficiency of the model; it obtains predictions from each tree and produces the final output by a majority vote of those predictions. SVC: Linear SVC is the best machine learning algorithm for our problem; the job of the linear support vector classifier (SVC) is to fit the data and return a "best fit" hyperplane that divides or categorizes it. In law, fraud is intentional deception to secure unfair or unlawful gain. The purpose of fraud may be monetary gain or other benefits, for example, credit default fraud, where the perpetrator may try to stop paying the dues on their credit cards [8]. A payment default occurs when you fail to pay the minimum amount due on the credit card for a few consecutive months; usually, the default notice is sent by the card issuer after six consecutive missed payments. A vast number of articles and papers have already been published in this domain and are available for public usage. By classifying customers into healthy and loan-default consumers, [7] established a credit scoring methodology. They built a model using a P2P lending dataset and preprocessed the data due to noisy values. The results were tested using advanced gradient boosting models and keyword clustering-based approaches. To improve the performance of the classifiers, they extracted prominent features [16]. Their tests revealed that the CatBoost model, which is based on gradient boosting, outperformed other standard models. Looking at the basic working model, the full architecture diagram can be represented as depicted in Fig. 1.
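As a rough illustration of how the five baseline classifiers discussed above can be compared, the sketch below fits each of them with scikit-learn and reports cross-validated accuracy. The synthetic dataset produced by make_classification is only a stand-in for the credit card data described in the next section, and all model settings are library defaults rather than the configuration used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

# Stand-in data with an imbalanced target, roughly mimicking a default-prediction task.
X, y = make_classification(n_samples=5000, n_features=23, weights=[0.78], random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Linear SVC": LinearSVC(dual=False),
}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)   # normalization before each classifier
    acc = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")
```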
3 Methodology

This paper proposes credit card default prediction using various machine learning algorithms. We obtained our dataset from Kaggle (a Website which provides datasets, allowing readers to explore and build models). It provides information on payments, bill statements, education, age, and ID of credit card clients in Taiwan from April 2005 to September 2005. The proposed framework is shown in Fig. 2. The very first step applied to our credit dataset is data normalization, the process of organizing data in a database, which includes creating tables and establishing relations so as to achieve consistency in the data across all records and fields. After that, various machine learning models, namely KNN [17], logistic regression [2], decision tree [4], random forest [9], and SVM [3], were applied to the normalized data. The next step was to validate our prediction models by applying the various models and comparing their accuracy, AUC, recall, and other metrics, and to further boost their performance by using PCA and outlier removal. The final step is to deploy and distribute our model to various banks so as to improve the functionality of banks and minimize their credit defaults.

Fig. 1 Flow diagram

Fig. 2 Proposed framework

The dataset defines the variables as follows:
• ID: the ID of the customer.
• LIMIT_BAL: the amount of given credit in NT dollars; it includes individual and family/supplementary credit.
• SEX: the gender of the client, where 1 is male and 2 is female.
• EDUCATION: the education level of the client, coded as 1, 2, 3, 4, 5, 6.
• MARRIAGE: the marital status of the client, where 1 is married, 2 is single, and 3 is others.
• AGE: age in years; the range is between 21 and 79.
• PAY: the repayment status (−1 = pay duly, 1 = payment delay for one month, 2 = payment delay for two months, …, 8 = payment delay for eight months, 9 = payment delay for nine months and above).
• BILL_AMT: the amount of the bill statement in NT dollars.
• PAY_AMT: the amount of the previous payment in NT dollars.
• DEFAULT: the default payment indicator, where 1 means yes and 0 means no.

The bar graph (shown in Fig. 3) depicts the relationship between LIMIT_BAL and AGE of the customer. The results show how important it is for banks to know the
age of the customer when deciding to issue a credit card. The bar graph shows that people in the age group of 45 to 55 are more likely to pay their credit card bills, while people who are younger (below 20) or older (above 60) tend to fail to pay their credit card bills. People in the 48-year age group have the highest likelihood of paying their credit card bills. The scatter plot between limit balance and age of the customer is depicted in Fig. 4. It can help us examine which age groups have higher limit balances and which have lower ones. It may be observed from the scatter plot that, at the age of 48, there exists an outlier with a high limit balance; thus, outlier removal is essential for better prediction. The feature importance plot (Fig. 5) indicates which characteristics of the credit dataset are crucial for predicting whether the customer will default or not. According to the plot, AGE is the most important variable (feature) in our dataset, followed by LIMIT_BAL, while the bill statements of the respective months are of least importance. Hence, this feature importance plot can help us decide which factors are to be considered while giving credit. To enhance the efficiency of our model, we use two model boosters:
1. Principal component analysis (PCA)
2. Outlier removal
Fig. 3 Graph of LIMIT_BAL versus AGE
Fig. 4 Scatter plot of LIMIT_BAL versus AGE
PCA: Principal component analysis is an important machine learning method for dimensionality reduction. It employs simple matrix operations and helps in reducing dimensionality with minimal loss of information, while also increasing the interpretability of the dataset. The algorithm transforms the columns into a new set of features called principal components, which compresses a large amount of information into a few columns.
Outlier Removal: Outlier removal is the process of detecting outliers and excluding them, leaving the remaining data clean and consistent. They are called outliers because they "lie outside" the normal distribution curve.
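A minimal sketch of how the two boosters can be wired in front of a classifier is shown below. The CSV file name, the target column label, the number of principal components, and the 5% contamination used by the IsolationForest stand-in for outlier removal are assumptions for illustration, not the exact setup of this study.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# File and column names are assumptions about the Kaggle credit default dataset.
df = pd.read_csv("UCI_Credit_Card.csv")
X = df.drop(columns=["ID", "default.payment.next.month"]).values
y = df["default.payment.next.month"].values

X = StandardScaler().fit_transform(X)                              # data normalization
keep = IsolationForest(contamination=0.05, random_state=0).fit_predict(X) == 1
X, y = X[keep], y[keep]                                            # booster 2: outlier removal
X = PCA(n_components=10).fit_transform(X)                          # booster 1: PCA

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)  # 70:30 split
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```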
4 Results and Discussion

The dataset includes information about default payments, demographic factors, and payment history for credit card users in Taiwan from April 2005 to September 2005.
Fig. 5 Feature significance plot
There are a total of 30,000 distinct credit card users, and the average credit card limit is 167,484. The education level is mostly graduate school and university. The average age is 35.5 years with a standard deviation of 9.2. The mean value of 0.324 indicates that 32.4% of credit card contracts will default the following month. The value "0" stands for default payment, while "1" represents "not default". This concept is difficult to put into practice since it demands cooperation from banks, which are unwilling to disclose information owing to market competition, as well as from other stakeholders, due to legal considerations and the protection of their users' data. However, if even limited cooperation is received from the banks' side, then after implementing various machine learning models on data from the banks' databases, the efficiency of the models can be improved, and banks can benefit by using them to sort the most common traits of people who are more likely to default on their debt into categories. Banks profit mostly from the loans they give to customers; hence, they require a solid and efficient model to minimize the losses they incur via credit defaults. Since our main aim was to build a prediction model, a dataset was needed both to build the model and to test whether it makes correct predictions on new data. We split our dataset into training and test data with a ratio of 70:30. Various classifiers have been utilized to evaluate performance on the imbalanced dataset, and the performance of several traditional machine learning models was compared. The results are presented in Tables 1 and 2, with Table 1 representing the comparison using PCA and Table 2 using outlier removal. All models performed well with the transformed training and test data.
Table 1 Performance metrics using PCA

| Model | Accuracy | AUC | Recall | Precision | TT (sec) |
| Ridge classifier | 0.8226 | 0.0000 | 0.3178 | 0.6731 | 0.034 |
| Linear discriminant analysis | 0.8225 | 0.7555 | 0.3325 | 0.6627 | 0.105 |
| Gradient boosting classifier | 0.8219 | 0.7681 | 0.3118 | 0.6726 | 4.060 |
| Light gradient boosting machine | 0.8210 | 0.7666 | 0.3213 | 0.6606 | 0.308 |

Table 2 Performance metrics using outlier removal

| Model | Accuracy | AUC | Recall | Precision | TT (sec) |
| Ridge classifier | 0.8226 | 0.0000 | 0.3178 | 0.6731 | 0.034 |
| Linear discriminant analysis | 0.8225 | 0.7555 | 0.3325 | 0.6627 | 0.105 |
| Gradient boosting classifier | 0.8219 | 0.7681 | 0.3118 | 0.6726 | 4.060 |
| Light gradient boosting machine | 0.8210 | 0.7666 | 0.3213 | 0.6606 | 0.308 |
KNN has a 75% accuracy rate, while logistic regression has a 77.84% accuracy rate. Decision tree gives the least accuracy, at around 73%, while random forest gives the most accurate prediction, close to 82%. Principal component analysis (PCA) is very useful for speeding up the computation by reducing the dimensionality of the data. After using the PCA model booster, we found that random forest gives the highest accuracy of 78.39%, followed by KNN, as given in Table 1. Extreme values beyond the expected range and dissimilar to the rest of the data may exist in a dataset. These values are known as outliers, and detecting and removing them can help improve machine learning modeling and model skill in general. The default value for the outlier threshold is 0.05. After using the outlier removal technique, we found that the accuracy of many of the machine learning models increased. The random forest classifier showed a slight improvement from 0.8162 to 0.8163, and logistic regression moved from 0.7881 to 0.7784 after using the outlier removal technique.
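The columns of Tables 1 and 2 (accuracy, AUC, recall, precision, and training time) can be reproduced for any candidate model with a cross-validation loop along the following lines. The models listed mirror some of the table rows, but all hyperparameters are library defaults and the file and column names are assumptions; the AUC of 0.0000 reported for the ridge classifier most likely reflects a model without probability outputs rather than a genuinely random ranking.

```python
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.linear_model import RidgeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier

# File and column names are assumptions about the Kaggle credit default dataset.
df = pd.read_csv("UCI_Credit_Card.csv")
X = df.drop(columns=["ID", "default.payment.next.month"])
y = df["default.payment.next.month"]

scoring = ["accuracy", "roc_auc", "recall", "precision"]
for name, model in {
    "Ridge classifier": RidgeClassifier(),
    "Linear discriminant analysis": LinearDiscriminantAnalysis(),
    "Gradient boosting classifier": GradientBoostingClassifier(),
}.items():
    res = cross_validate(model, X, y, cv=5, scoring=scoring)
    print(name,
          round(res["test_accuracy"].mean(), 4),
          round(res["test_roc_auc"].mean(), 4),
          round(res["test_recall"].mean(), 4),
          round(res["test_precision"].mean(), 4),
          round(res["fit_time"].mean(), 3), "sec")
```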
5 Conclusion and Future Scope

Credit cards are becoming increasingly important in today's world, and so are the associated frauds. We aim to help banks keep a check on the payments of their clients while also increasing the prediction performance of the model. This article discusses different aspects of credit defaults and solutions for the same, which will prevent banks from incurring high losses. We provide banks with a new and accurate prediction model which helps to classify the most vital characteristics of people whose probability of default is higher. We are strengthening risk management and control and defining the traits most suggestive of persons who
are more likely to default on credit in order to avoid these losses. Banks can make the best use of these algorithms in order to boost their performance and image in the industry. According to our comparative analysis, we found that random forest gives the highest accuracy and precision, with or without model boosters. Without applying any model booster, random forest has an accuracy of 82.26%, while after applying PCA and outlier removal it gives 82.33% accuracy. The efficiency of the algorithm increases as the size of the dataset increases; hence, more data will make the model more accurate in detecting frauds and reduce the number of false positives (defaults).
References 1. Abor JY, Gyeke-Dako A, Fiador VO, Agbloyor EK, Amidu M, Mensah L (2019) Overview of the monetary system. Adv Afr Econ Soc Polit Dev 3–30. https://doi.org/10.1007/978-3-31977458-9_1 2. Bukhari MM, Ghazal TM, Abbas S, Khan MA, Farooq U, Wahbah H, Ahmad M, Adnan KM (2022) An intelligent proposed model for task offloading in fog-cloud collaboration using logistics regression. Comput Intell Neurosci 2022. https://doi.org/10.1155/2022/3606068 3. Chauhan VK, Dahiya K, Sharma A (2019) Problem formulations and solvers in linear SVM: a review. Artif Intell Rev 52(2):803–855. https://doi.org/10.1007/S10462-018-9614-6/FIGURE S/16 4. Dai QY, Zhang CP, Wu H (2016) Research of decision tree classification algorithm in data mining. Int J Database Theory Appl 9(5):1–8. https://doi.org/10.14257/ijdta.2016.9.5.01 5. Jawhar Q, Thakur K, Singh KJ (2020) Recent advances in handling big data for wireless sensor networks. IEEE Potentials 39(6):22–27 6. Jawhar Q, Thakur K (2020) An improved algorithm for data gathering in large-scale wireless sensor networks. In: Lecture notes in electrical engineering, vol 605, pp 141–151. https://doi. org/10.1007/978-3-030-30577-2_12 7. Knapp EA, Dean LT (2018) Consumer credit scores as a novel tool for identifying health in urban U.S. neighborhoods. Ann Epidemiol 28(10):724–729. https://doi.org/10.1016/J.ANN EPIDEM.2018.07.013 8. Piquero NL, Piquero AR, Gies S, Green B, Bobnis A, Velasquez E (2021) Preventing identity theft: perspectives on technological solutions from industry insiders. Vict Offenders 16(3):444– 463. https://doi.org/10.1080/15564886.2020.1826023 9. Resende PAA, Drummond AC (2018) A survey of random forest based methods for intrusion detection systems. ACM Comput Surv 51:3. https://doi.org/10.1145/3178582 10. Sachdeva P, Singh KJ (2015) Automatic segmentation and area calculation of optic disc in ophthalmic images. In: 2015 2nd International conference on recent advances in engineering and computational sciences (RAECS). IEEE, pp 1–5. https://doi.org/10.1109/RAECS.2015. 7453356 11. Saxena NC (2018) Hunger, under-nutrition and food security in India. In: Poverty, chronic poverty and poverty dynamics policy imperatives, pp 55–92. https://doi.org/10.1007/978-98113-0677-8_4 12. Sharma A, Kapoor DS, Nayyar A, Qureshi B, Singh KJ, Thakur K (2022) Exploration of IoT nodes communication using LoRaWAN in forest environment. Comput Mater Contin 71(2):6240–6256. https://doi.org/10.32604/CMC.2022.024639 13. Sharma A, Agrawal S (2012) Performance of error filters on shares in halftone visual cryptography via error diffusion. Int J Comput Appl 45:23–30
14. Singh K, Singh KJ, Kapoor DS (2014) Image retrieval for medical imaging using combined feature fuzzy approach. In: 2014 International conference on devices, circuits and communications (ICDCCom). IEEE, pp 1–5. https://doi.org/10.1109/ICDCCom.2014.7024725 15. Swiedler EW, Muehlenbachs LA, Chu Z, Shih JS, Krupnick A (2019) Should solid waste from shale gas development be regulated as hazardous waste? Energy Policy 129:1020–1033. https:// doi.org/10.1016/J.ENPOL.2019.02.016 16. Varuna Shree N, Kumar TNR (2018) Identification and classification of brain tumor MRI images with feature extraction using DWT and probabilistic neural network. Brain Inf 5(1):23– 30. https://doi.org/10.1007/S40708-017-0075-5/FIGURES/4 17. Xing W, Bei Y (2020) Medical health big data classification based on KNN classification algorithm. IEEE Access 8:28808–28819. https://doi.org/10.1109/ACCESS.2019.2955754
Future Commercial Prospects of Unmanned Aerial Vehicles (UAVs) Divneet Singh Kapoor, Kiran Jot Singh, Richa Bansal, Khushal Thakur, and Anshul Sharma
Abstract This correspondence addresses the rising demand for unmanned aerial vehicles (UAVs), also referred to as drones, for civilian as well as military applications. With recent advancements in the drone industry, drones are now being developed not only as toys but also, with the use of sophisticated machinery, to cater to previously unexplored applications. In view of the increased utilization of drones in diverse professional sectors, the commercial market for UAVs is set to rise exponentially in the coming years. These flying devices can be utilized in impending or pre-existing commercial markets for the fulfillment of the underlying objectives (spanning social, economic, and environmental facets) of businesses. In the present work, we discuss multiple applications of drones in line with the current business landscape, future prospects and opportunities, and the markets that can be addressed by drone usage in the near future.

Keywords Drone · UAV · Military · Agriculture · Consumer · Global market
1 Introduction

A flying object that can be controlled remotely or has autonomous flying capability is known as an unmanned aerial vehicle (UAV), most commonly called a "drone". These vehicles are manufactured and designed in various shapes, sizes, and functional configurations and are utilized in different military and civilian applications across the globe. UAVs can hover in confined spaces or fly across thousands of kilometers. UAVs are considered emerging platforms for sophisticated robotics, electronics, and aeronautics coupled with artificial intelligence for possible enhancement of their performance to meet the diverse objectives of varied applications. Recent advancements in manufacturing techniques, in terms of fabrication of the outer body, remote-control capabilities, improved navigation techniques, and enhanced power storage systems, have enabled an exponential rise in the utilization of drones
in areas where human intervention is difficult. Drones were first utilized in warfare in 1849, in the form of balloons loaded with explosives sent by the Austrians to attack the city of Venice in Italy [1]. The history of early UAV flights has been chronicled by Keane and Carr in [2]. In recent times, drones remotely controlled by soldiers are being used to perform various military operations. As per the latest global market report, the global unmanned defense aerial vehicle market has grown from $12.23 billion in 2021 to $13.87 billion in 2022. The growth can be attributed to companies resuming operations and adapting to the new normal while recovering from the impact of COVID-19, which had earlier imposed containment measures such as social distancing, remote working, and the closure of commercial activities, resulting in operational constraints. The market is expected to reach $22.04 billion in 2026 at a compound annual growth rate (CAGR) of 12.3%. Considering the numerous advantages of drones, and with further optimization and performance enhancement, operations such as surveillance, payload transportation, and protection of a fixed geographical area can be conducted. Since drones differ widely in configuration depending on the application areas for which they are designed, numerous categories exist to classify them on the basis of different parameters. UAVs have been divided into classes based on the weight of the drone, its size, and the type and weight of the payload it can carry [3]. The spectrum of drones extends from a UAV class with a maximum wingspan of 61 m and a total weight of 15,000 kg [4] down to a minimum size of 1 mm with a weight of approximately 5 mg [5]. Incremental innovation is a progression of small enhancements or redesigns made to an organization's existing products, services, procedures, or techniques. Innovation tends to follow an exponential growth curve: each new technology begins with a few exploratory prototypes that create excitement among the general public, and the world of UAVs is no exception. Development and use of such drones continued to be driven primarily by military needs, while toward the end of the twentieth century flying RC aircraft as a hobby grew significantly and other non-military commercial applications were also investigated by governments. The timeline of incremental innovation in the applications of UAVs is shown in Fig. 1 [6–9].
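As a quick arithmetic check of the market figures quoted above (a hedged illustration, assuming standard compound-growth arithmetic):

```python
# Year-on-year growth 2021 -> 2022, and the value implied four years later at a 12.3% CAGR.
v_2021, v_2022, cagr = 12.23, 13.87, 0.123
print(round((v_2022 / v_2021 - 1) * 100, 1))   # about 13.4% growth from 2021 to 2022
print(round(v_2022 * (1 + cagr) ** 4, 2))      # about 22.06, close to the quoted $22.04 billion for 2026
```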
2 Application Areas

An eclectic range of applications (as depicted in Fig. 2) in which drones are utilized spans multiple military and civil operations, ranging from accustomed to very challenging environments [10]. A UAV equipped with multiple sensors and actuators can be categorized into various classes based on the broad area of application, flight space confinement, and type of environment. A few of these applications are discussed in detail below.
Fig. 1 Timeline of incremental innovation in UAVs
2.1 Defense

The UAVs can be used for tackling terrorism and security-related challenges and identifying areas which are vulnerable to different risks. Along with conventional weapons and other warfare technologies, drones enhance capabilities of defense forces to control and counter emerging challenges in security. Primarily, in the defense sector, drones are used for [11]: border security, counter insurgency, crime control, surveillance of sensitive locations, etc.
Fig. 2 Applications of UAVs or drones
2.2 Agriculture

The agricultural industry is embracing drone technology so as to modernize farming. Different types of drones are available which can be used to increase the overall productivity of the farming process. Also, with the commercialization of the technology, the cost of drones used for agriculture has reduced significantly, which has made it easy for big farm owners to adopt this technology. A few major use cases are mentioned below [12]: crop monitoring, livestock management, crop spraying, irrigation mapping, checking for weeds and spot treating plants, etc.
2.3 Industrial Use

With the worldwide adoption of Industrial Internet of Things (IIoT) technologies and the transition toward Industry 4.0, drones are taking the main stage in industries. Different sensors mounted on drones can be used for capturing different arrays of data which can be used for the digitalization of industrial processes. Below are a few major use cases [13]: infrastructure development, maintenance and asset management, warehousing and inventory management, last mile delivery, real-time surface surveys in boilers and rapid pre/post-blast data in mines, detection of gas leaks in pipelines and oil spills at oil rigs, etc.
2.4 Entertainment Industry

Even though the market for drones in the entertainment industry is not as big as in other industries, a significant amount of experimentation has been conducted in this sector with drones [14], including aerial photography and videography, live broadcast of sports and other public events, drone-based solutions in advertising, use of drones for special effects, etc.
2.5 Firefighting

Fire departments and facility management teams have started using drones for quickly understanding the crisis situation using an aerial view. Special purpose infrared cameras and sensors are mounted on drones which are helpful in fighting fires since these drones can even fly and provide live feed in buildings with smoke. Designers are also trying to mount fire extinguishers along with drones for the purpose of fire suppression. Below are some firefighting drone use cases [15]: risk and disaster assessment, hotspot detection, fire suppression, emergency deliveries, firefighting in high rise buildings, etc.
2.6 Traffic and Crowd Monitoring

With increasing vehicular as well as pedestrian traffic on streets, urban planners need better traffic and crowd management technologies, as using fixed cameras alone is not efficient. Drones mounted with cameras can help in computing the density of traffic with respect to road capacity and in understanding the flow of traffic, so that strategies can be made to reduce congestion on roads and in public places. This can help in reducing fuel consumption as well as saving commuters' time [16]: control center assistance
for road condition monitoring, traffic guidance and activity analysis, capturing wider view during mass gatherings, etc.
2.7 Others

People and companies are innovating with drone technology and are using drones for many other unconventional purposes also, such as wireless Internet access, 3D mapping of natural landscapes and industrial setups, weather forecast, public administration, and environment management.
3 Commercial Aspects

UAVs, just like the World Wide Web and GPS navigation, have advanced from their initial military applications (where drone applications began) into powerful business tools for the commercial market. They have already made the jump to the consumer market, and now they are being put to work in business and government applications ranging from firefighting to farming. That is creating a market opportunity that is too large to ignore. A Goldman Sachs report estimates a $100 billion market opportunity for drones because of the growing demand from the business and government segments (as illustrated in Fig. 3).
Fig. 3 Market size for drones (in billion dollars)
Military. Drones, with a $70 billion market, got their start as safer, cheaper, and often more capable alternatives to manned military aircraft. Defense will remain the largest market for years to come as global competition heats up and the technology keeps improving.
Consumer. The consumer drone market, sized at $17 billion, was the first to develop outside the military. Demand has taken off over the last two years, and hobbyist drones have become a familiar sight; however, there is still plenty of room for growth.
Commercial. The fastest growth is coming from businesses and governments. They are just beginning to explore the possibilities; it is expected that they will spend $13 billion on drones between now and 2020, placing a large number of them in the sky. As the industry grows, the drone job market within it is becoming increasingly complex and harder to navigate: companies are competing not only to develop the latest UAV technology but also to attract the very best engineers. The drone market will generate multiple jobs in different areas such as software and hardware engineering, sales, and operations. Furthermore, it will serve different domains such as construction, agriculture, and insurance. Figure 4 shows the total addressable market by industry/function as per the Goldman Sachs report. In the coming years, consumer demand is expected to keep building: 7.8 million consumer drone shipments and $3.3 billion in revenue are expected by 2020, versus just 450,000 shipments and $700 million in revenue in 2014, as shown in Fig. 5. Table 1 presents an analysis of various consumer drone market reports, forecasting the rise in market trends across industries [17]. As per the risk survey reports [18, 19], fire, terrorism, and natural hazards have become the top three risks for the first time over the last three years, owing to the substantial number of reported occurrences causing losses to material and physical assets. This shift in recent years is shown in Fig. 6. It can be inferred from the data that UAVs can play a pivotal role in mitigating the aforementioned risk factors.
Fig. 4 Total addressable market by industry/function (in million dollars)
Fig. 5 Retail/consumer drone market (in billion dollars)
Table 1 Consumer drones markets forecast and analysis

| Report coverage | Base year | Market size in base year | Forecast period | CAGR (%) | Value projection | Segments covered | Source |
| Global market insights | 2016 | 1.5 Billion (USD) | 2024 | 18 | 9 Billion (USD) by 2024 | Product, technology, application, region | Source1: https://www.gminsights.com/industryanalysis/consumerdrone-market |
| Insider intelligence | 2016 | 1.25 Billion (USD) | 2023 | 66.80 | 63.6 Billion (USD) by 2025 | Agriculture, construction and mining, insurance, media and telecommunications, law enforcement | Source2: https://www.businessinsider.com/droneindustryanalysis-market-trends-growthforecasts?IR=T |
| Market data forecast | 2020 | 1 Billion (USD) | 2026 | 20 | 17.5 Billion (USD) by 2026 | Product, technology, region | Source3: https://www.marketdataforecast.com/marketreports/consumerdrones-market |
| Fortune business insights | 2020 | 9.31 Billion (USD) | 2028 | 22.86 | 46.68 Billion (USD) by 2028 | Type, maximum takeoff weight, end-use, geography | Source4: https://www.fortunebussinessinsights.com/small-dronesmarket-102227 |
| Researchandmarkets.com insights | 2021 | 1.5 Billion (USD) | 2026 | 28.66 | 22.26 Billion (USD) by 2026 | Distribution channel, product, geographical | Source5: https://finance.yahoo.com/news/globalconsumer-drones-marketpoised-164500132.html |

Fig. 6 Shift in risk factors over the years

4 Discussion

It is believed that Amazon is going to take the lead in the human-less delivery business through the utilization of drones. The USA has also sanctioned the use of drones for military, recreational, and research purposes. It is a notable achievement of NASA to operate the "Ingenuity" drone on Mars for terrestrial and aerial surveillance. The reluctance of the Federal Aviation Administration (FAA) in providing licenses to drone companies for consumer services in the US is causing some delay. If drone delivery gains permission and widespread usage, it will directly impact the automobile industry as well; on the other hand, countries would be able to reduce emissions in order to meet global agreements. On a macroeconomic scale, the approval of UAVs for commercial applications will create new jobs and directly boost the manufacturing sector and drone operator employment. However, it may also be noted that many states in the USA have already passed their own laws regarding the use of UAVs for delivery and public-use applications. In India, the "Drone Rules" were liberalized in 2021 by the Ministry of Civil Aviation, which gave a big lift to the drone industry after the government announced it as "Drone Shakti". As stated by the Indian government, multiple start-ups will be established in the country to facilitate "Drone Shakti" through various use cases and applications and are expected to make the drone sector a $5 billion market [20]. The recent announcement of the Drone (Amendment) Rules, 2022, abolishing the mandate of a remote pilot certificate for flying small to medium size drones of up to 2 kg for non-commercial activities, has given new momentum to the drone industry in India. Despite all the possible benefits that UAVs can offer to varied facets of life, there are a few inevitable downsides; one of the most substantial is that they are very vulnerable to software hacks, and they cannot work in all weather conditions. Further, UAVs also pose a threat to safety and privacy, which is a concern for users and governments. It is said that "great power comes with great responsibilities"; hence, there is a need to develop a strategy for the use of UAVs to strengthen the
whole UAV/drone-related ecosystem with fixed standards and regulations. On the whole, drones possess great potential for economic development and for mankind in general, and we must strive for excellence in this sphere.
5 Conclusion

UAVs have been used by humans since the 1900s for military applications and since the 2000s for civilian operations; however, applications have grown exponentially in the past decade. Technological advancement, process enhancements, and optimizations have fueled this rapid growth. This correspondence has pointed out some of the applications where drones are being utilized to their full extent. Moreover, the current business landscape of drones in the commercial sector has also been laid down. It can be inferred from the analysis presented in the previous section that a number of markets can treat the utilization of drones as a potential opportunity to enhance their business prospects. The development and utilization of drones is a revolutionary field, similar to mobile cellular phones or smartphones, and will define future trends in various other commercial sectors in the coming times.
References 1. Hargrave. Remote piloted aerial vehicles: an anthology. http://www.ctie.monash.edu.au/har grave/rpav_home.html#Beginnings
2. Keane JF, Carr SS (2013) A brief history of early unmanned aircraft. Johns Hopkins APL Tech Dig 32:558–571 3. Hassanalian M, Abdelkefi A (2017) Classifications, applications, and design challenges of drones: a review. Prog Aerosp Sci 91:99–131 4. Wikipedia. Boeing condor. https://en.wikipedia.org/wiki/Boeing_Condor 5. Kahn JM, Katz RH, Pister KSJ (1999) Next century challenges: mobile networking for “Smart Dust”. In: Proceedings of the 5th annual ACM/IEEE international conference on Mobile computing and networking—MobiCom ’99. ACM Press, pp 271–278. https://doi.org/10.1145/ 313451.313558 6. Crilly R (2011) Drones first used in 1848. The Telegraph. https://www.telegraph.co.uk/news/ worldnews/northamerica/8586782/Drones-first-used-in-1848.html 7. Leu C (2015) The secret history of World War II-era drones. Wired. https://www.wired.com/ 2015/12/the-secret-history-of-world-war-ii-era-drones/ 8. Consortiq (2020) A not-so-short history of unmanned aerial vehicles (UAV). https://consortiq. com/short-history-unmanned-aerial-vehicles-uavs/ 9. NASA (2020) MARS Helicopter 10. Rodríguez RM, Alarcón F, Rubio DS, Ollero A (2013) Autonomous management of an UAV airfield. In: Proceedings of the 3rd international conference on application and theory of automation in command and control systems. Naples, pp 28–30 11. IdeaForge. Defence and homeland security. https://www.ideaforge.co.in/drone-uses/defencegovernment-applications/ 12. Jensen J (2019) Agricultural drones: how drones are revolutionizing agriculture and how to break into this booming market. https://uavcoach.com/agricultural-drones/ 13. Newton M (2017) Top 5 industrial applications for drones. https://blog.opto22.com/optoblog/ top-industrial-applications-for-drones 14. Flynn S (2016) Market for drone applications in media and entertainment industry. https:// skytango.com/market-value-of-drone-applications-in-media-entertainment-industry-valuedat-over-8-bn-dollars-says-pwc/ 15. Dronefly. Firefighting drone infographic. https://www.dronefly.com/firefighting-drones-dro nes-in-the-field-infographic 16. IdeaForge. Traffic monitoring. https://www.ideaforge.co.in/drone-uses/traffic-monitoring/ 17. PR Newswire (2022) Consumer drones market size to grow by USD 22.26 Billion | 17,000+Technavio research reports. https://www.prnewswire.com/news-releases/consumerdrones-market-size-to-grow-by-usd-22-26-billion--17-000-technavio-research-reports-301 504198.html 18. Pinkerton and FICCI (2017) India risk survey 2017 19. Pinkerton and FICCI (2019) India risk survey 2019 20. Business Today (2022) Budget 2022: drone industry gets a big boost. https://www.busine sstoday.in/union-budget-2022/news/story/budget-2022-drone-industry-gets-a-big-boost-321 063-2022-02-01
Experimental Analysis of Deep Learning Algorithms Used in Brain Tumor Classification Kapil Mundada, Toufiq Rahatwilkar, and Jayant Kulkarni
Abstract Timely detection and analysis of brain tumors is necessary for saving lives of people around the world. In recent times, deep learning with transfer learning (TL) approaches is commonly used for identifying three common types of tumors, i.e., meningioma, glioma, and pituitary tumors. We used pre-trained transfer learning techniques to identify meningioma, glioma, and pituitary brain tumors. The goal of this research is to evaluate and compare the performance of various deep learning algorithms which can be used for brain tumor classification. Six pre-trained TL classifiers are used in the experimental analysis: InceptionV3, Xception, ResNet50, EfficientNetB0, VGG16, and MobileNet, which use a fine-grained classification approach to automatically identify and classify brain tumors. The experimentation is conducted on a brain tumor MR image dataset with 7022 images, available freely on Kaggle, and the tool used is Python. The results are computed using commonly used metrics. The classification experiments show that the VGG16 TL model achieves excellent performance, with the highest accuracy of 99.09% in the detection and classification of no-tumor, glioma, meningioma, and pituitary brain tumor images. The other models, except EfficientNetB0, also perform satisfactorily.

Keywords Brain tumor classification · Deep learning · Transfer learning · CNN
1 Introduction

The human brain is the center of human control and is responsible for the activities that happen in daily life. Pituitary, meningioma, and glioma tumors are the major types. Meningiomas arise from the membranes (meninges) surrounding the brain
Gliomas develop from glial cells in the brain. Pituitary tumors develop when pituitary gland cells near the brain grow abnormally. This condition is critical and must be checked as soon as any symptoms are present [1]. One solution is to classify these tumors using an ML algorithm [2]. However, multi-class classification into meningiomas, pituitary tumors, and gliomas is difficult because of their variation in size, shape, and intensity [3]. In addition, meningioma, pituitary, and glioma tumors have the highest occurrence among all brain tumors [4]. The various features extracted from MRI images are a major source of information for tumor classification. DL can make predictions and decisions about available data by learning suitable representations of it, and it is widely used to classify medical images. These methods have produced results that meet expectations in different applications [5]. However, such learning approaches are data driven and require a large amount of training data. Lately, the DL approach, especially the CNN model, has been receiving more and more attention. CNNs outperform other classifiers on larger datasets such as ImageNet, which consists of millions of images. However, CNNs are difficult to apply to medical images. First, medical imaging datasets contain only limited data, because a professional radiologist must label the images, which is tedious and time-consuming work. Second, training a CNN is difficult with small datasets due to overfitting. Third, CNN hyperparameters need to be tuned to improve performance, which requires domain expertise. Therefore, using and refining pre-trained TL models is a viable solution. We use a standard brain tumor MRI classification dataset that includes three types of brain tumors (meningioma, pituitary, and glioma) and a non-tumor category. Based on this dataset, we conduct extensive experiments comparing the performance of six CNN architectures for classification of MRI images. We used VGG16, InceptionV3, Xception, ResNet50, EfficientNetB0, and MobileNet for automatic detection and classification of brain tumors with a fine-grained classification approach. In contrast, many existing approaches rely on manually marked tumor areas and therefore do not fully automate brain tumor classification. The goal of this experiment is to find the deep TL model that is most efficient and effective for classifying brain tumors. This paper reports overall accuracy, precision, recall, and F1-score. The main contributions of our research are as follows: • Proposes a DL-based framework for automatic identification and classification of meningioma, pituitary, and glioma tumors. • Analyzes and validates TL CNN models based on six deep neural network architectures. • Correctly and efficiently classifies brain MRI images and analyzes the performance of each architecture.
1.1 Literature Review Recently, a lot of research has been done in the field of brain tumor detection and classification. This section provides an overview of the same.
Brain tumors were classified using a capsule network, and a real-world MRI dataset was used to study the behavior of CapsNets on this problem [6]. CapsNet requires less data for training and achieved satisfactory results on brain MRI scans; the authors also built a visualization model to explain the learned functionality. An accuracy of 85.59% indicates that the presented method can surpass CNNs in classifying tumors. In [7], the authors proposed a CNN for automated tumor classification. They used small kernels to create a deeper architecture, with the neuron weights kept small. Compared to other modern methodologies, the results indicate that the CNN achieves 97.5% accuracy with little or no additional complexity. Kesav et al. [8] created a new framework for classifying tumors and recognizing their types and assessed it on two datasets published on Kaggle and Figshare. The aim was to create a lightweight framework that allows a traditional R-CNN to run and work faster. Glioma and normal MRI images were first evaluated with a two-channel CNN. An R-CNN feature extractor then localized the tumor in the glioma MRI samples classified in the earlier stage, using the same framework, and a bounding box was used to define the tumor area. Meningiomas and pituitary tumors were also treated in this way. The proposed method obtained an average confidence level of 98.83% in classifying tumors from the two classes meningioma and pituitary tumor.
2 Methodology This section describes the research methods proposed for the classification of brain tumors to understand more about the proposed transfer learning approach.
2.1 Proposed Methodology Datasets having images of no tumor, glioma, meningioma, and pituitary magnetic resonance images were obtained from Kaggle [9] and fed to the training set. Data pre-processing technique is implemented to check generalizability of the transfer learning models. Data augmentation is done in order to enhance the existing data without adding any new data. This step is useful in classification since having data in good numbers helps in model accuracy. This is done by using a pre-built function in Python. Evaluation and comparison of six different pre-trained models was done through a transfer learning approach for classification of brain tumor images. The models are InceptionV3, VGG16, Xception, Resnet50, EfficienNetB0, and MobileNet. Transfer learning models have pre-trained layers trained on ImageNet dataset, and we added three new layers that is three layers present at last; they are altered to get required classified categories. These pre-trained architectures have a layer called
Fig. 1 Methodology used for the experimentation (flowchart: downloading and reading data, data augmentation, resizing of the images as required for the pre-trained model, evaluating the performance of the various pre-trained models, validation of the models, calculating overall evaluation metrics, reporting results and findings)
These pre-trained architectures end with a SoftMax layer, which is responsible for categorizing dataset images into the glioma, no tumor, pituitary, and meningioma classes. After this deep learning framework step, every single model was evaluated to check the performance of the applied transfer learning models in identifying brain tumor types. To obtain reliable results, the data are split into training and testing sets, with 80% of the data used for training and the remaining 20% for testing. Similar steps are carried out for all the other CNN architectures. All the steps required for training the models discussed above are shown in order in Fig. 1.
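The exact training configuration (framework, input size, optimizer, new-head layout) is not stated in this section, so the following is only a minimal sketch of the pipeline just described, assuming TensorFlow/Keras, 224×224 inputs, and the Adam optimizer; the three appended layers and augmentation settings are illustrative choices, not the paper's exact ones.

# Minimal sketch of the transfer-learning setup described above (assumptions:
# Keras/TensorFlow, 224x224 inputs, Adam optimizer and the exact new head are
# not specified in the paper and are chosen here for illustration).
import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = (224, 224)
NUM_CLASSES = 4  # glioma, meningioma, no tumor, pituitary

# Data augmentation via pre-built Keras utilities, as mentioned in the text.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
])

# Pre-trained VGG16 backbone (ImageNet weights) with its classifier head removed.
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=IMG_SIZE + (3,), pooling="avg")
base.trainable = False  # keep the pre-trained convolutional layers frozen

# Replace the last layers with three new layers ending in a 4-way SoftMax.
model = models.Sequential([
    layers.Input(shape=IMG_SIZE + (3,)),
    augment,
    layers.Lambda(tf.keras.applications.vgg16.preprocess_input),
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])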
2.2 Transfer Learning-Based Models The pre-trained models used are VGG16, InceptionV3, Xception, ResNet50, EfficientNetB0, and MobileNet. A detailed description of each model is given below: VGG16 The number 16 in the name VGG16 indicates that it is a 16-layer deep neural network (VGGNet). This implies that VGG16 is a complex network of around
139 million parameters. Even by today's standards, this is a massive network. The simplicity of the VGG16 architecture, on the other hand, is what makes the network popular. InceptionV3 InceptionV3 has achieved greater than 78% accuracy when trained on the ImageNet dataset. It contains 48 layers [10] and has a lower computational cost and higher accuracy compared to earlier versions of Inception. Xception Xception is a CNN architecture based on depthwise separable convolutions. It is 71 layers deep [10]. In Xception, 36 convolutional layers are responsible for feature extraction, and residual connections are attached to the convolutional blocks. The structure of the Xception model can easily be modified [10]. ResNet50 ResNet50, a residual network, has a total of 50 layers [10]. A special property of this architecture is its use of shortcut connections that skip some layers; this skipping property compresses the network and results in faster learning. This network is also pre-trained on the large ImageNet dataset. EfficientNetB0 EfficientNets, as the name implies, are computationally very efficient and obtained excellent performance on the ImageNet dataset, with 84.4% top-1 accuracy. MobileNet MobileNet is a CNN architecture based on depthwise separable convolutions. It has 28 layers and about 4.2 million (42 lakh) parameters [11]; these parameters can be reduced further by tuning.
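All six backbones are available in keras.applications. The hedged sketch below simply instantiates each one so that the layer and parameter counts reported by Keras can be inspected; these counts may differ slightly from the figures quoted above, which follow [10, 11].

# Sketch: instantiating the six compared backbones for the same input size and
# printing the layer/parameter counts reported by Keras (may differ slightly
# from the numbers quoted in the text).
import tensorflow as tf

BACKBONES = {
    "VGG16": tf.keras.applications.VGG16,
    "InceptionV3": tf.keras.applications.InceptionV3,
    "Xception": tf.keras.applications.Xception,
    "ResNet50": tf.keras.applications.ResNet50,
    "EfficientNetB0": tf.keras.applications.EfficientNetB0,
    "MobileNet": tf.keras.applications.MobileNet,
}

for name, ctor in BACKBONES.items():
    net = ctor(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    print(f"{name}: {len(net.layers)} layers, {net.count_params():,} parameters")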
3 Results and Discussion 3.1 Dataset A brain MRI dataset [12] is used for training, testing, and validating the different transfer learning DL approaches, with the aim of finding the best-suited deep learning architecture. This dataset contains four types of MRI images, i.e., meningioma, pituitary, no tumor, and glioma; the training set has 1645 meningioma, 1757 pituitary, 2000 no tumor, and 1621 glioma images, and the testing set has 306 meningioma, 300 pituitary, 405 no tumor, and 300 glioma images. We then merged the data from both the testing and training sets.
Fig. 2 Sample of images with label in dataset
From the merged data, 80% was used for training and the remaining 20% for testing. Random sample images from the dataset covering all four categories are shown in Fig. 2.
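A possible sketch of this merge-and-resplit step follows, assuming the Kaggle dataset's usual Training/ and Testing/ folder layout; the directory and class-folder names are assumptions, not taken from the paper.

# Sketch of merging the original train/test folders and re-splitting 80/20
# (assumption: the dataset is organised as Training/<class> and Testing/<class>
# directories; folder names may differ).
from pathlib import Path
from sklearn.model_selection import train_test_split

CLASSES = ["glioma", "meningioma", "notumor", "pituitary"]
root = Path("brain-tumor-mri-dataset")

paths, labels = [], []
for split in ("Training", "Testing"):
    for cls in CLASSES:
        for img in (root / split / cls).glob("*.jpg"):
            paths.append(str(img))
            labels.append(cls)

# Stratified 80/20 split on the merged data, as described in the text.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths, labels, test_size=0.2, stratify=labels, random_state=42)
print(len(train_paths), "training images,", len(test_paths), "testing images")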
3.2 Performance Evaluation The models are evaluated and compared with the help of the commonly used metrics accuracy, precision, recall, and F1-score:

$$\text{Accuracy} = \frac{TP + TN}{TS}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall (Sensitivity)} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

TP = True Positive; TN = True Negative; FP = False Positive; FN = False Negative; TS = total number of samples.
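These metrics can be computed, for example, with scikit-learn; the label arrays below are hypothetical placeholders standing in for the encoded test-set ground truth and model predictions.

# Sketch of computing the reported metrics with scikit-learn from the model's
# predictions on the test set (y_true / y_pred are placeholder label arrays).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, classification_report)

y_true = [0, 1, 2, 3, 2, 1]          # hypothetical ground-truth labels
y_pred = [0, 1, 2, 3, 2, 0]          # hypothetical model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))

# Per-class precision/recall/F1, matching the layout of Tables 2-4.
print(classification_report(
    y_true, y_pred, target_names=["glioma", "meningioma", "notumor", "pituitary"]))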
3.3 Results A major advantage of transfer learning-based models is that they reduce overfitting, an issue that frequently occurs in deep learning when working with small datasets that contain few training and testing images. All transfer learning models are trained with the same settings. A total of 7023 brain tumor images are used for classification. Table 1 reports the average results of the different transfer learning-based CNN algorithms for brain tumor classification. As shown in Table 1, every transfer learning algorithm achieved competitive results except EfficientNetB0. Among all six, VGG16 produced the highest accuracy of 99.09%, and EfficientNetB0 produced the worst accuracy of 27.79%. Figure 3 shows the training history of the VGG16 model: as the number of epochs increases, the model accuracy increases and the loss decreases. As also shown in Fig. 3, EfficientNetB0 produced low accuracy and high loss; the figure likewise depicts the performance of the other models. Tables 2, 3, and 4 show the per-class performance of each CNN architecture for the metrics precision, recall, and F1-score, respectively. InceptionV3 and ResNet50 produced decent accuracy as shown in Fig. 3. MobileNet and Xception produced accuracy comparable to VGG16, but their precision and recall were lower than those of VGG16, as shown in Tables 2 and 3. In VGG16, a few convolution layers are followed by a pooling layer that reduces the height and width of the feature maps. The number of filters starts at about 64 and is doubled to roughly 128, then to 256, and finally to about 512 filters. Table 1 Comparison of classification algorithms
Model            Accuracy (%)   Precision (%)   Recall (%)   F-Measure (%)
Xception         98.75          94              94           94
ResNet50         84.52          82              83           82
InceptionV3      98             95              95           95
MobileNet        97.50          93.80           93           93
EfficientNetB0   27.79          10              31           15
VGG16            99.09          96              95           96
Fig. 3 Accuracy and loss graphs for a VGG16, b Inceptionv3, c EfficientNetB0, d ResNet50, e Xception, f MobileNet
4 Conclusion A comparative and comprehensive study of six different CNN architectures used for tumor classification with the transfer learning method is presented in this paper. Our experimental results confirm that the VGG16 model outperforms all the other models: VGG16 achieves the highest values in the performance evaluation and is thus found to be the best fit among the compared models.
Table 2 Comparison for the measure of precision
Model            Glioma   Meningioma   No tumor   Pituitary
Xception         0.97     0.87         0.99       0.93
InceptionV3      0.89     0.90         0.99       0.93
VGG16            0.97     0.86         1          1
EfficientNetB0   0        0            0.31       0
ResNet50         0.76     0.72         0.92       0.86
MobileNet        1        0.89         0.99       0.89

Table 3 Comparison for the measure of recall
Model            Glioma   Meningioma   No tumor   Pituitary
Xception         0.85     0.93         1          0.95
InceptionV3      0.94     0.85         0.98       0.95
VGG16            0.92     0.98         1          0.91
EfficientNetB0   0        0            1          0
ResNet50         0.75     0.64         0.95       0.94
MobileNet        0.86     0.89         1          1

Table 4 Comparison for the measure of F1-score
Model            Glioma   Meningioma   No tumor   Pituitary
Xception         0.91     0.90         1          0.95
InceptionV3      0.92     0.87         0.98       0.94
VGG16            0.94     0.91         1          0.95
EfficientNetB0   0        0            0.47       0
ResNet50         0.75     0.67         0.93       0.90
MobileNet        0.92     0.89         0.99       0.94
Although the VGG16 model is found to have the superior architecture, the other models, except EfficientNetB0, also perform satisfactorily.
The experimental analysis has a few additional findings, as noted below:
1. The performance of a pre-trained DL model as a standalone classifier is relatively low.
2. A significant amount of training time is required for transferring and fine-tuning the deep neural networks.
3. Due to the limited amount of training data, overfitting issues are observed in the experimentation.
References 1. Kavitha AR, Chitra L, Kanaga R (2016) Brain tumor segmentation using genetic algorithm with SVM classifier. Int J Adv Res Electr Electron Instrum Eng 5:1468–1471 2. Logeswari T, Karnan M (2010) An improved implementation of brain tumor detection using segmentation based on hierarchical self organizing map. Int J Comput Theory Eng 2:591–595 3. Cheng J, Huang W, Cao S, Yang R, Yang W, Yun Z, Feng Q (2015) Enhanced performance of brain tumor classification via tumor region augmentation and partition. PLoS ONE 10:e0140381 4. Swati ZNK, Zhao Q, Kabir M, Ali F, Ali Z, Ahmed S, Lu J (2019) Content-based brain tumor retrieval for MR images using transfer learning. IEEE Access 7:17809–17822 5. Naseer A, Rani M, Naz S, Razzak MI, Imran M, Xu G (2020) Refining Parkinson’s neurological disorder identification through deep transfer learning. Neural Comput Appl 32:839–854 6. Afshar P, Mohammadi A, Plataniotis KN (2018) Brain tumor type classification via capsule networks. In: Proceedings of the 2018 25th ieee international conference on image processing (ICIP). Athens, 7–10 Oct 2018, pp 3129–3133 7. Seetha J, Raja SS (2018) Brain tumor classification using convolutional neural networks. Biomed Pharmacol J 11:1457–1461 8. Kesav N, Jibukumar MG (2021) Efficient and low complex architecture for detection and classification of Brain Tumor using RCNN with Two Channel CNN. J King Saud Univ Comput Inf Sci 33:1–14 9. Sartaj B, Ankita K, Prajakta B, Sameer D, Swati K (2020) Brain tumor classification (MRI). Kaggle 10. Ullah N et al. (2022) An effective approach to detect and identify brain tumors using transfer learning. Appl Sci 12(11):5645 11. Lu S-Y, Wang S-H, Zhang Y-D (2020) A classification method for brain MRI via MobileNet and feedforward network with random weights. Pattern Recogn Lett 140:252–260 12. Badran EF, Mahmoud EG, Hamdy N (2010) An algorithm for detecting brain tumors in MRI images. In: Proceedings of the 2010 international conference on computer engineering and systems. Cairo, 30 Nov 2010, pp 368–373
Optimized GrabCut Algorithm in Medical Image Analyses Mária Ždímalová and Kristína Boratková
Abstract In this contribution, we study discrete and graph-based methods of image segmentation. We mainly focus on the image segmentation method called "GrabCut," which we also implement in the C++ programming language, and we illustrate medical, biological, and even technical applications. One of the applications is the analysis of medical scans of brain tumors called "glioblastomas," among other medical and engineering applications. The goal of this paper is to obtain results on the use of image segmentation for medical, biological, and possibly engineering image processing and analysis. The aim was to create new software and to improve this algorithm for a specific type of medical, biological, and technical images with a focus on sharp object boundaries. In addition, we look into cluster analysis and graph theory, on which the GrabCut method is based. The result is new software with an improved and optimized algorithm as well as many medical applications. We improved segmentation of the brain, focusing especially on obtaining a better boundary of the brain tumor; the results are segmented brain tumors with better and sharper boundaries. There is still space to extend this optimized method to other medical applications, e.g., breast tumors. Keywords GrabCut · GraphCut · Image segmentation · Gaussian mixture models · Cluster analysis
M. Ždímalová (B) · K. Boratková Slovak University of Technology in Bratislava, Radlinského 11, 81005 Bratislava, Slovakia e-mail: [email protected] K. Boratková e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_9
1 Introduction In computer vision, image segmentation is the process of dividing a digital image into multiple segments (sets of pixels, also called "superpixels"). The objective of segmentation is to simplify or modify the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is normally
used to trace objects and boundaries (lines, dots, curves, etc.) that occur in images. Specifically, image segmentation is the process of assigning a label to every pixel such that pixels with the same label share certain pictorial characteristics. The outcome of image segmentation is a set of segments or regions that together represent the entire object. Every pixel in a region is similar to the others with respect to certain characteristics such as color, texture, and intensity, while neighboring regions differ significantly with respect to the same characteristics. Numerous practical applications of image segmentation exist: medical images, for example finding tumors and other pathological tissues, computer-guided surgery, treatment planning, optical character recognition (OCR), the study of anatomical structures, locating objects in satellite images (roads, rivers, forests, etc.), face recognition, traffic control systems, fingerprint recognition, brake light detection, agricultural imaging (crop disease detection), and machine vision. Some general-purpose algorithms and techniques have been developed for image segmentation. Since there is no universal solution to the image segmentation problem, these techniques have to be combined with domain knowledge to obtain an effective solution for a given problem domain. The methods can be classified into three main categories: traditional methods, graph theoretical methods, and combinations of both. In the following section, we give a brief overview of the basic methods used in image processing. In the applications included in this contribution, we focus only on methods based on graph theoretical approaches to image segmentation. Image segmentation is a very broad issue in digital image processing. Its principle is to divide a digital image into segments based on criteria such as color, texture, or intensity, and its most common goal is to separate the object from the background. In addition to standard computer graphics, image segmentation also has practical uses, for example, in the processing of biological, medical, or satellite images. In medical image segmentation, in particular, the images do not contain sharp edges, and therefore their segmentation remains challenging. In the introduction, we define the basic segmentation techniques. In Sect. 2, we give an overview of the segmentation methods and our methodology. In Sect. 3, we present the GrabCut method, and we show our own implementation and optimization. Finally, in Sect. 4, we discuss medical applications. In the conclusion, we summarize our approach and discuss open questions. The practical results of the proposed algorithm are encouraging.
2 Methodology and Literature Review of the Segmentation Methods A large number of different types and specific methods of image segmentation have emerged in recent decades. In this section, we go over some of the most important and widely used ones and illustrate many different traditional methods. A survey of different techniques is given by Basavaprasad and Ravindra [1]. In
[2], Basavaprasad and Hegadi present methods more oriented toward discrete mathematics and graph theory. Interactive image segmentation is presented by Saban and Manjunath [3], by Hegadi and Goudannavar [4], and by Brown et al. [5]. Chuang et al. [6] discuss a Bayesian approach to digital matting, and Kass et al. [7] discuss active contour models. Gross and Yellen [8] present the basic knowledge of graph theory needed for image processing. Brain tumor analyses are given by Sharma et al. [9]; other techniques, such as the use of the k-means algorithm, appear in [10]. Jeevitha and Priya [11] discuss ultrasound images, and automated segmentation techniques for the detection of disease are given in [12]. Modern techniques such as data mining and deep learning and their use in medicine are visible in [12, 13]. Deep learning techniques in particular are presented by Mahdi and Kobashi [14]. A gradient technique for image analysis is found in Vardhan Rao et al. [15], and a fuzzy approach in Hauffircker and Tizoosh [16]. Orchard and Bouman [17] explain color quantization of images in detail. Khan and Ravi [18] give a comparative study of image segmentation techniques. Photoshop [19] is software available online as Adobe Photoshop. A review of shortest path algorithms is written by Magzhan and Jani [20]. Some methods we discuss in detail: Thresholding is the simplest image processing method, based on the conversion of a grayscale image into a binary image. The algorithm consists of selecting some intensity in the range; then we divide all pixels into two groups according to whether their intensity is smaller than, or greater than or equal to, the selected intensity. The result is usually visualized by displaying pixels with a smaller value in black and pixels with a larger value in white, see Basavaprasad and Ravindra [1] and Mortensen and Barrett [21]. Cluster methods: When segmenting an image, it is customary to use mainly center-oriented cluster methods; given a set of centers, the elements are assigned to the clusters. In the case of image segmentation, the elements are pixels, and the centers are defined as intensities or RGB vectors; the elements are then assigned on the basis of color (not physical) distance. "Fuzzy" cluster methods: In classical cluster methods, we talk about so-called hard segmentation, which means that we always assign elements to only one cluster for non-hierarchical methods, or to multiple nested clusters for hierarchical clustering. Thus, in classical cluster methods, the intersection of two clusters is either an empty set or one of the clusters entirely. In "fuzzy" cluster methods, we speak of so-called soft segmentation: elements can belong to multiple clusters at the same time, so the clusters are not disjoint and their intersection can be arbitrary. Histogram methods: This is a type of cluster method where we initially calculate a histogram of all pixel intensities; from the histogram, we can then select the intensities used to divide the pixels into clusters, see Basavaprasad and Ravindra [1] and Mortensen and Barrett [21]. "Region-growing" methods: These methods enlarge a selected area based on some similarity criterion of the surrounding pixels.
The principle of the method is to select the pixel inside the area we want to segment from the rest of the image, see Khan and Ravi [18]. Magic Wand: An image segmentation tool used in Adobe Photoshop. Its principle is that at the beginning the user selects one pixel (or more
pixels), and based on its color, other pixels that are similar within a certain tolerance chosen by the user are added, see Khan and Ravi [18]. Edge detectors: Edge detectors are used to find the boundary of an object; their principle is to search for adjacent pixels in the image whose intensity or color differs too much. Graph methods: Very widespread methods include those based on representing the image as a graph and applying various methods from graph theory. Ford and Fulkerson [22] discuss maximal flow in a network. Interactive graph cuts are discussed by Boykov and Jolly [23]. Boykov [24] presents work on graphs and region segmentation in computer vision. Rother et al. [25] implement discrete energy minimization in vision. Methods based on finding the shortest path: The principle of this algorithm is to search for the boundary of the object in the graph representing the image using shortest paths. We consider the shortest path between two points in the graph to be the one whose sum of edge weights is the smallest. The input data are one or more points on the object boundary, between which the algorithm searches for the shortest paths, e.g., using the Dijkstra algorithm. Intelligent scissors: This method works on the principle of selecting several points (anchors) on the boundary between the object and the background; the algorithm then passes from one anchor to another looking for the shortest path between them, i.e., the boundary of the object. The result is the area inside which the object is located. Ford and Fulkerson [22] describe the search for maximal flow in a network, which is also the first mention of the maximal-flow problem later applied in graph cuts. Its connection to image analysis can be found in Boykov and Jolly [23]. In [24], Boykov discusses min-cut and max-flow algorithms for energy minimization in vision. An iterated version of graph cut is presented in [25] by Rother et al. Methods based on finding the maximum flow and the minimum cut: With these methods, we convert the pixels of the image into a network (graph) in which we try to find the maximum flow and the minimum cut. Each pixel represents one node of the graph, and in addition, we add two more nodes, the source and the sink. The capacities on the edges are calculated based on the intensity/color of each pixel, and the maximum flow is found using some algorithm (for example, the Ford-Fulkerson algorithm). After finding the maximum flow, we determine which pixels are still reachable from the source and separate them by the minimum cut, which gives the segmented object. Graph Cut: An image segmentation method based on finding the maximum flow and the minimum cut in the graph representing the image. Its principle is to add two more nodes to the graph, which we call the source and the sink, and to connect these nodes with all the pixels of the image. The capacity of the edge between a pixel and the source increases in proportion to the pixel intensity, and conversely, the capacity of the edge between the pixel and the sink decreases. The capacities of the edges between pixels increase as the pixels become more similar. Subsequently, an algorithm is run to find the maximum flow from the source to the sink and the corresponding minimum cut. We get the resulting object (see Fig. 1). Mixed methods: techniques that combine traditional and graph
Fig. 1 Minimal cut segmentation [22–24]
methods to improve segmentation results. GrabCut: An image segmentation method based on initial clustering using Gaussian mixture models and a modified GraphCut method, see Basavaprasad and Ravindra [1] and Rother et al. [25]. Figure 2 shows a comparison of the segmentation results of some of the known image segmentation methods mentioned above. Cluster analysis: Cluster analysis deals with the problem of dividing data into a finite number of clusters. Clusters are subsets of the whole data set: elements that are in some way close belong to one cluster, and by unifying all the clusters, we again get the whole original set of elements. It follows that every element must belong to (at least) one cluster; Rother et al. [25] discuss GrabCut, and Xu and Tian [26] give a survey of clustering algorithms. There are various clustering methods, which are primarily divided into hierarchical and nonhierarchical. Hierarchical clustering: For clusters in hierarchical clustering, the intersection of two clusters is either an empty set or the whole of one of the pair of clusters. Nonhierarchical clustering: In this type of clustering, we try to divide the data into a finite number of disjoint clusters; thus, the intersection of two clusters is always an empty set. It must again hold that by uniting all clusters we get (with few exceptions) the whole set. Center-based clustering: For this type of clustering, often called k-means, we have a finite number of centers, and elements are assigned based on which center they are closest to. The term center in image segmentation usually does not mean coordinates in Euclidean space but, for example, an intensity or a color vector in some color space. If we look again at the thresholding method, similar results could be achieved with a simple two-center model, where we could consider, for example, the white intensity as one center and the black intensity as the other; we would then assign a pixel to the white or black cluster depending on which of the centers its intensity is closer to. Although this method seems relatively simple, its optimization tends to be a problem. Finding a finite number of suitable centers may not be trivial, but there are algorithms for
Fig. 2 a Magic wand, b intelligent scissors, c GraphCut, d GrabCut
estimating a suitable model. The most common is Lloyd's k-means algorithm, which in some cases can be run from completely randomly selected centers through iterations. Its principle is to repeat two steps: the first is to assign elements to the currently selected centers, and the second is to calculate new centers as the mean value of the elements of the individual clusters. These steps are repeated until the algorithm converges, i.e., until the model stops changing significantly in further iterations. However, in some cases, a completely random selection of centers may not give ideal results, or the algorithm needs many iterations to reach the right result. For this reason, there are also algorithms for selecting the starting centers; these include, for example, the algorithm called "k-means++". There are also other types of clustering: distribution-oriented clustering and density-based clustering. A short sketch of such center-based clustering of image pixels is given after the list of cluster parameters below. Mixture models: These are probabilistic models that divide the data into a finite number of clusters based on the probability of a data point belonging to a given cluster, under the assumption that the data come from a finite number of probability distributions, see Lindsay [27]. Huang and Chau [28] and Reynolds [29] discuss details of mixture models and their applications. The density function of the whole model, for each point x of the set M on which we created a mixture model with K components, is defined as $p(x) = \sum_{k=1}^{K} \pi(k) f_k(x)$, where $f_k$ is the density function of cluster k and $\pi(k)$ is the ratio of the size of cluster k to the whole model, with $\sum_{k=1}^{K} \pi(k) = 1$. Gaussian mixture models: In a Gaussian mixture model (hereinafter GMM), we consider Gaussian (normal) cluster distributions with a mean and a variance. Each GMM contains a finite number K of clusters, which can be of different sizes. This type of mixture model is also used in image segmentation, for example in the GrabCut method. Talbot and Xu [30] explain a good way to implement the GrabCut algorithm. The parameters of cluster k are: • μ(k) - mean value of cluster k (a number, or a vector for multidimensional data) • σ²(k) - variance of cluster k (Σ(k) - covariance matrix for multidimensional data) • π(k) - weight constant of cluster k (the sum of the weight constants of all clusters is 1).
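The following is only an illustrative Python sketch of center-based clustering of pixel RGB values with k-means++ initialization; it is not the authors' C++ implementation, and the file name is a placeholder.

# Illustrative sketch (not the authors' C++ code): center-based clustering of
# pixel RGB values with Lloyd's k-means and k-means++ initialisation.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("brain_slice.png").convert("RGB"))
pixels = img.reshape(-1, 3).astype(np.float64)

# K = 5 centers, initialised with k-means++ as mentioned in the text.
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(pixels)

# Replace every pixel by the mean colour of its cluster (a coarse segmentation).
quantised = kmeans.cluster_centers_[labels].reshape(img.shape).astype(np.uint8)
Image.fromarray(quantised).save("brain_slice_clusters.png")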
GMM Clustering: For simplicity, the parameters of cluster k are often written with a single symbol, θ(k) = {π(k), μ(k), σ²(k) or Σ(k)}, k = 1, ..., K, where K is the number of all clusters. For each cluster i and element x, we can then calculate the multidimensional normal density function in the following form:

$$\phi(x, \mu(i), \Sigma(i)) = \frac{1}{(2\pi)^{n/2}\sqrt{\det \Sigma(i)}}\, e^{-\frac{1}{2}[x-\mu(i)]^{T}\Sigma(i)^{-1}[x-\mu(i)]},$$
where x is an n-dimensional vector and n is the dimension of the elements. For example, in color segmentation of an image, x can be the RGB vector of one pixel of the image, or the pixel intensity in black-and-white segmentation. The element x then belongs to the cluster, formed by a normal distribution, for which the previous function attains the largest value. Estimation of GMM parameters: In practice, for example in image segmentation, the cluster parameters are usually not known and cannot be calculated trivially, so suitable parameters are obtained by estimation and subsequent optimization. One of the most common methods for estimating GMM parameters in image segmentation is center-oriented clustering of the data using some k-means algorithm. We can then calculate the mean value, covariance matrix, and weight constant for each cluster from the values of the elements assigned to it by the k-means method, and subsequently assign elements to clusters based on probability. To initialize and re-estimate the GMM, a statistical method called "Expectation-Maximization" is often used to find the most plausible estimate. In this method, one initially selects K distributions with random parameters and then iteratively optimizes them.
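As an illustration of EM-based GMM estimation and most-likely-component assignment, scikit-learn's GaussianMixture can be applied to pixel RGB vectors; this is only a sketch with placeholder data, not part of the authors' software.

# Illustrative sketch: EM estimation of a 5-component GMM on pixel RGB vectors
# and assignment of each pixel to its most likely component (GaussianMixture
# uses k-means initialisation and EM internally).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(10000, 3)).astype(np.float64)  # placeholder RGB data

gmm = GaussianMixture(n_components=5, covariance_type="full",
                      init_params="kmeans", max_iter=100, random_state=0)
gmm.fit(pixels)                      # EM: estimates pi(k), mu(k), Sigma(k)

component = gmm.predict(pixels)      # most likely component per pixel
log_density = gmm.score_samples(pixels)  # log density of the mixture per pixel
print(gmm.weights_, gmm.means_.shape, gmm.covariances_.shape)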
3 GrabCut Algorithm Image Segmentation Using GrabCut: GrabCut is an image segmentation method based on Gaussian mixture models and a modified GraphCut. It is an iterative method where in each step we optimize the Gaussian mixture models and find the maximum flow and the minimal cut. The algorithm stops once the maximum flow does not change by more than a given tolerance, which means that the result of the segmentation did not change much between the last two iterations. In GrabCut, we use the RGB vectors of pixels to approximate the Gaussian mixture models and to calculate the similarity between two pixels. This is used to calculate the capacities of the N-links and T-links of unknown pixels. We can also choose between creating N-links from pixels only with their 4 nearest neighbors or with 8, which usually gives better results. The Euclidean distance between connected pixels also needs to be taken into consideration when computing N-links; this is done by dividing the capacity by the distance.
User Input: The first step of the method is user input that gives the software some information about the image. The main kind of input this method needs is an approximate location of the object we want to segment out, usually given by drawing a rectangle around it. Another type of input is declaring seed pixels. This type of input is not required, but it positively affects the result of the segmentation; it can also be done after the segmentation, followed by one more iteration of the algorithm. Binary Variable: A pixel is given the value 0 if it is a background seed or 1 if it is a foreground seed. This is used when computing the capacities of T-links to know which pixels are seed pixels. Trimap: After the user input, the pixels are divided into three groups. Everything outside the rectangle, plus the background seed pixels inside the rectangle, is considered background; foreground seed pixels inside the rectangle are considered foreground; and all pixels inside the rectangle that were not marked as seed pixels are considered unknown. Gaussian Mixture Models in GrabCut: We create two Gaussian mixture models, one for the background pixels and one for the foreground and unknown pixels. The Gaussian mixture models are approximated using the RGB vectors of the pixels and optimized in every iteration. Usually, both models have 5 components, which is an optimal number to get good enough results in a relatively short time, but the number of components is optional. Each component has its own characteristics: μ - mean RGB value; π - weighting coefficient; Σ - covariance matrix (3×3). In every step of the algorithm, we optimize the Gaussian mixture models, calculate new capacities on the T-links, and perform GraphCut on the resulting graph. The algorithm stops once the maximal flow in two consecutive iterations does not change by more than a given tolerance. In case the algorithm does not converge, a maximum number of iterations must also be given. Segmentation adjustment: On some occasions, the amount of user input is not sufficient. In this case, additional seed pixels must be declared in the areas where the object was not segmented properly. After this, we can either do one more iteration of the algorithm or start the whole algorithm again with the newly defined seed pixels if we prefer better results over short computing time. The Value of the Binary Variable α: Each pixel p is assigned the value αp = 1 if p belongs to the object, or αp = 0 if p belongs to the background. At initialization, α is set to 1 for the unknown pixels inside the rectangle and to 0 for the background pixels outside the rectangle. If, when manually adjusting the segmentation, we mark a pixel p inside the rectangle as background, the value of αp also changes to 0. Gaussian mixture models in GrabCut and estimation of their parameters: After we split the segmented image into subsets of pixels, we create two Gaussian mixture models, one on the set TB of background pixels and the other on the set TU of unknown pixels together with the set TF of object pixels. Each of these models has the standard K = 5 components (clusters). The number 5 is referred to in the literature as the optimal number of clusters, but the number can be larger or smaller, which will affect the resulting segmentation. At initialization, we must estimate the Gaussian mixture models, since we do not know their parameters in advance, only their number.
An efficient way of estimating the GMM parameters is to find K clusters in the background and K clusters in the set of unknown pixels using some
simple cluster method. From this, however, we only get information about which pixels belong to each of the clusters; the remaining parameters of these estimated GMM components can be calculated from the assigned pixels as follows. Weighting Constant of Component π(k): At the beginning, we choose the weight constant of cluster k as the number of pixels assigned to it divided by the number of all pixels that make up the GMM of which it is a part. Mean Value μ(k): The mean value of component k in the first iteration is the mean value of the RGB vectors of the pixels estimated to belong to this component. The RGB vector representing the mean value of the component is calculated as

$$\mu(k).R = \frac{1}{n}\sum_{i=1}^{n} p_i.R, \quad \mu(k).G = \frac{1}{n}\sum_{i=1}^{n} p_i.G, \quad \mu(k).B = \frac{1}{n}\sum_{i=1}^{n} p_i.B,$$

where n is the number of pixels that the component contains and the $p_i$ are the pixels assigned to the given cluster. Covariance Matrix Σ(k): In general, the covariance matrix of three-component elements $P_i = (X_i, Y_i, Z_i)$ of a set M is a symmetric 3 × 3 matrix of the following form:

$$\begin{bmatrix} \mathrm{Var}(X) & \mathrm{Cov}(XY) & \mathrm{Cov}(XZ) \\ \mathrm{Cov}(XY) & \mathrm{Var}(Y) & \mathrm{Cov}(YZ) \\ \mathrm{Cov}(XZ) & \mathrm{Cov}(YZ) & \mathrm{Var}(Z) \end{bmatrix},$$

where Var(X) is the variance of the component X and Cov(XY) is the covariance of the components X and Y, for X ≠ Y. We calculate the covariance of two components as

$$\mathrm{Cov}(XY) = \frac{\sum_{i=1}^{n}(X_i - \mu(X))(Y_i - \mu(Y))}{n},$$

where n is the number of elements of the set M and μ(X) and μ(Y) are the mean values of the components X and Y. We calculate the variance of one component as

$$\mathrm{Var}(X) = \mathrm{Cov}(XX) = \frac{\sum_{i=1}^{n}(X_i - \mu(X))^2}{n},$$

where n is the number of elements of the set M and μ(X) is the mean value of the component X. The covariance matrix of cluster k is then the covariance matrix of the RGB components of the pixels assigned to the cluster, i.e., a matrix of the following form:

$$\Sigma(k) = \begin{bmatrix} \mathrm{Var}(R) & \mathrm{Cov}(RG) & \mathrm{Cov}(RB) \\ \mathrm{Cov}(RG) & \mathrm{Var}(G) & \mathrm{Cov}(GB) \\ \mathrm{Cov}(RB) & \mathrm{Cov}(GB) & \mathrm{Var}(B) \end{bmatrix}.$$
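A small NumPy sketch of computing these initial parameters π(k), μ(k), and Σ(k) from a k-means assignment follows; the variable names are assumptions, and `pixels` and `labels` would come from a prior clustering step such as the k-means sketch above.

# Illustrative sketch: initial GMM parameters pi(k), mu(k), Sigma(k) computed
# from a k-means assignment of pixel RGB vectors.
import numpy as np

def initial_gmm_parameters(pixels: np.ndarray, labels: np.ndarray, K: int):
    """pixels: (N, 3) RGB values, labels: (N,) cluster indices in [0, K)."""
    params = []
    n_total = len(pixels)
    for k in range(K):
        members = pixels[labels == k]
        pi_k = len(members) / n_total                       # weighting constant
        mu_k = members.mean(axis=0)                         # mean RGB vector
        sigma_k = np.cov(members, rowvar=False, bias=True)  # 3x3 covariance (1/n)
        params.append({"pi": pi_k, "mu": mu_k, "sigma": sigma_k})
    return params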
Graph and Capacities: Once the characteristics of the Gaussian mixture model components are known, we can create the graph and calculate the capacities between nodes. Two additional nodes called source and sink are added, which represent the object and the background. After we find the maximal flow, all pixels that are still reachable from the source are considered to be part of the object. Capacity of N-links: After obtaining the parameters of both GMMs, we can create the graph. The capacity of the N-link between pixels p and q is calculated as

$$N(p, q) = \delta\!\left(\alpha_p, \alpha_q\right)\,\frac{\gamma}{\mathrm{dist}(p, q)}\, e^{-\beta \lVert p - q \rVert^{2}}.$$

Capacity of T-links: We connect each pixel P to the source and to the sink using T-links, and the capacity c of these edges is calculated as follows. If P ∈ TB: S(P).c = 0, T(P).c = λ. If P ∈ TF: S(P).c = λ, T(P).c = 0, where λ is the largest of the capacities of all N-links, thus ensuring that these T-links will not be cut off. If P ∈ TU:

$$S(P).c = -\log \sum_{i=1}^{K} \pi_B(i)\,\phi\!\left(P, \mu_B(i), \Sigma_B(i)\right), \qquad T(P).c = -\log \sum_{i=1}^{K} \pi_F(i)\,\phi\!\left(P, \mu_F(i), \Sigma_F(i)\right),$$

where i = 1, ..., K runs over the clusters of the background GMM in the first sum and over the clusters of the foreground (object) GMM in the second. GrabCut Algorithm: GrabCut is an image segmentation method based on graph cuts. Starting with a user-specified bounding box around the object to be segmented, the algorithm estimates the color distribution of the target object and that of the background using Gaussian mixture models. The whole GrabCut algorithm consists of repeating these steps until the maximum flow converges. If, in two consecutive iterations, the maximum flow has not changed, or has changed by less than the pre-specified tolerance, the algorithm is terminated, and a minimum cut is made for the maximum flow from the last iteration. It is also advisable to set a maximum number of iterations in case the algorithm does not converge. The algorithm is described in more detail in the following pseudocode (Algorithm 1) and in the flowchart (see Fig. 3):
Algorithm 1: GrabCut algorithm Input: MRI image Output: segmented image Initialize: maxFlow to -1, iteration to 0, maxIteration to 10, tolerance to 10−3, GMMs while iteration ≤ maxIteration do | learn new GMMs | create graph | find the maximum flow in current graph as flow | if maxFlow ≠-1 and Abs (flow - maxFlow) < tolerance then | | find minimal cut in the graph | | break | else | | set maxFlow to flow | | iteration=iteration+1 | end end
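For readers who want to experiment with the same rectangle-initialized iterative scheme without the authors' C++/Qt software, OpenCV ships a built-in GrabCut implementation. The sketch below uses it with placeholder file names and rectangle coordinates; it is not the optimized program described in this paper.

# Illustration only: OpenCV's built-in grabCut with rectangle initialisation.
import cv2
import numpy as np

img = cv2.imread("brain_mri.png")                  # BGR image (placeholder name)
mask = np.zeros(img.shape[:2], dtype=np.uint8)
rect = (50, 40, 200, 180)                          # (x, y, width, height) around the tumour

bgd_model = np.zeros((1, 65), dtype=np.float64)    # internal GMM state (background)
fgd_model = np.zeros((1, 65), dtype=np.float64)    # internal GMM state (foreground)

cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 10, cv2.GC_INIT_WITH_RECT)

# Pixels labelled as sure/probable foreground form the segmented object.
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
segmented = img * fg[:, :, None]
cv2.imwrite("brain_mri_segmented.png", segmented)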
Implementation of GrabCut: We implemented a GrabCut application in the C++ programming language and created a graphical user interface for easier manipulation of the images. We created classes for easier manipulation of N-links and T-links, components, and pixels. We use default values for γ and for the number of components but allow the user to change them before the start of the algorithm if needed. We use k-means and k-means++ to initialize the GMMs; this way, we get well-separated clusters to represent the image. Image Cropping: Since the method is based on distinguishing the object inside the rectangle from the background, it is important to choose a good representation of the background to compare the inside of the rectangle with. The image outside the rectangle might contain elements that are not a good representation of the background. For example, it might contain another object, or the background
Fig. 3 Flowchart of GrabCut method
differs in color in different areas. Because of this, we take only pixels close to the border of the rectangle that was input by the user; we expect that the area close to the rectangle is the most similar to the potential background pixels inside the rectangle, and we use only these pixels to model the background. This modification improved the final segmentation as well as the computational time, since the computations are done on fewer pixels. Novelty in the Approach 1. Editing the segmentation: If some parts inside the rectangle were not segmented correctly, the program allows us to mark pixels in this area with the mouse and firmly assign them to the background or to the object, which later affects the new segmentation of their neighboring pixels. After these adjustments, one more iteration of Algorithm 1 is run, or the whole process is run again from the very beginning if we prefer better accuracy over computation time. 2. To implement the GrabCut method, we used the Qt library for creating programs with a graphical user interface. Our program is implemented in C++ using only the standard C++ libraries and some classes of the Qt library; for example, we use the QImage and QColor classes to store and access images and the RGB vector of each pixel more easily. As part of the program, we also created classes for storing and working with T-links, N-links, clusters, and pixels. When calculating the capacities of the N-links, we omit the δ function that occurs in the original article and instead calculate a non-zero capacity even between pixels with different values of α, as described by N(p, q) = (γ/dist(p, q)) e^{-β||p-q||²} (a short sketch of this capacity computation is given at the end of this section). We also have the option to choose the constant γ and the number of clusters; at program startup these values are set to 50 and 5, which the authors of this method determined in tests to be the optimal values. To initialize the clusters, we use center-oriented clustering with centers chosen by the k-means++ algorithm, which is computationally relatively undemanding but nevertheless worked well for the type of images we studied. Initial editing of images: Since the GrabCut algorithm is based on distinguishing the object from the background, it is important to realize what the object and the background are in our case. The MR images of the brain with which we worked contained, in addition to the brain, other parts such as the skull or the background of the image, which could have a negative impact on the segmentation result. We consider healthy brain tissue as the background and the glioblastoma as the object; the other parts of the image are irrelevant and distracting. For this reason, before segmentation, we cropped the images so that only the area containing the glioblastoma plus some part of the healthy brain remains for color comparison. Some authors recommend working only with the part of the background located near the rectangle, since intuitively this area contains the part of the background most similar to the background we want to separate from inside the rectangle. With images cropped in this way, we already achieved fairly good results even without additional segmentation adjustments. Without cropping, the program had problems correctly segmenting the object in most cases, most likely because the tumor tissue and the skull have a similar color in MRI images.
For clustering, our program uses two GMMs that are initialized using the k-means++ algorithm. Each pixel
of the image is represented by the mean values of the cluster to which it most likely belongs. As we can see, the clusters formed inside the rectangle already describe our object well by themselves. Our program also includes the option to save a picture of the result after the segmentation has finished.
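The sketch below illustrates the modified N-link capacity described above, with the default γ = 50; the choice of β as the inverse of twice the mean squared color difference follows the original GrabCut article and is an assumption here, since this section does not define β explicitly.

# Sketch of N-link capacities between horizontally adjacent pixels,
# N(p, q) = (gamma / dist(p, q)) * exp(-beta * ||p - q||^2), with gamma = 50.
import numpy as np

def horizontal_n_links(img: np.ndarray, gamma: float = 50.0) -> np.ndarray:
    """img: (H, W, 3) float RGB image; returns capacities of shape (H, W-1)."""
    diff = img[:, 1:, :] - img[:, :-1, :]          # colour difference p - q
    sq = (diff ** 2).sum(axis=2)                   # ||p - q||^2 per pixel pair
    beta = 1.0 / (2.0 * sq.mean() + 1e-12)         # assumption: original-GrabCut beta
    dist = 1.0                                     # Euclidean distance of 4-neighbours
    return (gamma / dist) * np.exp(-beta * sq)

img = np.random.rand(8, 8, 3)                      # placeholder image
print(horizontal_n_links(img).shape)               # (8, 7)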
4 Experiments and Results We tested the improved GrabCut algorithm with our new software on medical data. Specifically, we focused on the segmentation of glioblastoma multiforme, a renal cyst, and general tumor analysis. The original data come from Wikimedia Commons and are available for free use under the Wikimedia Commons license, see Figs. 5, 6, 7, and 8. The segmented and enhanced data are produced by our own software implementing the improved and optimized algorithm, see Figs. 4, 5, 6, and 7. Figures 4, 5, 6, and 7 present the original images, the cropped objects, the segmented tumors and cysts, and finally the tumors alone as extracted objects. We focused on the problem of sharply detecting the boundaries of the segmented objects. This is specifically a problem where tumors are clearly visible to the naked eye, but their exact delineation for surgery (for brain, breast, and other types of tumors or cysts) is difficult to provide.
Fig. 4 Glioblastoma multiforme
Fig. 5 Detection of renal cyst
Fig. 6 Detection of the tumor Glioblastoma multiforme
Fig. 7 Analysis of the brain
Fig. 8 Brain tumor on MRI data
After the segmentation of the image, the result can be displayed in three different ways (see Figs. 7 and 8), which helps the user evaluate the success of the segmentation. The first type highlights the tumor in color on the original image; the remaining two types display only the segmented part of the image on a black or transparent background. The original source data come from https://medpix.nlm.nih.gov. In Fig. 8, the original data are shown on the left, and the segmented version produced by our own program using the GrabCut algorithm is shown next to it.
5 Conclusions We created new software for dealing with specific requirements of medical data. We implemented, improved, and optimized a GrabCut algorithm to obtain better segmentation of objects with sharper edges and boundaries, and we discussed its use in medical applications for the segmentation of different types of tumors and cysts. We can clearly see that the improved algorithm gets better results in the following sense: more precise and more sharply defined boundaries of the segmented objects. This is important for detecting objects, tumors, and the sharp boundaries of tumors. In the future, we would like to consider engineering applications, mainly for detecting cracks in materials and buildings. Another step would be 3D and 4D segmentation of tumors and cysts, which would lead to better diagnostics for medical purposes. Acknowledgements This work was supported by the Project 1/0006/19 of the Grant Agency of the Ministry of Education and Slovak Academy of Sciences (VEGA).
References 1. Basavaprasad B, Ravindra SH (2014) A survey on traditional and graph theoretical techniques for image segmentation. Int J Comput Appl 975:8887 2. Basavaprasad B, Hegadi RS (2012) Graph theoretical approaches for image segmentation. Aviskar–Solapur Univ Res J 2:7–13 3. Saban MAE, Manjunath BS (2004) Interactive segmentation using curve evolution and relevance feedback. In: 2004 International conference on image processing, ICIP’04, vol 4. IEEE, pp 2725–2728 4. Hegadi RS, Goudannavar BA (2011) Interactive segmentation of medical images using grabcut. Int J Mach Intell 3(3):168–171 5. Brown M, Perez P, Torr P (2004) Interactive image segmentation using an adaptive GMMRF model. Comput Vis–ECCV 3021:428–441 6. Chuang Y, Curless B, Salesin D, Szeliski RA (2001) Bayesian approaches to digital matting. In: IEEE conference on computer vision and pattern recognition, vol 2, pp 264–271 7. Kass M, Witkin A, Terzopoulos D (1987) Snakes: active contour models. In: Proceedings of the IEEE international conference on computer vision, pp 259–268 8. Gross JL, Yellen J (2005) Graph theory and its applications, 2nd edn. Chapman and Hall 9. Sharma P, Goyal D, Tiwari N (2022) Brain tumor analysis and reconstruction using machine learning. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 114. Springer, Singapore, pp 381–394 10. Vignesh K, Nagaraj P, Muneeswaran V, Selva Birunda S, Ishwarya Lakshmi S, Aishwarya R (2022) A framework for analyzing crime dataset in R using unsupervised optimized Kmeans clustering technique. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 114. Springer, Singapore. https://doi.org/10.1007/978-981-16-9416-5_43 11. Jeevitha S, Priya N (2022) Optimized segmentation technique for detecting PCOS in ultrasound images. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 114. Springer, Singapore
12. Ghosh A, Roy P (2022) A leaf image-based automated disease detection model. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 114. Springer, Singapore. https://doi.org/10.1007/978-981-16-9416-5_63 13. Bhushan S (2022) Application of data mining and temporal data mining techniques: a case study of medicine classification. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 111. Springer, Singapore 14. Mahdi FP, Kobashi S (2021) A deep learning technique for automatic teeth recognition in dental panoramic X-ray images using modified palmer notation system. In: Sharma H, Saraswat M, Kumar S, Bansal JC (eds) Intelligent learning for computer vision. CIS 2020. Lecture notes on data engineering and communications technologies, vol 61. Springer, Singapore 15. Vardhan Rao MA, Mukherjee D, Savitha S (2022) Implementation of morphological gradient algorithm for edge detection. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 114. Springer, Singapore 16. Hauffircker H, Tizoosh HR (2000) Fuzzy image processing. In: Computer vision and applications. Elsevier, pp 541–576 17. Orchard MT, Bouman CA (1991) Color quantization of images. IEEE Trans Signal Process 39(12):2677–2690 18. Khan AM, Ravi S (2013) Image segmentation methods: a comparative study 19. Photoshop A (2021) Adobe Photoshop. Preuzeto 29 20. Magzhan K, Jani HM (2013) A review and evaluations of shortest path algorithms. Int J Sci Technol Res 2(6):99–104 21. Mortensen EN, Barrett WA (1995) Intelligent scissors for image composition. In: Proceedings of the 22nd annual conference on computer graphics and interactive techniques, pp 191–198 22. Ford LR, Fulkerson DR (1956) Maximal flow through a network. Can J Math 8:399–404 23. Boykov YY, Jolly M-P (2011) Interactive graph cuts for optimal boundary and region segmentation of objects in ND images. In: Proceedings eighth IEEE international conference on computer vision, ICCV 2001, vol 1. IEEE, pp 105–112 24. Boykov Y (2004) Min-cut and max-flow algorithms for energy minimization in vision. IEEE Trans Pattern Anal Mach Intell 26(9):1124–1137 25. Rother C, Kolmogorov V, Blake A (2004) ‘GrabCut’ interactive foreground extraction using iterated graph cuts. ACM Trans Graph (TOG) 23(3):309–314 26. Xu D, Tian Y (2015) A comprehensive survey of clustering algorithms. Ann Data Sci 2(2):165– 193 27. Lindsay BG (1995) Mixture models: theory, geometry, and applications. IMS 28. Huang Z-K, Chau K-W (2008) A new image thresholding method based on Gaussian mixture model. Appl Math Comput 205(2):899–907 29. Reynolds DA (2004) Gaussian mixture models. In: Encyclopedia of biometrics, vol 741, pp 659–663. Processing, vol 4, pp 2725–2728 30. Talbot JF, Xu X (2006) Implementing GrabCut. Brigham Young University
Improvement of Speech Emotion Recognition by Deep Convolutional Neural Network and Speech Features Aniruddha Mohanty , Ravindranath C. Cherukuri, and Alok Ranjan Prusty
Abstract Speech emotion recognition (SER) is a dynamic area of research that includes feature extraction, classification and adaptation of speech emotion datasets. There are many applications in which human emotions play a vital role in providing smart solutions, such as vehicle communication, classification of satisfied and unsatisfied customers in call centers, in-car board systems based on information about the driver's mental state, and human–computer interaction systems. In this contribution, an improved emotion recognition technique is proposed with a Deep Convolutional Neural Network (DCNN) that uses both spectral and prosodic speech features to classify seven human emotions—anger, disgust, fear, happiness, neutral, sadness and surprise. The proposed idea is implemented on the RAVDESS, SAVEE, TESS and CREMA-D datasets with accuracies of 96.54%, 92.38%, 99.42% and 87.90%, respectively, and compared with other predefined machine learning and deep learning methods. To test the accuracy of the model in a more realistic setting, it is also evaluated on the combined datasets, where it reaches 90.27%. This research can be useful for the development of smart applications in mobile devices, household robots and online learning management systems. Keywords Emotion recognition · Speech features · Speech dataset · Data augmentation · Deep convolutional neural network
A. Mohanty (B) · R. C. Cherukuri CHRIST (Deemed to be University), Bangalore, Karnataka, India e-mail: [email protected] R. C. Cherukuri e-mail: [email protected] A. R. Prusty DGT, RDSDE, NSTI(W), Kolkata, West Bengal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_10
1 Introduction
Emotions are short-lived feelings that arise from an identifiable cause. They are observed as both mental and physiological states through the way of speaking, gestures, facial expressions, and so on. Speech can convey human emotions such as anger, disgust, fear, happiness, neutral, sadness and surprise, and the same utterance can be perceived differently depending on its sound and the underlying emotion. Speech emotion recognition (SER) recognizes the emotional aspect of speech irrespective of the linguistic content, which is more practical than studying human emotions directly. As per Ancilin and Milton [1], speech features are segregated into prosodic and acoustic features. System features represent the response of the vocal tract, while Das and Mahadeva Prasanna [2] illustrated that source features represent periodicity, smoothed spectrum information and the shape of the glottal signal. Owing to the presence of many acoustic conditions such as compressed speech, noisy speech, silent portions of speech and telephonic conversation, existing SER approaches require further investigation under different acoustic environments involving intonation, stress and rhythm. Nowadays, with the availability of varied, high-quality datasets, deep learning has become the typical technique used in speech emotion recognition and helps considerably in this investigation. In this paper, an SER model comprising four stages is proposed [3]. The first stage is preprocessing, which includes pre-emphasis, framing, windowing, voice activity detection, etc. The second stage is feature extraction, in which both spectral and prosodic features are considered. In the third stage, feature selection and dimension reduction are applied. In the final stage, a deep learning model, namely a DCNN, is applied, and the performance of the model is measured on various datasets such as RAVDESS, SAVEE, TESS and CREMA-D. Further, the performance of the model is evaluated in a realistic scenario by combining all the datasets. The paper is organized as follows. Section 2 is the Related Work, which describes existing emotion recognition techniques. Section 3 is the System Model, which gives insight into the proposed idea as a black-box representation. Section 4 describes the Proposed Method, which deals with the relevant features, feature extraction, feature selection and the proposed procedure. Section 5 describes the Experiments and Result Analysis, which gives details of the software used, the current work and the related discussion. The summary and concluding remarks of the proposed work are provided in Sect. 6.
2 Related Work
Abundant SER studies have been conducted over the years on emotion features, dimension reduction and classification. Several other speech emotion recognition solutions have also been proposed in recent years.
Wang et al. [4] proposed a dual model in which the mel-frequency cepstral coefficient (MFCC) feature is processed by a Long Short-Term Memory (LSTM) network while the Mel-spectrogram is simultaneously processed by a Dual-Sequence LSTM (DS-LSTM). Issa et al. [5] proposed a one-dimensional Convolutional Neural Network applied to speech features such as MFCC, chromagram, mel-scale spectrogram, Tonnetz representation and spectral contrast extracted from speech files. Christy et al. [6] evaluated SER with multiple machine learning algorithms such as linear regression, decision tree, random forest, SVM and CNN; MFCC and modulation spectral (MS) features were used in the implementation. Pawar and Kokate [7] proposed a CNN with one or more pairs of convolution and max-pooling layers applied to speech features such as pitch and energy, MFCC and Mel Energy Spectrum Dynamic Coefficients (MEDC). Jermsittiparsert et al. [8] developed a deep learning-based model called ResNet34 that recognizes speech words automatically to detect emotion with the help of MFCC, prosodic, LSP and LPC features. Bhangale and Mohanaprasad [9] proposed a three-layered sequential deep convolutional neural network (DCNN) on mel-frequency log spectrogram (MFLS) speech features and compared it with CNN and CNN-LSTM implementations. Swain et al. [10] extracted 3-D log Mel-spectrogram features using the first and second derivatives of the Mel-spectrogram and then applied a bi-directional gated recurrent unit network with ensemble classifiers using Softmax and Support Vector Machine to obtain the final classification. Hence, prior studies differ in several ways, such as the datasets used, the speech features and their combinations, the machine learning models used for classification and the communication environment affecting emotion recognition from speech. These gaps motivate the present work on emotion identification from speech.
3 System Model
In the proposed approach, a Deep Convolutional Neural Network (DCNN) is used for the classification of emotions after data augmentation and feature extraction from speech. The model includes one-dimensional convolutional layers combined with max pooling, dropout and flatten layers, with ReLU activation. The system model embodies the various blocks of the proposed approach as a black-box representation, shown in Fig. 1. The starting block is the speech sample, which is the input to the model. Data augmentation is then performed to obtain a more realistic and varied set of data. Features such as MFCC, Chroma STFT, Zero-Crossing Rate, Spectral Centroid and Mel Spectrogram are extracted. Finally, convolutional layers are applied with ReLU activation followed by max pooling and dropout, and a flatten layer is used to obtain one-dimensional data that forms a single long feature vector. The model is fine-tuned by adding, removing or modifying the dropout rates in some of the layers, depending on the dataset, to achieve optimal results compared to the pre-defined models.
Fig. 1 Speech emotion recognition approach

Table 1 Speech corpora

Name      Language   Emotions                                                                   References
RAVDESS   English    Calmness, happiness, sadness, anger, fear, surprise, disgust and neutral   [11]
SAVEE     English    Surprise, happy, sad, angry, fear, disgust and neutral                      [12]
TESS      English    Happy, sad, angry, disgusted, neutral, pleasant surprise and fearful        [12]
CREMA-D   English    Happy, sad, anger, disgust and neutral                                      [12]
4 Proposed Method
The method presented in this implementation uses both acoustic and prosodic speech features with a DCNN model on different datasets. To design the model, input data is required, so data is collected from various databases. Database: To verify the performance of the model, four different datasets are used, namely RAVDESS, TESS, SAVEE and CREMA-D, as shown in Table 1.
4.1 Data Augmentation
Deep learning models depend heavily on the training dataset, whose main drawbacks here are class imbalance and small size. To overcome these two drawbacks, data augmentation [8] is applied to make the dataset more similar to real-world audio inputs and to enhance the recognition of emotions.
• Noise injection [12] adds small arbitrary values to the data.
• Time shifting [13] shifts the audio to the left or right by an arbitrary number of seconds.
• Pitch shifting [13] is based on time stretching and resampling.
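These three operations can be reproduced with numpy and librosa (the library used later in Sect. 5). The snippet below is only a minimal sketch, not the authors' exact implementation; the noise factor, shift range, pitch step and file name are illustrative assumptions.

```python
import numpy as np
import librosa

def add_noise(y, noise_factor=0.005):
    # Noise injection: add small Gaussian values to the waveform
    return y + noise_factor * np.random.randn(len(y))

def time_shift(y, sr, max_shift_s=0.2):
    # Time shifting: roll the waveform left or right by a random offset
    max_shift = int(max_shift_s * sr)
    return np.roll(y, np.random.randint(-max_shift, max_shift))

def pitch_shift(y, sr, n_steps=2):
    # Pitch shifting: resampling-based shift provided by librosa
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

y, sr = librosa.load("speech_sample.wav", sr=None)  # hypothetical input file
augmented = [add_noise(y), time_shift(y, sr), pitch_shift(y, sr)]
```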
4.2 Preprocessing and Feature Extraction
Several speech features are available for analyzing emotions from speech, and feature extraction plays an important role in implementing any machine learning model. A well-trained model can be obtained by selecting the appropriate features from the speech samples [14]. Therefore, several prosodic and spectral feature representations of the same sample are used as input to the deep learning model. Before extracting prosodic and spectral features from the speech samples, preprocessing needs to be completed. Preprocessing [15] improves intelligibility for the normal hearing system, helps to detect highly powered speech segments of short duration, and also detects the silent portions of the speech segments. A pre-emphasis filter, which boosts the high-frequency components of the signal, is the first preprocessing step.
Feature extraction Speech is a quasi-periodic signal of varying length which carries information and emotions. After preprocessing of the speech signal, feature extraction can be performed, which is an important aspect of emotion recognition. Prosodic features are recognized from the tones and rhythms of the human voice. They are also called para-linguistic features and deal with larger units such as phrases, words, syllables and sentence-level properties of speech. The prosodic features used are:
• Zero-Crossing Rate (ZCR) [14] is the rate at which the signal changes from positive to zero, from zero to negative, and vice versa.
• Root-Mean-Square value (RMS) [14] measures the energy of the signal for each frame; the squares of the amplitudes are summed and divided by the number of amplitudes within the frame.
Spectral features represent the characteristics of the human vocal tract in the frequency domain. They are obtained by transforming the time-domain signal into the frequency domain using the Fourier transform [14]. Those are:
• Spectral Centroid (SC) [15] predicts the brightness of the sound and corresponds to the median of the spectrum.
• Chroma STFT [15] uses the short-term Fourier transform to compute chroma features, which represent the arrangement of pitch and signal structure.
Fig. 2 Deep convolutional neural network
• Mel Scale Spectrogram [12] is a spectrogram whose frequencies are transformed to the mel scale. It is computed through windowing, the Fast Fourier Transform (FFT), generation of a mel scale and generation of the spectrogram.
• Mel-Frequency Cepstral Coefficients (MFCC) [12] represent the short-term power spectrum of sound. They are derived from the linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
• Spectral Flux [16] evaluates how quickly the power spectrum of a signal changes.
Feature Selection and Dimensionality Reduction In SER, feature selection helps to reach the best classification performance and accuracy from the feature set and also reduces training time and over-fitting [17]. Principal Component Analysis (PCA) [18] is an unsupervised dimensionality reduction technique. In this method, the feature set is decomposed through its covariance matrix to obtain the principal components and a weighted feature set; representative eigenvectors are then selected based on the weighted feature set.
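A sketch of how these frame-level features can be extracted with librosa and collapsed into one fixed-length vector per sample is shown below. The pre-emphasis coefficient (0.97), the number of MFCCs and the time-averaging step are assumptions made for illustration; the paper's exact aggregation is not specified here.

```python
import numpy as np
import librosa

def extract_features(y, sr, n_mfcc=40):
    # Pre-emphasis: boost high frequencies (0.97 is a commonly used coefficient)
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])
    frame_feats = [
        librosa.feature.zero_crossing_rate(y),            # ZCR
        librosa.feature.rms(y=y),                          # RMS energy
        librosa.feature.spectral_centroid(y=y, sr=sr),     # spectral centroid
        librosa.feature.chroma_stft(y=y, sr=sr),           # Chroma STFT
        librosa.feature.melspectrogram(y=y, sr=sr),        # mel-scale spectrogram
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc),   # MFCC
    ]
    # Average every frame-level feature over time into one vector per sample
    return np.hstack([f.mean(axis=1) for f in frame_feats])

# X = np.vstack([extract_features(*librosa.load(p, sr=None)) for p in wav_paths])
# from sklearn.decomposition import PCA; X_reduced = PCA(0.95).fit_transform(X)
```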
4.3 Modeling
The deep convolutional neural network (DCNN) [18] is a specialized multistage neural network architecture designed here to analyze emotions from speech, as shown in Fig. 2. The loss function and optimizer are key aspects: the optimization algorithm continuously updates the direction of gradient descent by minimizing the loss function, and each layer is then updated by the back-propagation mechanism of the neural network until the results are optimized. In this implementation, the categorical cross-entropy loss function [17] is used, and the Root-Mean-Squared Propagation (RMSProp) [19] optimization algorithm is applied to minimize it. The RMSProp optimizer limits the fluctuations in the vertical direction, so the learning rate of the algorithm can grow in the horizontal direction.
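The layer sizes and dropout rates are tuned per dataset (see Table 2), so the Keras sketch below only illustrates the general Conv1D / max-pooling / dropout / flatten / softmax structure trained with RMSProp and categorical cross-entropy; the filter counts, kernel sizes, learning rate and input length are placeholders, not the values used by the authors.

```python
from tensorflow.keras import layers, models, optimizers

def build_dcnn(input_len, n_classes):
    # One-dimensional convolutional stack with ReLU activations,
    # max pooling and dropout, flattened into a softmax classifier.
    model = models.Sequential([
        layers.Conv1D(64, 5, activation="relu", input_shape=(input_len, 1)),
        layers.MaxPooling1D(2),
        layers.Dropout(0.2),
        layers.Conv1D(128, 5, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-4),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_dcnn(input_len=192, n_classes=7)  # e.g., seven emotion classes
```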
Fig. 3 Flowchart of the implementation
4.4 Flowchart
Emotion Classification The flowchart in Fig. 3 shows the implementation of the proposed speech emotion recognition model.
5 Experiment Setup
The proposed speech emotion recognition model has been implemented in the Python language with its supporting libraries along with various machine learning libraries. Python (3.6.3rc1) together with librosa (0.8.0) is used for audio processing, as it provides various inbuilt functions for the implementation. Apart from these, the seaborn and matplotlib libraries are used for plotting graphs that help to analyze the data statistically. The machine learning libraries keras (2.6.0), tensorflow (2.6.0) and scikit-learn are also used in this implementation. The datasets are collected from the databases and loaded into the framework as speech signals. Next, data augmentation techniques, namely noise injection, time shifting and pitch shifting, are used to make the dataset more similar to real-world speech input. After augmentation, the dataset is stored as a data frame in CSV format. Each speech sample is preprocessed and the speech features are extracted with the help of the librosa library. Features are selected using a fast correlation-based filter with a threshold value of 0.01. To obtain more accurate results, the dimensionality of the dataset is reduced with PCA. Then, the Deep Convolutional Neural Network is applied for classification. In this implementation, the proposed model has been analyzed on each of the four datasets individually and again on all the datasets combined.
Table 2 Layers and parameters of the DCNN model

Layer (type)                 Output shape    Param#
RAVDESS
  Dense (Dense)              (None, 14)      910
  Activation_8 (Activation)  (None, 14)      0
SAVEE
  Dense (Dense)              (None, 7)       455
  Activation_8 (Activation)  (None, 7)       0
TESS
  Dense (Dense)              (None, 7)       1351
  Activation_8 (Activation)  (None, 7)       0
CREMA-D
  Dense (Dense)              (None, 12)      780
  Activation_8 (Activation)  (None, 12)      0
Combined datasets
  Flatten (Flatten)          (None, 192)     0
  Dense (Dense)              (None, 12)      780
  Activation_8 (Activation)  (None, 12)      0
Each dataset is divided into 75% training data and 25% testing data. The fine-tuned layers, the output shape of each layer and the parameters of the DCNN are described in Table 2 for each dataset.
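The following sketch shows the 75/25 split, scaling, PCA reduction and the reshaping needed before the Conv1D model; the CSV file name, the 'emotion' column and the retained-variance setting are assumptions, not values taken from the paper.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from tensorflow.keras.utils import to_categorical

df = pd.read_csv("features.csv")                       # hypothetical feature data frame
X = df.drop(columns=["emotion"]).values
y = pd.factorize(df["emotion"])[0]

# 75% training data, 25% testing data, as used in the experiments
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

scaler = StandardScaler().fit(X_tr)
pca = PCA(n_components=0.95).fit(scaler.transform(X_tr))        # variance kept is assumed
X_tr = pca.transform(scaler.transform(X_tr))[..., np.newaxis]   # add channel axis for Conv1D
X_te = pca.transform(scaler.transform(X_te))[..., np.newaxis]
y_tr, y_te = to_categorical(y_tr), to_categorical(y_te)
# model.fit(X_tr, y_tr, validation_data=(X_te, y_te), epochs=50, batch_size=32)
```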
5.1 Result and Analysis
In order to classify emotions [12] and measure the performance of the designed model, four datasets (RAVDESS, SAVEE, TESS, CREMA-D) are used. The confusion matrix, precision, recall and F1 score give better intuition of the prediction results, and accuracy is also compared. In the classification task, the obtained outputs are either positive or negative, and there are four categories of predictions: false positive (fp), false negative (fn), true positive (tp) and true negative (tn) [12]. A false positive is a sample the model predicts as positive but which is actually negative; a false negative is a sample the model predicts as negative but which is actually positive; a true positive is a sample for which both the prediction and the actual label are positive; and for a true negative both the prediction and the actual label are negative. The results are analyzed with the help of the precision, recall, F1 score, support, macro average and weighted average values. The confusion matrix is a performance metric for a classification model with binary or multiclass output, tabulating the different combinations of predicted and actual classes.
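Written out explicitly, these four counts define the standard measures reported below:

```latex
\mathrm{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}, \qquad
\mathrm{Precision} = \frac{tp}{tp + fp}, \qquad
\mathrm{Recall} = \frac{tp}{tp + fn}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```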
Table 3 Result analysis of the model with various datasets

Measure        Dataset   Precision   Recall   F1 score   Support
Angry          R         0.98        0.94     0.96       50
               S         0.85        0.85     0.85       13
               T         1.00        1.00     1.00       98
               C         0.89        0.94     0.91       283
               P         0.89        0.88     0.89       90
Disgust        R         0.98        1.00     0.99       46
               S         0.88        1.00     0.99       15
               T         0.99        0.99     0.99       94
               C         0.88        0.88     0.88       329
               P         0.88        0.91     0.89       468
Fear           R         0.96        1.00     0.98       44
               S         1.00        0.93     0.97       15
               T         1.00        0.99     0.99       94
               C         0.87        0.86     0.87       303
               P         0.86        0.93     0.89       470
Happy          R         0.98        0.98     0.98       44
               S         1.00        1.00     1.00       14
               T         1.00        1.00     1.00       106
               C         0.84        0.84     0.84       23
               P         0.91        0.86     0.89       503
Neutral        R         0.98        0.93     0.96       58
               S         1.00        1.00     1.00       19
               T         0.98        0.99     0.99       106
               C         0.88        0.88     0.88       290
               P         0.89        0.97     0.93       446
Sad            R         0.94        0.98     0.96       50
               S         1.00        0.93     0.97       15
               T         1.00        0.99     1.00       104
               C         0.91        0.88     0.89       333
               P         0.91        0.85     0.88       499
Surprise       R         0.95        0.95     0.95       55
               S         1.00        1.00     1.00       14
               T         0.99        1.00     0.99       98
               P         0.97        0.92     0.95       466
Accuracy       R         –           –        0.97       347
               S         –           –        0.96       105
               T         –           –        0.99       700
               C         –           –        0.88       1861
               P         –           –        0.90       3342
M. Avg/W. Avg  R         0.97        0.97     0.97       347
               S         0.96        0.96     0.96       105
               T         0.99        0.99     0.99       700
               C         0.88        0.88     0.88       861
               P         0.90        0.90     0.90       3342

Note R = RAVDESS, S = SAVEE, T = TESS, C = CREMA-D, P = PROPOSED
Fig. 4 Confusion matrix of RAVDESS and SAVEE datasets
Fig. 5 Confusion matrix of TESS and CREMA-D datasets
Fig. 6 Confusion matrix of combined dataset and accuracy values
The confusion matrix analysis of the RAVDESS, SAVEE, TESS, CREMA-D and combined datasets is given in Figs. 4, 5 and 6. The accuracy measures for the RAVDESS, SAVEE, TESS and CREMA-D datasets are 0.97 (0.9666 rounded), 0.93 (0.9253 rounded), 0.99 (0.9942 rounded) and 0.88 (0.8790 rounded), respectively. Hence, the proposed model achieves 97%, 93%, 99% and 88% classification accuracy on these datasets.
Table 4 Comparison with previous models

Dataset             Existing model             Proposed model
RAVDESS             71.76 [5], 88.72 [12]      96.54
SAVEE               65.03 [20], 86.80 [12]     92.38
TESS                55.71 [20], 99.52 [12]     99.42
CREMA-D             71.69 [12]                 87.90
Combined datasets   NA                         90.27
Similarly, for the combined dataset, the proposed model achieves 0.90 (0.9026 rounded), i.e., a classification accuracy of 90%. The TESS dataset, which contains only female speech samples, shows markedly better performance with this model. The detailed analysis of the performance measures is given in Table 3.
5.2 Comparison Analysis
This implementation has been compared with a previously implemented model that uses MFCC, Mel-scaled spectrogram, chromagram, spectral contrast and Tonnetz representation as speech features with a CNN, and also with another model in which ZCR, MFCC, Chroma STFT, RMS and Mel-scaled spectrogram are used as speech features with a CNN; the proposed model shows better performance, as illustrated in Table 4. Various speech features play an important role in improving the accuracy of the emotion recognition system. By considering additional speech features such as spectral centroid, Chroma STFT and spectral flux as a combination of both prosodic and spectral features, the model performance for TESS is relatively better than for the other datasets. However, it is observed that for the TESS dataset the proposed model shows slightly lower accuracy than the existing model.
6 Conclusion
It has been observed that a deep learning technique such as the DCNN is a feasible way of predicting human emotions from speech. The proposed model is composed of four phases—data augmentation, feature extraction, feature selection and dimensionality reduction, and finally classification. The DCNN is applied for classification of emotions on four different datasets, namely RAVDESS, SAVEE, TESS and CREMA-D. The performance of the model is further verified by combining all the datasets into a single dataset, and the proposed model is validated based on accuracy, precision, recall, macro-average and weighted-average values for the different datasets. It has been observed
that the proposed model outperforms the existing models based on pre-defined machine learning and deep learning techniques. In future, the emotion classification accuracy of the proposed model can be improved by considering further speech features such as spectral spread, spectral entropy, spectral roll-off, chroma vector and chroma deviation together with other classification techniques.
References 1. Ancilin J, Milton A (2021) Improved speech emotion recognition with Mel frequency magnitude coefficient. Appl Acoust 179:1080469 2. Das RK, Mahadeva Prasanna SR (2016) Exploring different attributes of source information for speaker verification with limited test data. J Acoust Soc Am 140(1):184–190 3. Daneshfar F, Kabudian SJ, Neekabadi A (2020) Speech emotion recognition using hybrid spectral-prosodic features of speech signal/glottal waveform, metaheuristic-based dimensionality reduction, and Gaussian elliptical basis function network classifier. Appl Acoust 166:107360 4. Wang J, Xue M, Culhane R, Diao E, Ding J, Tarokh V (2020) Speech emotion recognition with dual-sequence LSTM architecture. In: ICASSP 2020–2020 IEEE international conference on acoustics. Speech and signal processing (ICASSP). IEEE, Barcelona, pp 6474–6478 5. Issa D, Demirci MF, Yazici A (2020) Speech emotion recognition with deep convolutional neural networks. Biomed Signal Process Control 59:101894 6. Christy A, Vaithyasubramanian S, Jesudoss A, Praveena MD (2020) Multimodal speech emotion recognition and classification using convolutional neural network techniques. Int J Speech Technol 23(2):381–388 7. Pawar MD, Kokate RD (2021) Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients. Multimedia Tools Appl 80(10):15563– 15587 8. Jermsittiparsert Kittisak, Abdurrahman Abdurrahman, Siriattakul Parinya, Sundeeva Ludmila A, Hashim Wahidah, Rahim Robbi, Maseleno Andino (2022) Pattern recognition and features selection for speech emotion recognition model using deep learning. Int J Speech Technol 23(4):799–806 9. Bhangale K, Mohanaprasad K (2022) Speech emotion recognition using mel frequency log spectrogram and deep convolutional neural network. Futuristic Commun Netw Technol 241– 250 10. Swain M, Maji B, Kabisatpathy P, Routray A (2022) A DCRNN-based ensemble classifier for speech emotion recognition in Odia language. Complex Intell Syst 1–3 11. Xu M, Zhang F, Zhang W (2021) Head fusion improving the accuracy and robustness of speech emotion recognition on the IEMOCAP and RAVDESS dataset. IEEE Access 9:74539–74549 12. Dolka H, Arul Xavier VM, Juliet S (2021) Speech emotion recognition using ANN on MFCC features. In: 2021 3rd International conference on signal processing and communication (ICPSC), IEEE, Coimbatore, pp 431–435 13. Pham NT, Dang DNM, Nguyen SD (2021) Hybrid data augmentation and deep attention-based dilated convolutional-recurrent neural networks for speech emotion recognition. 309:145–156. arXiv preprint arXiv:2109.09026 14. Akçay MB, O˘guz K (2020) Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun 116:56–76 15. Rajesh S, Nalini NJ (2020) Musical instrument emotion recognition using deep recurrent neural network. Procedia Comput Sci 167:16–25 16. Hao Y, Küçük A, Ganguly A, Panahi, IMS (2020) Spectral flux-based convolutional neural network architecture for speech source localization and its real-time implementation. IEEE Access 8:197047–197058
17. Gao M, Dong J, Zhou D, Zhang Q, Yang D (2019) End-to-end speech emotion recognition based on one-dimensional convolutional neural network. In: Proceedings of the 2019 3rd international conference on innovation in artificial intelligence. ACM Press, Kunming, pp 78–82 18. Zheng WQ, Yu JS, Zou YX (2015) An experimental study of speech emotion recognition based on deep convolutional neural networks. In: 2015 International conference on affective computing and intelligent interaction (ACII). IEEE, Xi’an, pp. 827–831 19. Tieleman T, Hinton G (2012) Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw Mach Learn 4(2):26–31 20. Sakorn M, Jitpattanakul A, Hnoohom N (2020) Negative emotion recognition using deep learning for Thai language. In: 2020 Joint international conference on digital arts. Media and technology with ECTI northern section conference on electrical, electronics, computer and telecommunications engineering (ECTI DAMT & NCON). IEEE, Pattaya, pp 71–74
Genetic Artificial Bee Colony for Mapping onto Network on Chip “GABC” Maamar Bougherara and Messaoudi Djihad
Abstract Network on chip is a new concept for SoC interconnections, and this structure facilitates the integration of complex components. However, as it is a new technology, it requires research effort, especially for the acceleration and simplification of the design phases. The mapping step is very important in the process of network-on-chip design: a bad mapping of the application's software components can significantly reduce the overall performance of the final system. It is therefore interesting to develop methods and tools to automate this step. The main goal of our work is to develop a new technique for mapping applications onto a NoC architecture in order to minimize the communication cost. This new solution is based on the artificial bee colony (ABC) and the genetic algorithm (GA). The experimental results demonstrate the efficiency and effectiveness of the proposed GABC algorithm, with significant communication cost reductions compared to ABC. For example, for the VOPD and MPEG benchmarks, GABC saves more than 21.36% and 5.6% in communication cost compared to ABC and previously proposed algorithms. Keywords System on chip · Network on chip · NoC architecture · Mapping · Artificial bee colony · Communication cost
1 Introduction
The NoC paradigm was born with the growth in the size of the applications running on SoCs and their complexity and heterogeneity. Indeed, this growth has made a communication bus between components very difficult to use, as it no longer meets all needs. Thus, drawing on computer networks, the network on a chip was established. The mapping of IP cores to a NoC is a critical step in NoC design that significantly affects system performance, such as power consumption, latency and load balancing. Mapping in
M. Bougherara LIMPAF Laboratory, Bouira University, Bouira, Algeria
M. Bougherara (B) · M. Djihad Department of Computer Science, High Normale School of Kouba, Algiers, Algeria e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_11
the NoC design process is considered an NP-hard problem, because more than one critical performance metric must be taken into account by a good mapping algorithm. Several methods have therefore been proposed in the literature to deal with it, and many researchers solve this problem using heuristic algorithms; the artificial bee colony (ABC) is a comparatively simple and efficient one, and the main objective of this paper is to combine ABC with genetic operators to make it more powerful. The rest of this paper is organized as follows: in Sect. 2, some generalities on NoCs are given, and Sect. 3 presents some related work. In Sect. 4, we highlight the mapping problem, and our proposed method is described in Sects. 5 and 6. Simulation and results are given in Sect. 7. Finally, we conclude our work in Sect. 8.
2 NoC Generalities
The network on chip (NoC), inspired by the general-purpose networks designed for supercomputers, is a set of devices interconnected on a single chip, whose communication is carried out by sending data over a scalable interconnection network. A NoC provides the following main advantages: (i) energy efficiency and reliability; (ii) scalability of bandwidth compared to traditional bus architectures; and (iii) reusability [1]. A NoC is characterized by (i) a topology, which describes how the NoC components are connected to form the on-chip interconnect (e.g., 2D mesh, torus, ring), (ii) a switching mode, (iii) flow control to avoid deadlock, and (iv) storage strategies. Figure 1 shows an example of a 3 × 3 2D mesh NoC with 9 tiles. NoC-based system design consists of several steps: first, the application has to be implemented as a set of communicating tasks, which can run in parallel; then, each task is assigned to a selected available core and scheduled; finally, these cores have to be mapped onto the NoC [2].
Fig. 1 Network on chip example with 9 Tiles
In this paper, we are interested in the final step, called application mapping; it is an important step and still an open research problem, since a mapping algorithm can achieve up to 51.7% communication energy savings compared to an ad hoc implementation [3]. To achieve high performance, an optimal mapping solution should be found. For example, if we have m tasks to be mapped onto a NoC of n cores where (m
0.5. It is shown in Figs. 6, 7 and 8. The correlation between important and influential factors increases as the value tends to 1, which is identified by the heat map.
Table 1 Dataset table

Dataset           Instances   Attribute
Lung cancer       1187        9
Breast cancer     569         32
Prostate cancer   1101        9
Table 2 List of influential factors extracted from a dataset by random forest classifier

Radius   Concavity   Texture   Perimeter   Concave points   Area
After finishing all phases, the T stage of cancer is categorized for the malignant tumor using the TNM staging model [1]. The proposed model considers radius, area and perimeter as important factors for categorizing the actual T stage. Other influential factors, such as concavity, concave points and texture, are highly correlated with these important factors. The radius of the tumor is used to measure the circumferential involvement of the tumor in order to categorize the T stage of cancer; each T stage is defined by a specific tumor length in millimeters or centimeters, which is used to classify the actual stage of the tumor. Finally, ML and DL algorithms are used for evaluating the outcome of the model in classifying the T stage of cancer. A flowchart of the proposed model is presented in Fig. 1. The shape of the tumor is asymmetric and irregular; to measure the tumor size, we consider the radius as an influential factor for the circumferential involvement used in tumor (T) staging. The formula used for the circumferential measurement (C) of the tumor size is given in Eq. (2), because the model uses the radius as an important feature that is computed and measured from the clinical or pathological report.
C = 2πr (2)
Another important factor that is highly related to the tumor (T) size is the area (A) of the tumor, as expressed in Eq. (3), which is strongly correlated with the radius. Each stage T1, T2 and T3 is classified with tumor size as the primary factor, along with
Fig. 1 Flowchart of the proposed model
the area and perimeter, which are considered highly dependent parameters for tumor size measurement.
A = πr² (3)
Each important and influential factor has a different impact on the tumor: the perimeter represents the core tumor through its mean value, whereas the area is connected with the mean of the cancer cell area, and compactness is analyzed using the perimeter and area means. The T stage of the cancer is accurately classified by the tumor size together with the important factors, which are highly dependent on the influential factors. Stage T1 contains 42%, 60% and 63% of the malignant tumors for the breast, lung and prostate datasets, respectively, classified by tumor size measured with important factors such as the radius, which is highly correlated with other important factors such as area and perimeter with IF > 0.9. For stage T2, 53%, 14% and 21% of the malignant tumors are classified with important parameters such as radius, area and perimeter, which are highly correlated with other influential factors such as concave points with IF > 0.60; the concave points are in turn highly correlated with smoothness, compactness, concavity, symmetry and fractal dimension. In stage T3, the smoothness, texture and symmetry parameters are related to the radius and area means with IF > 0.55, and this stage contains 5%, 0.54% and 16% of the malignant tumors. The dependency of the important and influential factors in the T1, T2 and T3 stages of the tumor is represented by the heat maps of Figs. 6, 7 and 8. The major contribution of this implementation is to develop a model in which the study identifies new influential factors that can more accurately classify the T stages of cancer by relating them to standard biological models.
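Heat maps such as those in Figs. 6, 7 and 8 can be produced from the stage-wise correlation (IF) matrix of the clinical features; the sketch below uses pandas and seaborn, with the file name, the 'stage' column and the feature names as assumptions rather than the paper's exact code.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("malignant_tumors.csv")         # hypothetical table of malignant cases
features = ["radius", "texture", "perimeter", "area",
            "concavity", "concave_points"]        # assumed column names

for stage in ["T1", "T2", "T3"]:                  # assumed stage labels
    corr = df[df["stage"] == stage][features].corr()   # pairwise IF values in [-1, 1]
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.title(f"Correlation of influential factors, stage {stage}")
    plt.show()
# Pairs whose |IF| exceeds a chosen threshold (e.g., 0.5) are treated as influential.
```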
4 Result and Discussion
The proposed model examines the clinical data with important and influential factors such as radius, area, perimeter and concave points for classifying the T stage of cancer. In the TNM [1] model, the tumor size (T), lymph node (N) and metastasis (M) are considered as parameters of the stage; in the proposed model, circumferential involvement is considered the important factor for measuring the T size. The dataset is split into a training set (70%) and a testing set (30%), and the important factors were selected from the training set. The experiment is performed on a system with an i5 processor, 16 GB RAM, a 512 GB SSD and an NVIDIA GPU, and the results are generated in the Anaconda platform under the Spyder (3.9) environment. The performance of the ML and DL models is measured with respect to the number of instances classified correctly as positive (true positives, TP), the number of instances incorrectly classified as positive (false positives, FP), the number of instances correctly classified as negative (true negatives, TN) and the number of instances incorrectly classified as negative (false negatives, FN). The accuracy, specificity and sensitivity of the model are determined using TP, FP, TN and FN. Figure 2 shows the ROC curve for malignant data of the WDBC, lung
Fig. 2 ROC curve for full dataset of lung cancer, breast cancer, and prostate cancer
cancer and prostate data. Each line in the ROC curve represents a separate ML or DL algorithm. The ROC curves for the breast cancer dataset are shown in Fig. 3; out of the 212 malignant samples, 42%, 53% and 5% are present in the T1, T2 and T3 stages. The lines in Fig. 3 show that the performance of the model improves for the individual tumor T stages. The ROC curves for stages T1, T2 and T3 of prostate cancer are shown in Fig. 4; out of 950 malignant samples, 63%, 21% and 16% belong to the T1, T2 and T3 stages, and the changes in the curves represent the different accuracy rates for each T stage of prostate cancer. Figure 5 shows the ROC curves for stages T1, T2 and T3 of lung cancer; out of 1029 malignant tumors, 60.4% are in T1, 14% in T2 and 0.6% in T3. Figure 6 shows that the correlation between features increases when the data is divided into the different stages of cancer: the heat map identifies the similarity between the important and influential factors of stage T1, where the important factors together with the other influential factors are responsible for staging, such as radius, area and perimeter with IF > 0.70. Figure 7 shows that the dependency between features increases further; the matrix represents not only the important features that are responsible for the disease but also the other influential factors that are responsible in each stage.
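Per-stage ROC curves like those in Figs. 3, 4 and 5 can be drawn one-versus-rest from a fitted classifier's predicted probabilities; the helper below is a hedged sketch using scikit-learn and matplotlib, with the stage labels assumed to be T1, T2 and T3.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

def plot_stage_roc(y_true, y_score, classes=("T1", "T2", "T3")):
    # y_true: stage label per sample; y_score: class-probability matrix,
    # e.g. clf.predict_proba(X_test) from any of the fitted models.
    y_bin = label_binarize(y_true, classes=list(classes))
    for i, stage in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
        plt.plot(fpr, tpr, label=f"{stage} (AUC = {auc(fpr, tpr):.2f})")
    plt.plot([0, 1], [0, 1], "k--")   # chance line
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()
```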
Fig. 3 ROC curve for breast cancer for T1, T2, and T3 stages
Fig. 4 ROC curve for prostate cancer for T1, T2, and T3 stages
Fig. 5 ROC curve for lung cancer for T1, T2, and T3 stages
Fig. 6 Heat map representation of T1 stage with all features
Fig. 7 Heat map representation of T2 stage with all features
Fig. 8 Heat map representation of T3 stage with all features
Figure 8 shows that the dependency among the features increases heavily as stage T3 is reached, and most of the features become responsible for the disease.
4.1 Comparison Table The proposed model achieves the following result for malignant data after training and testing the dataset with ML algorithms LR, KNN, NB, SVM and one DL model MLP. Among all ML algorithms, SVM has the highest accuracy (99.4), sensitivity (95.7), specificity (92.7), recall (98), and F measure (96.9) rate. Table 3 shows that the proposed model performs better than [13], but the accuracy is better in [14]. The comparison performs with other algorithm for malignant data of dataset as shown in Table 3. Cancer is classified into four stages, but a handful of models exists for T stage classification. Proposed model categories the cancerous data as features values and measures their performance. Dataset is split into three stages T1, T2, and T3 respecting the size of the malignant tumor. Table 4 represents highest accuracy achieved by stage T1: LR and SVM is 98.7 for LC, LR (95.1) for BC and for PC, SVM (92.1) for ML model, whereas MLP achieves highest accuracy in LC (98.1). The accuracy for KNN and NB is 93.7 and 90.2 with precision—97.2 and 91.8, specificity—89.6 and 75, recall—94.7 and 95.3, and F measure—95.9 and 93.5 for BC. The accuracy for KNN and NB in LC is 98.4 and 93. In the prostate, the accuracy of LR, KNN, and NB is 90.6, 77.1, and 90.2 with precision—90.4, 86.7, and 89.9. The observed value of T2 stage is shown in Table 5, where LR has highest accuracy of 99.3 for breast, 100 accuracies for all models in lung, and in prostate, 91.5, and for SVM, 92.3 in MLP for breast and 80 in MLP for prostate. The accuracy obtained by other ML models for Prostate cancer is LR and NB 84.2 and KNN 78.3 with highest precision 90 in SVM and NB, 67.2. The outcome of stage T3 shown in Table 6 for three datasets with 100% accuracy achieved by all ML and DL models for breast and lung cancer, whereas 96% highest accuracy achieved by SVM for prostate cancer with 79% in MLP. Other model accuracy in the prostate is LR—92.4, KNN—82.1, and NB—92.7.
5 Conclusion The focus of the proposed model is to be categorized the tumor (T) stages of cancer based on three datasets breast, lung, and prostate cancer using circumferential involved and additional influential factor like concavity, texture, etc., to classify the data into different stages using ML and DL algorithms LR, KNN, SVM, NB, and MLP. The accuracy obtained from the experiment is much better than other methods and highest accuracy is achieved by SVM and LR with 98.2, 97.3 and 98.9. In this study, TNM model with additional influential factors is used for the cancer stage
198
S. Manna and S. Mistry
Table 3 Comparison table of (a) Breast cancer, (b) Lung cancer, (c) Prostate cancer ML and DL models
Accuracy
Precision
Specificity
Recall
F Measure
(a) Breast cancer (%) RBF-NN [6]
79.16
SVM [13]
90.14
NB [15]
93.32
SVM [14]
99.51
KNN [16]
95.90
MLP [17]
97.7
–
–
–
44.9
90.1
–
–
–
–
100
99.25
–
90.47
98.27 –
–
–
–
94.2 –
IEC-MLP [18]
98.74
98.13
–
–
–
Proposed [LR]
99.4
98.3
96.8
96.6
97.4
Proposed [KNN]
94.5
96.6
92.9
89.6
93
Proposed [NB]
97.1
95.7
92.7
98.2
96.9
Proposed [SVM]
99.4
95.7
92.7
98
96.9
Proposed [MLP]
98.8
97
96
90
93
NB [5]
88.8
80.1
90.2
86
88
MLP [5]
88.7
92.3
–
83.7
87.8
SVM [6]
78.9
77
–
83
79
SVM [19]
86
100
–
67
80
Proposed [KNN]
97.9
100
100
87.8
93.5
(b) Lung cancer (%)
Proposed [NB]
96.4
94.8
99
Proposed [SVM]
95.6
70.6
95.1
100
83.3
82.8
88.7
Proposed [MLP]
98.4
98
99
100
99
SVM [7]
71.6
59
68.7
–
–
ANN [7]
71.6
56.4
76.6
–
–
SVM [20]
71
–
–
73.7
71.6
LR [19]
92
83.3
86.3
–
–
MLP [8]
87.75
89.17
85.17
–
–
Proposed [LR]
89.1
86.3
83.8
79.7
76.5
Proposed [NB]
88.9
87.3
84.2
75.3
76.3
Proposed [SVM]
87.6
75.7
83.5
76.2
76
Proposed [MLP]
88
88.1
86.9
80.3
80.1
(c) Prostate cancer (%)
95.1
94.4
92.3
SVM
MLP
83.9
90
93.1
92.1
89
MLP
Accuracy
Prostate cancer (%)
78
97.2
98.1
SVM
ML and DL models
F Measure
88
91
Precision
86
95.5
95.6 82
96.4
96.8
84.9
70.1
Specificity
98.1
98.7
98.7 98
96.1
96.1
87
91
Recall
Precision
Lung cancer (%) Recall
Accuracy
Specificity
Accuracy
Precision
Breast cancer (%)
LR
ML and DL models
Table 4 Performance evaluation of the proposed model for tumor stage T1
98.9
99.2
99.2
Specificity
86
91
F Measure
100
96.1
96.1
Recall
99
96.1
96.1
F Measure
Presaging Cancer Stage Classification by Extracting Influential Features … 199
92.3
78
88.9
95.4
94.6
MLP
Accuracy
Prostate cancer (%)
SVM
ML and DL models
MLP
93.1 86
95.6
Recall
95.8
90
Precision
Specificity
F Measure 82
96.8
95.8
87.2
Specificity
100
100 100
100
87
85.8
Recall
Precision
Accuracy
99.1
Precision
Accuracy
99.3
Lung cancer (%)
Breast cancer (%)
LR
ML and DL models
Table 5 Performance evaluation of the proposed model for tumor stage T2
100
100
Specificity
78.4
87.9
F Measure
100
100
Recall
100
100
F Measure
200 S. Manna and S. Mistry
100
100
96
89
MLP
Accuracy
Prostate (%)
SVM
ML and DL models
MLP
100
100
100
Recall
88.9
93
Precision
Specificity 100
F Measure 100
100
86.7
74
Specificity
100
100 100
100
78
91.6
Recall
Precision
Accuracy
100
Precision
Accuracy
100
Lung cancer (%)
Breast cancer (%)
LR, KNN, SVM, NB
ML and DL models
Table 6 Performance evaluation of the proposed model for tumor stage T3
100
100
Specificity
89.1
92.3
F Measure
100
100
Recall
100
100
F Measure
Presaging Cancer Stage Classification by Extracting Influential Features … 201
202
S. Manna and S. Mistry
classification to provide better prognosis and personalized treatment. The advanced stage classifications through statistical and advanced deep learning models are some major approachable works that need to be explored for gaining more effective trustworthiness of the model. In future, we focus to classify the T stage of tumor with node (N) and metastasis (M) using a grading system for accurate stage classification diagnosis with a proper explanation why the features are selected as important feature by the model using XAI approach to build the trustworthiness.
References 1. Amin MB, Greene FL, Edge SB, Compton CC, Gershenwald JE, Brookland RK, Meyer L, Gress DM, Byrd DR, Winchester DP (2017) The eighth edition AJCC cancer staging manual: continuing to build a bridge from a population-based to a more “personalized” approach to cancer staging. CA Cancer J Clin 67(2):93–99 2. Kulkarni P (2019) Fine grained classification of mammographic lesions using pixel N-grams. Asian J Convergence Technol (AJCT). ISSN: 2350-1146 3. Ibrahim AO, Ahmed A, Abdu A, Abd-alaziz R, Alobeed MA, Saleh AY, Elsafi A (2019) Classification of mammogram images using radial basis function neural network. In: International conference of reliable information and communication technology. Springer, Cham, pp 311–320 4. Boeri C, Chiappa C, Galli F, De Berardinis V, Bardelli L, Carcano G, Rovera F (2020) Machine Learning techniques in breast cancer prognosis prediction: a primary evaluation. Cancer Med 9(9):3234–3243 5. Joshua ESN, Chakkravarthy M, Bhattacharyya D (2020) An extensive review on lung cancer detection using machine learning techniques: a systematic study. Rev d’Intelligence Artif 34(3):351–359 6. Islam M, Rab R (2019) Analysis of CT scan images to predict lung cancer stages using image processing techniques. In: 10th Annual information technology, electronics and mobile communication conference (IEMCON). IEEE, pp 0961–0967 7. Nitta S, Tsutsumi M, Sakka S, Endo T, Hashimoto K, Hasegawa M, Hayashi T, Kawai K, Nishiyama H (2019) Machine learning methods can more efficiently predict prostate cancer compared with prostate-specific antigen density and prostate-specific antigen velocity. Prostate Int 7(3):114–118 8. Bellad SC, Mahapatra A, Ghule SD, Shetty SS, Sountharrajan S, Karthiga M, Suganya E (2021) Prostate cancer prognosis-a comparative approach using machine learning techniques. In: 2021 5th International conference on intelligent computing and control systems (ICICCS). IEEE, pp 1722–1728 9. Abdallah SA, Mustafa ZA, Ibraheem BA (2022) Prostate cancer classification using artificial neural networks. J Clin Eng 47(3):160–166 10. Nooreldeen R, Bach H (2021) Current and future development in lung cancer diagnosis. Int J Mol Sci 22(16):8661 11. Kasinathan G, Jayakumar S (2022) Cloud-Based lung tumor detection and stage classification using deep learning techniques. BioMed Res Int 12. William HW, Street WN, Mangasarian OL (1995) Breast cancer Wisconsin (diagnostic) data set. UCI Machine Learning Repository 13. Alzu’bi A, Najadat H, Doulat W, Al-Shari O, Zhou L (2021) Predicting the recurrence of breast cancer using machine learning algorithms. Multimedia Tools Appl 80(9):13787–13800 14. Divyavani M, Kalpana G (2021) An analysis on SVM & ANN using breast cancer dataset. Aegaeum J 8(369–379)
Presaging Cancer Stage Classification by Extracting Influential Features …
203
15. Wang H, Yoon SW (2015) Breast cancer prediction using data mining method. In: IIE Annual conference on proceedings. Institute of Industrial and Systems Engineers (IISE), p 818 16. Sharma S, Aggarwal A, Choudhury T (2018) Breast cancer detection using machine learning algorithms. In: International conference on computational techniques, electronics and mechanical systems (CTEMS) 17. Al-Shargabi B, Alshami F, Alkhawaldeh R (2019) Enhancing multi-layer perception for breast cancer prediction. Int J Adv Sci Technol 18. Talatian Azad S, Ahmadi G, Rezaeipanah A (2021) An intelligent ensemble classification method based on multi-layer perceptron neural network and evolutionary algorithms for breast cancer diagnosis. J Exp Theoret Artif Intell 1–21 19. Banerjee N, Das S (2020) Prediction lung cancer–in machine learning perspective. In: International conference on computer science, engineering and applications. IEEE, pp 1–5 20. Alghatani K, Ammar N, Rezgui A, Shaban-Nejad A (2022) Precision clinical medicine through machine learning: using high and low quantile ranges of vital signs for risk stratification of ICU patients. IEEE Access
Diagnosis of Parkinson’s Disease Using Machine Learning Algorithms Khushal Thakur, Divneet Singh Kapoor, Kiran Jot Singh, Anshul Sharma, and Janvi Malhotra
Abstract Parkinson's disease is an age-related nervous system disorder that impacts movement of the human body. A cure for this disease is still unknown, but a patient's quality of life can be improved if it is diagnosed at an early stage. Various researchers have identified that voice degradation is a common symptom of this disease. Machine learning is increasingly establishing its place in the healthcare industry. In this paper, we implement and compare state-of-the-art machine learning algorithms such as Support Vector Machine, Random Forest Classifier, Decision Tree Classifier and Extra Trees Classifier for the diagnosis of Parkinson's disease and evaluate their performance. The work in this paper proposes using the Extra Trees Classifier along with entropy-based feature selection, which proved superior to all other models for Parkinson's disease prediction, yielding the highest accuracy of 94.34%, precision of 0.9388, F1-score of 0.9684 and recall of 1. Keywords Parkinson's disease · Machine learning · Classification · Feature selection · Accuracy
1 Introduction
Parkinson's disease is a long-term disorder of the nervous system that affects movement. Its symptoms appear slowly; sometimes it starts as a barely noticeable tremor in one hand, but as the disease worsens it causes stiffness in the body and slowing of movement. The common symptoms of the disease are tremor, rigidity, slowness of movement, difficulty in walking, loss of smell and deterioration of the voice. In the first stages, the face may not show any change and the arms and legs may not swing, but these symptoms become visible as the disease worsens.
K. Thakur (B) · D. S. Kapoor · K. J. Singh · A. Sharma Electronics and Communication Engineering Department, Chandigarh University, Mohali, Punjab 140413, India e-mail: [email protected]
J. Malhotra Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab 140413, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_16
Death of cells in the substantia nigra, a region of the midbrain, leads to a dopamine deficit, which causes the motor symptoms; the disease also involves the build-up of misfolded proteins in the neurons. Because it affects motor characteristics, Parkinson's disease is also called 'shaking palsy' [2]. The exact cause of Parkinson's disease is unknown; it is believed that both genetic and environmental factors are responsible. People who have family members with Parkinson's disease are at risk, as are people who have been exposed to certain pesticides or have suffered prior head injuries. Parkinson's disease mostly occurs in the age group over 60 [3, 4]. Males are more affected by this disease, with a ratio of about 3:2. Parkinson's disease arising before the age of 50 is called early-onset PD. The average life expectancy after diagnosis is said to be between 7 and 15 years. Some Parkinson's disease statistics are as follows:
• One or two people out of 1000 have Parkinson's disease.
• For people over the age of 65, 18 out of 1000 are affected.
• The annual mortality rate per 100,000 in India has increased by 87.9% since 1990, when it was 3.8%.
• About 15% of people who have this disease have a family history of it.
• Approximately 60,000 people are diagnosed with Parkinson's disease each year, and it is believed that a large number of cases go undetected.
• About 7–10 million people worldwide are suffering from this disease.
• Around 1 in 20 affected people is under the age of 40.
The cure for this disease is not known, but medications and therapies such as physiotherapy and speech therapy, especially in the preliminary stages, can significantly improve quality of life. If Parkinson's disease is detected in its preliminary stages, the estimated cost of pathology can also be reduced. One common and early-stage symptom of Parkinson's disease is degradation of the voice [5], and the analysis of voice measurements is simple and non-invasive; thus, voice measurements can be used for the diagnosis of Parkinson's disease. Data science is one approach to diagnosing Parkinson's disease in its preliminary stages. Data science is the study of enormous amounts of data that uses systems, algorithms and processes to extract meaningful and useful information from raw, structured and unstructured data [6, 7]. Data is a precious asset for any organization, and in data science, data is manipulated to obtain new and meaningful information: knowledge is extracted from (typically large) datasets, and by applying that knowledge several significant problems related to the data can be solved. Data science plays a significant role in the healthcare industry; physicians use it to analyze their patients' health and make weighty decisions, and it helps hospital management teams to enhance care and reduce waiting times [8]. For this purpose, voice data has been collected via telemonitoring and telediagnosis systems, which are economical and easy to use. Furthermore, advances in technologies like the Internet of things, wireless sensor networks and computer vision can be used to develop newer multidomain solutions [9–14].
2 Background
2.1 Related Work
Several researchers have attempted to diagnose Parkinson's disease; this section mentions some recent works on the detection of Parkinson's disease using voice features. Sakar et al. [15] developed machine learning models that differentiate healthy people from Parkinson's-affected people. Data was collected from a group of 40 people, 20 of whom were healthy and 20 of whom were Parkinson's-affected patients, with twenty-six speech samples collected from each of the 40 subjects. SVM (RBF kernel) and k-NN (k = 1, 3, 5, 7) were used for classification, and summarized Leave-One-Out (sLOO) and Leave-One-Subject-Out (LOSO) schemes were used for cross-validation. Bhattacharya et al. [16] built their model using WEKA with SVM as the machine learning algorithm; the kernel parameters were varied to achieve the best accuracy, and the data was preprocessed before classification. Richa Mathur et al. also developed a model using WEKA, with k-NN, AdaBoost.M1, bagging and MLP. Benba et al. [17] collected a dataset from 34 people, 17 of whom were PD-affected patients, extracted 1–20 Mel-frequency cepstral coefficients (MFCCs) from each person, used the SVM machine learning algorithm and applied LOSO cross-validation. Sakar et al. [18] introduced the tunable Q-factor wavelet transform (TQWT) in their work; TQWT outperformed older methods for the extraction of voice features, and MFCC and TQWT contributed the most to the best accuracy, so for Parkinson's detection TQWT and MFCC contribute the most. Little et al. [19] introduced pitch period entropy (PPE) in their work; data was collected from a group of 31 people, 8 healthy and 23 Parkinson's-affected, and they performed feature calculation, data preprocessing, feature selection and finally classification with the SVM algorithm. All the works above show that various machine learning algorithms have been applied to datasets to detect Parkinson's disease, but forest-based algorithms such as the Extra Trees Classifier have not been used in these works. In our proposed work, forest-based algorithms are used, and the SelectFromModel feature selection technique is applied to the dataset to improve the results.
2.2 Parkinson's Disease Dataset
The first step in performing classification using machine learning algorithms is the collection of data. For this research, the data was downloaded from www.kaggle.com, and it was originally collected for the UCI machine learning repository. The dataset has a total of 756 instances and 754 features, and it contains data for both Parkinson's disease-affected patients and healthy people: voice-based data of 188 Parkinson's disease patients (107 men and 81 women) and 64 healthy individuals (41 women,
23 men) is included in this dataset. Three repetitions of the sustained phonations were performed for each subject, which is how the 756 instances are formed.
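A quick look at the dataset can be taken with pandas; the file name below is the one commonly used for the Kaggle/UCI release of this dataset and, together with the 'id' and 'class' column names, is an assumption rather than something stated in the paper.

```python
import pandas as pd

df = pd.read_csv("pd_speech_features.csv")   # assumed file name of the downloaded dataset

print(df.shape)                      # 756 instances, as described above
print(df["class"].value_counts())    # label column: Parkinson's vs. healthy (assumed name)

X = df.drop(columns=["id", "class"]) # voice-feature matrix
y = df["class"]
```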
2.3 Machine Learning The four models chosen for this problem statement were Support Vector Machine (kernel = 'linear'), Decision Tree Classifier, Random Forest Classifier, and Extra Trees Classifier. These models are explained below.
1. Support Vector Machine (SVM) [20]: The Support Vector Machine is an ML algorithm used for both classification and regression problems and is one of the most widely used machine learning algorithms. SVM creates a decision boundary that segregates the data points in N-dimensional space so that new data points can be placed into the correct category. The extreme points, also called vectors, are considered while creating this decision boundary (hyperplane); these extreme cases are the support vectors of the model. If a dataset is to be classified into two categories (say red circles and green stars), there can be many candidate decision boundaries. The task of the SVM algorithm is to find the best decision boundary (the hyperplane) that separates the two classes most efficiently, i.e., the boundary with the largest margin (the distance between the decision boundary and the closest points).
2. Decision Tree (DT) Classifier [21]: The Decision Tree Classifier classifies data in a way that resembles human decision making. A few terms are needed to understand its logic:
• ROOT NODE (parent node): the starting point of the whole tree; the node from which the entire dataset is first split.
• LEAF NODE: a final node that cannot be split further.
• SPLITTING: the process by which decision nodes are divided further.
• BRANCH: a subtree formed when a node splits.
• CHILD NODE: any node except the root node.
• PRUNING: removal of unwanted subtrees.
The working of the Decision Tree Classifier is as follows:
• The tree starts from the full dataset, which forms the root node.
• The values of the root attribute are compared with the attribute values of the given record.
• Based on the comparison, a branch/subtree is followed and the algorithm moves to the next node.
• This process repeats until a leaf node is reached, at which point the prediction is made.
3. Random Forest (RF) Classifier [22]: The Random Forest Classifier works on the principle of ensemble learning, in which multiple classifiers work together to predict the result so that the overall performance of the model is improved. In the Random Forest Classifier, multiple decision trees work together, and the class with the maximum votes across all trees is the final output. The Random Forest Classifier works in two phases. In the first phase, the algorithm builds N decision trees, each fitted on randomly selected points from the training data. In the second phase, every decision tree makes its prediction for a new sample, and the final class is the one with the maximum number of votes.
4. Extra Trees (ET) Classifier [23]: The Extra Trees Classifier is also based on decision trees and is conceptually very similar to the Random Forest Classifier. Many decision trees (without pruning) are trained on the training data, and the final output is the majority of the predictions made by the individual trees. There are two differences from the Random Forest Classifier: in the Extra Trees Classifier there is no bootstrapping of observations, and the node splits are chosen at random.
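The following sketch shows one way the four models described above can be set up with scikit-learn and compared on a held-out test set. It is only illustrative: apart from the linear SVM kernel stated in the text, the hyperparameters, the train/test split ratio, and the variables X and y (from the loading sketch in Sect. 2.2) are assumptions, not settings reported by the authors.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import accuracy_score

# X, y as prepared from the dataset; the split ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "SVM (linear)": SVC(kernel="linear"),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Extra Trees": ExtraTreesClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))
```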
3 Methodology The steps for building a model for voice feature-based detection of Parkinson's disease using machine learning algorithms are shown as a flow diagram in Fig. 1.
A. Data Preprocessing
This step combines two processes: outlier removal and feature selection. Both are explained below.
Outliers Removal [24]: An outlier is an observation that differs markedly from the rest of the data. Most machine learning algorithms are affected when some attribute values fall far outside the usual range. Such outliers are mostly the result of measurement or execution errors during data collection, and these extreme values lie far away from the other observations. Outliers can mislead machine learning models, cause problems during training, and ultimately lead to a less efficient model. There are many different ways to remove outliers. Some outliers were observed in the Parkinson's disease dataset, and they were removed using the most important features for each specific machine learning algorithm as the basis. After this step, the number of instances in the dataset decreases.
Feature Selection [25]: Data collected today is high dimensional and rich in information, and it is quite common to work with a data frame containing hundreds of attributes. Feature selection is a technique for selecting the most prominent features out of the n given features. Feature selection is important for the following reasons:
Fig. 1 Proposed methodology
• While training models, the time taken increases sharply as the number of features grows.
• The risk of overfitting also increases when there are more features.
In this work, the feature selection technique used is SelectFromModel. SelectFromModel selects features based on a threshold on the importance attribute of the underlying estimator; by default, this threshold is the mean feature importance. The machine learning models were then trained with the selected features, and the efficiency of the models improved compared with before.
B. Model Selection, Training, and Testing
The dataset is divided into two parts: a training dataset and a testing dataset. The model is trained by fitting it on the training dataset and is later tested on the test dataset. During testing, the performance metrics are noted so that we can move to the next step, model evaluation.
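A minimal sketch of this preprocessing pipeline is given below. The paper does not state the exact outlier rule, so an IQR filter on a placeholder list of key features is used here purely for illustration; the SelectFromModel call follows the scikit-learn API, whose default threshold for tree-based estimators is the mean feature importance.

```python
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

def remove_iqr_outliers(frame, columns, k=1.5):
    """Drop rows whose values fall outside Q1 - k*IQR .. Q3 + k*IQR (illustrative rule)."""
    mask = pd.Series(True, index=frame.index)
    for col in columns:
        q1, q3 = frame[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= frame[col].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]

# 'df' is the loaded dataset and 'key_features' a placeholder for the most
# important features mentioned in the text.
clean = remove_iqr_outliers(df, key_features)
X_clean, y_clean = clean.drop(columns=["class"]), clean["class"]

# SelectFromModel keeps features whose importance exceeds the threshold
# (by default, the mean importance of the fitted estimator).
selector = SelectFromModel(ExtraTreesClassifier(random_state=42)).fit(X_clean, y_clean)
X_selected = selector.transform(X_clean)
print(X_clean.shape[1], "features reduced to", X_selected.shape[1])
```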
C. Model Evaluation
Model evaluation is necessary to find the best model among all the proposed models. Once all the models are trained and tested, the next step is evaluation to identify the machine learning model best suited to the given problem. For this problem, several performance metrics are used, namely accuracy, precision, recall, F1-score, and the AUC-ROC curve. Performance metrics indicate whether the models are improving or not, so they should be chosen carefully to obtain a correct evaluation.
Confusion Matrix: The problem being solved is a classification problem, and one of the most preferred ways to evaluate a classification model is the confusion matrix. The confusion matrix summarizes the performance of a model with count values, which makes it the most intuitive performance metric, and it is used for both binary and multiclass classification. For a binary problem it is a two-dimensional table divided into four parts, which are explained below.
True Positive: The count of instances that are predicted as Label 1 and actually belong to Label 1; it reflects the ability of the algorithm to classify positive instances as positive. It is expressed as the true-positive rate (TPR), often called sensitivity, which is the proportion of correctly predicted positive samples to actual positive samples.

Sensitivity = TP/(TP + FN)    (1)
True Negative: The count of instances that are predicted as Label 0 and actually belong to Label 0; it reflects the ability of the algorithm to classify negative instances as negative. It is expressed as the true-negative rate (TNR), often called specificity, which is the proportion of correctly predicted negative samples to actual negative samples.

Specificity = TN/(TN + FP)    (2)
False Positive: This is a case in which the model makes a wrong prediction: the count of instances that are predicted as Label 1 but actually belong to Label 0. It is expressed as the false-positive rate (FPR), which is the proportion of negative cases predicted as positive to the actual negative cases.

FPR = FP/(TN + FP)    (3)
False Negative: This is also a case in which the model makes a wrong prediction: the count of instances that are predicted as Label 0 but actually belong to Label 1. It is expressed as the false-negative rate (FNR), which is the proportion of positive cases predicted as negative to the actual positive cases.

FNR = FN/(TP + FN)    (4)
Accuracy: Accuracy measures how correctly the ML model is working; it is the proportion of correct predictions to the total number of predictions.

Accuracy = (TP + TN)/(TP + TN + FP + FN)    (5)
Precision: Precision considers the accuracy of the positively predicted class; it is the ratio of correctly predicted positive instances to the total predicted positive instances.

Precision = TP/(TP + FP)    (6)
Recall: Recall is another name for the sensitivity defined above.

Recall = TP/(TP + FN)    (7)
F1-Score: The F1-score is another performance metric, formed by the harmonic mean of recall and precision.

F1-score = 2·TP/(2·TP + FP + FN)    (8)
AUC-ROC Curve: The area under the receiver operating characteristic curve (AUC-ROC) is an important way to evaluate the performance of a classifier and is one of the most important metrics for model evaluation. The ROC is a probability curve that plots the true-positive rate (Y-axis) against the false-positive rate (X-axis), and the AUC measures separability; better models have a higher AUC, with values lying between 0 and 1. For example, if the AUC of a model is 0.8, there is an 80% chance that the model can distinguish the Label 1 and Label 0 classes.
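Assuming a fitted model with predictions y_pred and positive-class scores y_score on the test labels y_test (hypothetical names), the metrics of Eqs. (1)–(8) can be computed with scikit-learn as sketched below.

```python
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

sensitivity = tp / (tp + fn)                 # Eq. (1), identical to recall
specificity = tn / (tn + fp)                 # Eq. (2)
fpr = fp / (tn + fp)                         # Eq. (3)
fnr = fn / (tp + fn)                         # Eq. (4)
accuracy = accuracy_score(y_test, y_pred)    # Eq. (5)
precision = precision_score(y_test, y_pred)  # Eq. (6)
recall = recall_score(y_test, y_pred)        # Eq. (7)
f1 = f1_score(y_test, y_pred)                # Eq. (8)
auc_value = roc_auc_score(y_test, y_score)   # area under the ROC curve
```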
4 Results
In this section, the performance of the proposed models is analyzed. We worked with four machine learning models, analyzed in Table 1: SVM, the Decision Tree Classifier, the Random Forest Classifier, and the Extra Trees Classifier, with accuracy, precision, recall, and F1-score as the performance metrics. The Extra Trees Classifier achieved the highest accuracy of 93.39%, and its other performance metrics are also far better than those of the other models: it reaches a recall of 1, a precision of 0.92, and an F1-score of 0.96. After applying the SelectFromModel feature selection technique, the results improved, with a notable improvement in the performance metrics of the Random Forest Classifier. Both the Extra Trees Classifier and the Random Forest Classifier reach satisfactory accuracy: the accuracy of the Extra Trees Classifier improved to 94.33%, while the accuracy of the Random Forest Classifier increased from 88.42 to 91.57%. The values of the other performance metrics also improved after applying SelectFromModel, as shown in Table 2. The confusion matrix is known to be the easiest way to understand the performance of a machine learning model. Based on these observations, the Extra Trees Classifier came out as the best model for Parkinson's disease prediction; its confusion matrix is given in Fig. 2a. Figure 2a shows that the test set contains a total of 106 patients, of which 92 are Parkinson's disease-affected and 14 are healthy, and that the trained model predicts 99 as Parkinson's-positive and 7 as healthy. The confusion matrix of the Extra Trees Classifier after feature selection is given in Fig. 2b, where the improvement after feature selection is clearly visible. Without feature selection,

Table 1 Performance metrics (feature selection not applied)
Model | Accuracy | Precision | Recall | F1-score
SVM (linear) | 0.801886 | 0.877777 | 0.887640 | 0.882681
DT | 0.822429 | 0.870588 | 0.902439 | 0.886227
XGBoost | 0.915094 | 0.927835 | 0.978260 | 0.952380
Random Forest | 0.884210 | 0.888888 | 0.972972 | 0.929032
Extra Trees Classifier | 0.933962 | 0.929292 | 1 | 0.963350
Table 2 Performance metrics (SelectFromModel applied)
Model | Accuracy | Precision | Recall | F1-score
SVM (linear) | 0.811320 | 0.879120 | 0.898876 | 0.888888
DT | 0.803738 | 0.858823 | 0.890243 | 0.874251
XGBoost | 0.915094 | 0.919192 | 0.989130 | 0.952879
Random Forest | 0.915789 | 0.902439 | 1 | 0.94871
Extra Trees Classifier | 0.943396 | 0.938775 | 1 | 0.968421
Fig. 2 a Confusion matrix (feature selection not applied), b confusion matrix (SelectFromModel applied)
there were seven cases that were not Parkinson's disease-positive but were predicted as Parkinson's-affected. After feature selection, this number is reduced to six, so one more case becomes a true negative: now 98 cases are predicted positive (of which 96 are true positives) and 8 cases are predicted negative. The receiver operating characteristic curves of the Extra Trees Classifier and the Random Forest Classifier after applying feature selection are depicted in Fig. 3a, b. The area under the curve is 0.91 for the Random Forest Classifier and 0.96 for the Extra Trees Classifier. Both results are quite satisfactory, but the Extra Trees Classifier is better than all the other machine learning models in our work. The Extra Trees Classifier combined with SelectFromModel performs excellently, and it also performs significantly better than several already proposed models; a comparative analysis of our proposed model and some existing models for Parkinson's disease detection is given in Table 3. So, it can be observed that the Extra Trees Classifier combined with the SelectFromModel feature selection technique works very efficiently.
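The ROC comparison in Fig. 3 can be reproduced along the following lines with scikit-learn and matplotlib; rf and et stand for the fitted Random Forest and Extra Trees classifiers and X_test_sel for the test features after SelectFromModel (hypothetical names, shown only as a sketch).

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

for name, model in {"Random Forest": rf, "Extra Trees": et}.items():
    scores = model.predict_proba(X_test_sel)[:, 1]   # positive-class probability
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.2f})")

plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance level
plt.xlabel("False-positive rate")
plt.ylabel("True-positive rate")
plt.legend()
plt.show()
```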
5 Conclusion and Future Scope
Parkinson's disease needs to be diagnosed at an early stage to improve the patient's quality of life. Many machine learning models built on collected datasets have been proposed for this purpose, and they produce important results. The goal of this work, obtaining an efficient machine learning model to detect Parkinson's disease, has been achieved. Several machine learning models were designed to predict whether a person is Parkinson's-positive or -negative based on voice features, specifically SVM (kernel = 'linear'), Decision Tree Classifier, Random Forest Classifier, and
Fig. 3 a ROC curve (feature selection not applied), b ROC curve (model applied)
Table 3 Comparative analysis of ML models for voice-based diagnosis of Parkinson's disease
Model | Reference | Accuracy
Linear SVM | [16] | 65.21
Linear SVM | [15] | 85
Linear SVM | [17] | 91.17
SVM (RBF) | [19] | 91.4
SVM (RBF) | [18] | 86
k-NN + AdaBoost.M1 | [26] | 91.28
Extra trees classifier | This work | 94.3396
Extra Trees Classifier. Evaluating all these models on their performance metrics, the Extra Trees Classifier combined with the SelectFromModel feature selection technique comes out best, with an accuracy of 94.339%. Compared with earlier machine learning models designed for the same problem statement, the model designed in this work performs better than most. A Random Forest Classifier combined with the SelectFromModel technique also performed efficiently, with an accuracy of 91.57%. Some key points for machine learning models for the Parkinson's disease diagnosis problem are:
• The Extra Trees Classifier is a strong model for voice feature-based diagnosis of Parkinson's disease, and its efficiency increases further when combined with the SelectFromModel feature selection technique.
• Applying a feature selection technique is strongly recommended, because a large number of features makes the training process complex.
Though the results of the proposed machine learning model are quite satisfactory, there is always scope for improvement. In the future, the accuracy of the model can be increased by applying other techniques such as data balancing and cross-validation.
Looking at its performance metrics, the model proposed here remains efficient on this dataset and can be relied upon for the problem statement.
References 1. Olanow CW, Stern MB, Sethi K (2009) The scientific and clinical basis for the treatment of Parkinson disease. Neurology 72(21 Supplement 4):S1–S136 2. Langston JW (2002) Parkinson’s disease: current and future challenges. Neurotoxicology 23(4– 5):443–450 3. Van Den Eeden SK, Tanner CM, Bernstein AL, Fross RD, Leimpeter A, Bloch DA, Nelson LM (2003) Incidence of Parkinson’s disease: variation by age, gender, and race/ethnicity. Am J Epidemiol 157(11):1015–1022 4. Huse DM, Schulman K, Orsini L, Castelli-Haley J, Kennedy S, Lenhart G (2005) Burden of illness in Parkinson’s disease. Mov Disord Official J Mov Disord Soc 20(11):1449–1454 5. Perez KS, Ramig LO, Smith ME, Dromey C (1996) The Parkinson larynx: tremor and videostroboscopic findings. J Voice 10(4):354–361 6. Cao L (2017) Data science: challenges and directions. Commun ACM 60(8):59–68 7. Dhar V (2013) Data science and prediction. Commun ACM 56(12):64–73 8. Subrahmanya SVG et al (2021) The role of data science in healthcare advancements: applications, benefits, and future prospects. Ir J Med Sci 1–11. https://doi.org/10.1007/S11845-02102730-Z/FIGURES/5 9. Jawhar Q, Thakur K, Singh KJ (2020) Recent advances in handling big data for wireless sensor networks. IEEE Potentials 39(6):22–27 10. Jawhar Q, Thakur K (2020) An improved algorithm for data gathering in large-scale wireless sensor networks. In: Proceedings of ICETIT 2019. Springer, Cham, pp 141–151 11. Sachdeva P, Singh KJ (2015) Automatic segmentation and area calculation of optic disc in ophthalmic images. In: 2015 2nd International conference on recent advances engineering and computational sciences (RAECS) 12. Sharma A, Kapoor DS, Nayyar A, Qureshi B, Singh KJ, Thakur K (2022) Exploration of IoT nodes communication using LoRaWAN in forest environment. CMC-Comput Mater Continua 71(3):6239–6256 13. Sharma A, Agrawal S (2012) Performance of error filters on shares in halftone visual cryptography via error diffusion. Int J Comput Appl 45:23–30 14. Singh K et al (2014) Image retrieval for medical imaging using combined feature fuzzy approach. In: 2014 International conference on devices, circuits communication ICDCCom 2014—proceedings. https://doi.org/10.1109/ICDCCOM.2014.7024725 15. Sakar BE, Isenkul ME, Sakar CO, Sertbas A, Gurgen F, Delil S, Kursun O (2013) Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J Biomed Health Inform 17(4):828–834 16. Bhattacharya I, Bhatia MPS (2010) SVM classification to distinguish Parkinson disease patients. In: Proceedings of the 1st amrita ACM-W celebration on women in computing in India, pp 1–6 17. Benba A, Jilbab A, Hammouch A, Sandabad S (2015) Voiceprints analysis using MFCC and SVM for detecting patients with Parkinson’s disease. In: 2015 International conference on electrical and information technologies (ICEIT). IEEE, pp 300–304 18. Sakar CO, Serbes G, Gunduz A, Tunc HC, Nizam H, Sakar BE, Apaydin H (2019) A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Appl Soft Comput 74:255–263 19. Little M, McSharry P, Hunter E, Spielman J, Ramig L (2008) Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. Nat Proc 1–1
20. Zhang Y (2012) Support vector machine classification algorithm and its application. Commun Comput Inf Sci 308(CCIS, PART 2):179–186 21. Dai QY, Zhang CP, Wu H (2016) Research of decision tree classification algorithm in data mining. Int J Database Theory Appl 9(5):1–8 22. Ren Q, Cheng H, Han H (2017) Research on machine learning framework based on random forest algorithm. In: AIP conference proceedings (vol 1820, no 1, p 080020). AIP Publishing LLC 23. Ampomah EK, Qin Z, Nyame G (2020) Evaluation of tree-based ensemble machine learning models in predicting stock price direction of movement. Information 11(6):332 24. Gress TW, Denvir J, Shapiro JI (2018) Effect of removing outliers on statistical inference: implications to interpretation of experimental data in medical research. Marshall J Med 4(2) 25. Ramaswami M, Bhaskaran R (2009) A study on feature selection techniques in educational data mining. arXiv preprint arXiv:0912.3924 26. Mathur R, Pathak V, Bandil D (2019) Parkinson disease prediction using machine learning algorithm. In: Emerging trends in expert applications and security. Springer, Singapore, pp 357–363
Stock Market Prediction Techniques Using Artificial Intelligence: A Systematic Review Chandravesh Chaudhari and Geetanjali Purswani
Abstract This paper systematically reviews the literature related to stock price prediction systems. The reviewers collected 6222 research works from 12 databases, reviewed the full-text of 10 studies in a preliminary search, and reviewed 70 studies selected based on PRISMA. This paper uses the PRISMA-based Python framework systematic-reviewpy to conduct the systematic review and browser-automationpy to automate the downloading of full-texts. The programming code with comprehensive documentation, citation data, input variables, and review spreadsheets is provided, making this review replicable, open-source, and free from human errors in selecting studies. The reviewed literature is categorized based on the type of prediction system to demonstrate the evolution of techniques and the research gaps. Of the reviewed literature, 7% is statistical, 9% machine learning, 23% deep learning, 20% hybrid, and 25% a combination of machine learning and deep learning, while 14% of the studies explore multiple categories of techniques. This review provides detailed information on prediction techniques, competitor techniques, performance metrics, input variables, data timing, and research gaps to enable researchers to create prediction systems per technique category. The review showed that stock trading data is used most often and is usually collected from Yahoo! Finance. Studies showed that sentiment data improved stock prediction, and most papers used tweets from Twitter. Most of the reviewed studies showed significant improvements in prediction over previous systems. Keywords Expert systems · Hybrid learning · Stock market · Forecasting · Deep learning · Artificial intelligence · Machine learning
C. Chaudhari (B) · G. Purswani Department of Commerce, CHRIST (Deemed to be University), Bangalore, Karnataka 560029, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_17
1 Introduction The stock market dates back to 1531 in Belgium, where brokers and moneylenders would buy or sell bonds and promissory notes. The New York Stock Exchange (NYSE) was established after 19 years [1]. Equity share prediction depends upon numerous variables such as how a company is managed, the external environment, and demand for its product and services. Determining the company value is subjective, and people set different prices for the same share. The perception of a company also changes the investor’s decision regarding the valuation of any company. Academics and investors have tried to create many methods to predict the share price or value of the company correctly, but still, accurate prediction systems are not available. However, investment firms like Warren Buffett’s Berkshire Hathaway have predicted the market [2].
1.1 Preliminary Search The authors considered following literary work for generating the research questions and keywords to collect literature from databases. Somanathan and Rama [3] reviewed prediction techniques like multilayer perceptron (MLP), dynamic architecture for Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) MLP models, artificial neural network (ANN), and exponential GARCH-MLP models. In Nti et al. [4] review, 66% of research papers were technical analyses, 23% were fundamental analyses, and the rest were combined analyses. The use of Bollinger bands for a portfolio with companies division into small, mid, and large caps showed k-nearest neighbour (KNN) accuracy (ACC) of 84.3% and Long Short Term Memory (LSTM) efficiency of 58.44 [5]. Kumar Chandar [6] proposed a wavelet-adaptive network-based fuzzy inference system using technical data (TD). Nti et al. [7] compared ensemble methods and techniques such as decision tree (DT), SVM, neural network, and ensemble techniques (Gradient boosting machine (GBM), XGBM, LightGBM, AdaBoost, and CatBoost). Thakkar and Chaudhari [8] surveyed the fusion of variables and reviewed DNN techniques such as convolutional neural network (CNN), recurrent neural network (RNN), gated recurrent unit (GRU), LSTM, echo state network, deep Q-network, deep belief network, and restricted Boltzmann machine. A hybrid model was introduced that fuses input variables from stock data, macroeconomic variables, Google trends index, forum discussion, web news, and tweets. Combining financial data with social media data improved the stock price prediction [4, 9, 10]. Researchers reviewed papers with input variables financial news, social media, forums, and blogs for stock market prediction [10, 11]. These reviews provides suggestions to use better-expanded lexicons, quality of news collected and improve performance using sentiment analysis with SVM, DT, ANN, CNN, LSTM, and GRU.
2 Research Methodology This systematic review was created as a reproducible and open system based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [12]. It uses the Python framework systematic-reviewpy, which supports PRISMA, together with browser-automationpy to automate as many steps of the review as possible. The programming code makes the review error-free and reproducible, generating an exact selection of works [13]. The review considers all kinds of research documents available in the databases to remove any risk of bias [14]. The search strategy retains articles containing at least one keyword from each of the financial, ML, and common keyword groups; the counts of keywords in each group are taken, and the citations are filtered to 503 papers using systematic-reviewpy. browser-automationpy automatically downloaded the PDFs from the databases [13], and the downloaded documents were validated with systematic-reviewpy to confirm the correct full-text. Figure 1 shows the precise details of all the actions taken for the systematic literature review.
Fig. 1 Systematic review flow diagram
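The actual filtering is implemented in the authors' systematic-reviewpy framework (released with [13]); the sketch below is only a generic, hypothetical illustration of keyword-group filtering over exported citation records with pandas and does not use that framework's API. The file name, column names, and keyword lists are placeholders.

```python
import pandas as pd

# Placeholder keyword groups, not the authors' actual lists.
financial = {"stock", "share price", "equity", "portfolio"}
ml = {"machine learning", "deep learning", "neural network", "lstm"}
common = {"prediction", "forecasting"}

def count_hits(text, keywords):
    text = text.lower()
    return sum(kw in text for kw in keywords)

citations = pd.read_csv("citations.csv")   # hypothetical export from the databases
text = citations["title"].fillna("") + " " + citations["abstract"].fillna("")

keep = text.apply(lambda t: count_hits(t, financial) > 0
                            and count_hits(t, ml) > 0
                            and count_hits(t, common) > 0)
print(len(citations), "records filtered down to", int(keep.sum()))
```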
The following research questions are the foundation of this systematic review.
1. What are the current techniques for predicting the stock market? This review shows all the prediction techniques or methods found in the reviewed research studies.
2. How are stock market prediction systems advancing? The advancement of prediction systems can be seen in the progression from statistical to hybrid techniques.
3. What data (types of data and input variables) is needed for predicting stock prices? This paper shows the types of data used, from fundamental to sentiment data, and provides them in the variables spreadsheet [13].
4. What is the impact of AI on stock market predictions? This work focuses on AI techniques and provides comprehensive details of the AI models used for prediction.
3 Literature Review Figure 2 describes the categorization of prediction systems into statistical, ML, DL, hybrid techniques, and a mix of techniques and information provided per category. These categories help understand prediction improvement and additional complexity in the progressions of the techniques. The statistical systems category techniques are based on statistical or probability theories. It includes traditional methods such as discounted cash flow and the capital asset pricing model (CAPM). The ML category contains techniques that computationally change their parameters to fit the data and use the model for inference. The ML category includes techniques that are more complex than statistical techniques but more explainable than deep learning (DL) techniques. The hybrid systems category contains techniques created using more than one type of technique, data or both. Hybrid techniques utilize multiple models to understand the variety of data, such as sentimental data (SD) and TD. The data is classified into fundamental data (FD), TD, macroeconomic data (MD), and SD. FD contains financial data or ratios generated using financial statements such
Fig. 2 Categorization of prediction techniques
Table 1 Analysis of statistical techniques References Techniques utilized [15–18]
Type-1 fuzzy time series model, Japanese candlestick-based cloud model, Markov Chain Monte Carlo Bayesian inference approach with tree-augmented Naïve Bayes (MCMC-TAN) algorithm, and financial risk index evaluation model
Results analysis
Research gap
Type-1 fuzzy time series model showed higher average ACC rates. MCMC-TAN technique showed high classification performance. Apriori algorithm model performed 90.27% average detection ACC
Automatically producing the optimal order of the high-order fuzzy time series. Large-scale enterprise data for in-depth mining to improve indicators
as profitability, liquidity, and efficiency ratios. TD contains stock price information such as open, close, volume, and technical analysis ratios such as chart patterns and moving averages. It is to be noted that some authors call the trading data (open, close, and volume data) as FD. MD contains economic conditions outside of company control, such as the country’s growth rate, gross domestic product, and inflation rate. All the data related to sentiment analysis and natural language processing (NLP) models are considered SD. Examples include news, social media data and tweets.
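As a small, hypothetical illustration of TD-style inputs, the snippet below derives a daily return and a simple moving average from a series of closing prices with pandas; it is not taken from any reviewed study.

```python
import pandas as pd

close = pd.Series([101.2, 102.5, 101.9, 103.4, 104.0])  # toy closing prices

td_features = pd.DataFrame({
    "close": close,
    "return_1d": close.pct_change(),           # simple daily return
    "sma_3": close.rolling(window=3).mean(),   # 3-day simple moving average
})
print(td_features)
```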
3.1 Prediction Systems Using Statistical Techniques The financial risk index evaluation model used Cronbach’s coefficient method for reliability analysis, ratio statistic test method for validity analysis, and multivariate analysis of variance [15]. The MCMC-TAN technique uses technical indicators, greek financial newspapers, tweets, and NLP tools for hand-labelling features [16]. The use of HaHigh, HaLow, HaClose, and HaOpen prices showed significant improvement in HA Cloud FTS method [17]. The reviewed papers have used root mean square error (RMSE), redundancy and sensitivity, relative error, and average detection ACC as performance metrics. The input variables used for stock market prediction are closing index, Heikin-Ashi (HA) candlestick patterns, Yu WFTS, Cloud FTS, song FTS, debt solvency, growth ability, operating ability, cash flow ability, and profitability [15, 17, 18]. Table 1 throws light on the techniques utilized, result analysis along with the research gaps.
Table 2 Analysis of machine learning techniques References Techniques utilized Results analysis [19–23]
Artificial bee colony-adaptive neuro-fuzzy inference system (ABC-ANFIS)-SVM model, dynamic asset selection method, support vector regression (SVR) with linear, radial, and polynomial kernel, SVM with PCA combination and extreme gradient boosting (XGBoost)-multiple time scales-particle swarm optimization
The ABC-ANFISSVM model mean absolute error (MAE), RMSE, and mean absolute percentage error (MAPE) are less and U1 and U2 are highest. Random walk model outperformed SVR
Research gap Using FD or SD. Different asset grouping strategies. More stocks for longer time. Model parallelism, different feature reductions
3.2 Prediction Systems Using Machine Learning Techniques The ensemble strategy by Carta et al. [19] reached significant returns of 33% and is better than light gradient boosting, random forest (RF), SVR, and ARIMA. The stepwise multivariate logistic regression compared with SVM model showed that the average ACC of SVM is lower (63.4% compared to 68.1%) but better for specific industries (84.2–71.6%) [20]. Deng et al. [21] proposed XGBoost-multiple time scales-Particle swarm optimization performed better than XGBoost-M-genetic algorithm, XGBoost based on previous 5, 10, 20 periods. XGBoost-Ave, XGBoost-A11, SVM best prediction of Long and Hold and Short and Hold with annual accumulated return of 15.79% and an average hit ratio of 74.90%. ML techniques used some companies, indexes, and futures from any stock exchange based on country, capitalization, industry sector, and past performance. Table 2 gives an overview on the machine learning techniques in comparison with the analysis and gaps.
3.3 Prediction Systems Using Deep Learning Techniques DL prediction systems are popularly made of LSTM neural networks. Some authors explored different hyperparameters of deep LSTM neural network, 2–3 LSTM layers, 32–128 neurons, and input sequences of 11–22 time-steps [24]. Song and Lee [25] used a simple neural network structure. In contrast, Nayak and Misra [26] used MLP with one hidden layer fuzzified by Gaussian membership functions and chemical
Stock Market Prediction Techniques …
225
reaction optimization for the optimal weights and biases. Shahvaroughi Farahani and Razavi Hajiagha [27] used ANN trained with metaheuristic algorithms such as bat algorithm, genetic algorithms, and social spider optimization for feature selection. Nayak and Misra [28] used binary encoded artificial chemical reaction optimization technique with a traditional trigonometric functional link artificial neural network (FLANN). The predictive power of features tested using logistic regression (LR) and DNN with unsupervised feature extraction methods: autoencoder, PCA, and the restricted Boltzmann machine [29]. Jan [30] used a chi-squared automatic interaction detector (CHAID) to select essential variables and created prediction models with DNN and CNN. Classification of stocks with the choice to consider historical volatility empirically tested using constant threshold-based stock classification and ranking-based stock classification [31]. Bao et al. [32] proposed model outperformed compared models on MAPE, Theil U, and the highest R. Superior performance significance was tested according to the Deibold-Mariano test [26]. For fuzzification methods, the Gaussian membership function is determined to be superior. Bat algorithm and social spider optimization yield the highest R-squared, lowest loss functions, and social spider optimization error which have been lower than the others [27]. PCA increased the LSTM model’s training efficiency by 36.8%, and the proposed model showed superior performance of 93% ACC, 96% precision, and 96% recall [33]. Chalvatzis and Hristu-Varsakelis [24] approach achieved cumulative returns of 371, 185, 340, and 360% in NASDAQ, Dow Jones Industrial Average (DJIA), S&P 500, and Russel 2000 stock indices. Gu et al. Chong et al. [31] proposed method outperformed other models and showed 36% earning rate per year or 9.044% per 12 weeks. Empirical work does not support superiority of DNN as it underperforms a linear autoregressive model in the test set [29]. Many researchers used multiple indices from different countries to build models workable under different economic structures and conditions or on different indices from the same country to focus on predictability than generality. Some researchers selected a finite number of stocks, such as 344 listed and OTC sample companies, 3558 stocks from the Chinese stock market, and 49 stocks from China Securities 100 Index component stocks from the Shanghai and Shenzhen 300 Index component stocks [30, 33, 34]. The timings of the financial data vary from one year to a quarter of the century [31, 35]. Table 3 gives in-depth overview on the deep learning techniques along with their respective research gaps.
3.4 Prediction Systems Using ML and DL Techniques Three feature selection methods, namely tournament screening algorithm, sequential forward floating selection algorithm, and least absolute shrinkage and selection operator, were used to compare model predictions [37]. Stock sequence array convolutional LSTM (SACLSTM) uses LSTM neural network to integrate the data directly into a matrix and extract high-quality features using convolution [38]. Hu et al. [39]
Table 3 Analysis of deep learning techniques References Techniques utilized [24, 25, 27, 30–36]
LSTM with a combination of wavelet transform (WT) and stacked autoencoders (SAE), mean-variance model, Haar WT feature extension, recursive feature elimination, PCA, and DNN automatic trading system. Portfolio optimization models using deep multilayer perceptron (DMLP), LSTM, and CNN
Results analysis
Research gap
CHAID-CNN has the highest ACC (all above 89%), followed by CNN, CHAID-DNN, and DNN. Without transaction fees, DMLP+MSAD performed with high desired return and lowest predictive errors
Different hyper-parameters selection scheme. GPU-based and heterogeneous computing. Creation of metaheuristic algorithms. Comparative investigation using transformer, GRU, Bi-LSTM, and reward expressed using the Heston model or Black Sholes equation
created prediction models of type CNN, DNN, RNN, LSTM, reinforcement learning, and DL methods such as self-paced learning mechanisms, hybrid attention networks, multi-filters neural network and Wavenet. The different modifications in prediction period, epoch, and batch size have also been explored by [40, 41]. The LightGBM uses exclusive feature bundling with the minimum variance portfolio using the mean-variance model with conditional value at risk (CVaR) constraint [42]. The LSTM performed similarly to RNN, and both outperformed linear regression. Empirical study did not see any improvements in prediction by increased memory, additional output neurons or more input neurons for RNN or LSTM [43]. Risk assessment using CVaR is more consistent and sufficient to traditional portfolio theory [42]. The ANN-based model MAPE is smaller than SVR and HMM-based models. Hybrid models performed poorly during the transition time between various stock patterns where mean-reverting stock trends and momentum benefits models [44]. Empirical findings in Hasan et al. [45] suggested that the DNN model is superior for shorter periods and lower threshold values measured by average return per trade, compound return, and maximum drawdown. Neural network models performed similarly to gradient boosting tree models, and deep FNN outperformed shallow FNN [46]. The period of financial data ranges from 1 to 36 years [38, 41, 46]. Table 4 provides a combined analysis of ML and DL techniques.
Table 4 Analysis of ML and DL techniques References Techniques utilized [37–40, 43, 44, 46–50]
Dynamic advisor-based ensemble (DYNABE) ensembles with stacking linear regression with elastic net regularization, LR, SVM, rotation forest or XGBoost. Discrete WT and empirical mode decomposition algorithms with ANN and SVR. LSTM network with multiple portfolio optimization techniques, Monte Carlo simulation, equal-weighted modelling, and mean variant optimization. LightGBM. DBN, SAE, and MLP
Results analysis
Research gap
DYNABE model achieved a best-case 31.12% misclassification error and 359.55% annualized absolute return. CNN better performed than CNN-corr, CNNpred and LSTM. LightGBM performed better than ML models. DBN, SAE, and MLP have superior performance to traditional ML algorithms
Pre-stacking base model filter before stacking. Dynamic portfolio construction, diversification, and optimization with multiple flexible and tactical strategies, cost control, risk management, and transaction execution. Architecture using CNN and RNN. Different activation functions, training epochs, testing periods, and feature selection methods
3.5 Prediction Systems Using Hybrid Techniques The deep information echoing model (Echoing Reinforced Impact Unit and Echoing Information Aggregation) feeds the tweet-level and word-level attentions into a hierarchical tweet representation and echoing module to capture both the FD and SD reinforced impact. Zheng and He [51] proposed the combination of PCA and RNN trained with Levenberg-Marquardt, Bayesian regularization, and scaled conjugate gradient algorithms. Multi-model-based hybrid prediction algorithm (MM-HPA) is a combination of linear (ARIMA and exponential smoothing model) and nonlinear models (RNN), including a genetic algorithm [52]. Yu and Li [53] tested six profitable methods with zero transaction costs and achieved the best-annualized return of 278.46%, more than the market. Polamuri et al. [52] showed better prediction of the proposed hybrid model over linear and nonlinear models. The forecast time frames also matter while creating the prediction model. The researchers have also selected some companies from some industries based on the type of work companies do in the sector, like manufacturing or operations [51]. The authors have created models using TD for up to 20 years, SD such as political situations for up to 19 years, news for up to 4 years, and tweets for up to 2 years [54–56]. The MM-HPA model can be advanced with greedy heuristic methods and pareto optimization to reduce computation time [52]. Table 5 gives an
Table 5 Analysis of hybrid techniques References Techniques utilized [4, 51, 53–60]
CNN-oriented DL for textual data, SVM for numerical data with soft set theory. LSTM for TD and valence aware dictionary and sentiment reasoner for SD. 2 layers of CNN with 3 layer LSTM network with fully connected layer. Stacked LSTM network with CNN feature engineering and random search algorithm feature selection. BiLSTM network with temporal attention mechanism. Stanford sentiment analysis using tenfold CV, PCA, and spam tweets reduction
Results analysis
Research gap
Political situation and sentiment feature improves ACC by 20% and 3%, respectively. Trump’s tweets and attributes improve forecast ACC. Deep echoing model outperforms other baselines model. Prediction ACC rates using financial news and social media are 75.16 % and 80.53 %, respectively
Inclusion of political policy, industrial development, market sentiment, and natural factors variables. Multi-layer CNN for features extraction. Twitter data with stock-specific and domain-specific lexicon. Addition of stocks correlation analysis. More granular trading data with advanced trading strategies. Generative adversarial networks and autoencoders
analysis on the hybrid techniques, wherein the techniques are being discussed based on their research analysis and gaps identified.
3.6 Prediction Systems Using Mix of Techniques The prediction systems using various techniques contain statistical, ML, DL, and hybrid models. Haar wavelet transforms used to generate denoised time series stock data and SAE to fix the problem of imbalanced text and stock features. Eliasy and Przychodzen [61] compared CAPM and LSTM with dropout layers of 20% rate and 40 neurons. Rajab and Sharma [62] proposed an interpretable neuro-fuzzy system created using rule base reduction via selecting the best rules and constrained learning during model optimization which achieves a superior balance between ACC and interpretability. The rule bases are less confirming readability, intelligibility aspects and lesser training time than ANFIS. The gradient boosting algorithm outperforms AdaBoost classifier for daily and weekly intervals but the opposite case for sentiment analysis. The predictability of stock increases by 15% using sentiment analysis. Hybrid models can be used to mitigate the limitations of standalone models but are
Table 6 Analysis of mix of techniques References Techniques utilized [9, 61–66]
Prediction model by integrating SAE, wavelet transform, Doc2Vec and LSTM model. ANN, fuzzy logic, genetic algorithm, and hybrid approaches. Interpretable neuro-fuzzy system. Reinforcement learning beliefs, desires, and intentions (BDI) agent framework
Results analysis
Research gap
Gradient boosting machine yields top performance in economic and statistical evaluation criteria. 70–75% ACC by linear vector quantization. ANN best predicts stocks values, and SVM best predicts trend
Evolutionary genetic algorithms for stock selection. Social media text data from more platforms. Analysis of other indexes, industries, and markets. Optimization of network architecture and hyperparameters. Web crawler to gather daily articles and google trends
very complex. The data taken is generally from 6 to 28 years [61, 63]. The strained gradient-based technique combined with the Mamdani-type fuzzy system can be used in future research [62]. Analysis on combination of techniques is briefly structured in the Table 6.
4 Discussion This work shows how researchers combine basic models to produce ensemble and hybrid models that perform better or capture deeper connections in the data. Multiple variables influence stock prediction, and utilizing all available resources, such as news, at prediction time is tricky; there is also a chance that erroneous data changes the prediction. A research gap therefore remains for a hybrid model that incorporates all the financial data affecting stock prices and predicts consistently and accurately to a reasonable extent. The most used performance metrics are ACC, MSE, MAE, RMSE, and MAPE. The most used techniques for stock price prediction are neural networks, especially LSTM, in combination with other techniques. For comparison against the proposed models, the most used techniques are RF and SVM, with RNN and LSTM second; LR, KNN, CNN, and ANN were also popular comparison techniques. Yahoo! Finance is the most well-known source of financial information, and the most used input variables for stock price prediction are trading data such as the low, high, open, and close prices and the volume of traded stock. Most articles used data from the S&P 500, NASDAQ, Microsoft, DJIA, and BSE. The reviewed data showed that the majority of authors are from China, with authors from the UK, India, and the USA also contributing substantially to this topic.
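For example, the trading variables named above can be pulled from Yahoo! Finance with the third-party yfinance package, as sketched below; this is a common convenience route and an assumption on our part, not a tool attributed to the reviewed studies.

```python
import yfinance as yf

# Daily OHLCV history for one of the frequently used tickers.
data = yf.download("MSFT", start="2015-01-01", end="2020-12-31")
print(data[["Open", "High", "Low", "Close", "Volume"]].head())
```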
The research gaps also include variations in the base techniques and data and better optimization algorithms. The most cited limitation is the need for more data representing stock prices in the selected market; the authors who did not use SD and MD suggest adding them, and others mention using DL techniques, feature selection, and metaheuristic algorithms for training the models.
5 Conclusion Accurate prediction of stock price is still an unsolved problem. This systematic review used systematic-reviewpy and browser-automationpy Python framework to construct a replicable and open-source system which can also be used to extend or update this review. The programming code with comprehensive documentation, citation data, input variables, and reviews spreadsheets is also provided. This review showed the data and techniques used by the previous works to help new researchers build better intelligent systems or implement similar systems in different markets. This review analysed data and timings needed for stock price prediction. Categorizing prediction systems into statistical, ML, DL, and hybrid are helpful for researchers focusing on specific category techniques. The performance per category helps select adequate models for complex hybrid prediction systems. The supplementary review spreadsheets also contains a separate column for understanding which techniques are used to compare the proposed models by authors so readers can use those as benchmarks. The systematic review analyses can be improved by selecting more fulltext and reviewers. This review showed all the needed information for understanding the problems in accurately predicting stock prices and creating intelligent systems leveraging different techniques and data. Acknowledgements Chandravesh Chaudhari is a recipient of Indian Council of Social Science Research Doctoral Fellowship. His article is largely an outcome of his doctoral work sponsored by ICSSR. However, the responsibility for the facts stated, opinions expressed, and the conclusions drawn is entirely that of the author.
References 1. The birth of stock exchanges. https://www.investopedia.com/articles/07/stock-exchangehistory.asp. Last Accessed 03 Feb 2022 2. Capone A (2008) Warren Buffett and the interpretation of financial statements, the search for the company with a durable competitive advantage. J High Technol Law 1–4 3. Somanathan AR, Rama SK (2020) A bibliometric review of stock market prediction: perspective of emerging markets. Appl Comput Syst 25:77–86. https://doi.org/10.2478/acss-20200010 4. Nti IK, Adekoya AF, Weyori BA (2021) A novel multi-source information-fusion predictive framework based on deep neural networks for accuracy enhancement in stock market prediction. J Big Data 8. https://doi.org/10.1186/s40537-020-00400-y
5. Singh A, Gupta P, Thakur N (2021) An empirical research and comprehensive analysis of stock market prediction using machine learning and deep learning techniques. IOP Conf Ser Mater Sci Eng 1022. https://doi.org/10.1088/1757-899X/1022/1/012098 6. Kumar Chandar S (2019) Fusion model of wavelet transform and adaptive neuro fuzzy inference system for stock market prediction. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/ s12652-019-01224-2 7. Nti IK, Adekoya AF, Weyori BA (2020) A comprehensive evaluation of ensemble learning for stock-market prediction. J Big Data 7. https://doi.org/10.1186/s40537-020-00299-5 8. Thakkar A, Chaudhari K (2021) Fusion in stock market prediction: a decade survey on the necessity, recent developments, and potential future directions. Inf Fusion 65:95–107. https:// doi.org/10.1016/j.inffus.2020.08.019 9. Ji X, Wang J, Yan Z (2021) A stock price prediction method based on deep learning technology. Int J Crowd Sci 5:55–72. https://doi.org/10.1108/ijcs-05-2020-0012 10. Obthong M, Tantisantiwong N, Jeamwatthanachai W, Wills G (2020) A survey on machine learning for stock price prediction: algorithms and techniques. FEMIB 2020 Proc 2nd Int Conf Financ Econ Manag IT Bus 63–71. https://doi.org/10.5220/0009340700630071 11. Alzazah FS, Cheng X (2020) Recent advances in stock market prediction using text mining: a survey. E-Business 13. https://doi.org/10.5772/intechopen.92253 12. Moher D, Liberati A, Tetzlaff J, Altman DG (2010) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Int J Surg 8:336–341. https://doi.org/10. 1016/j.ijsu.2010.02.007 13. Chaudhari C, Purswani G (2022) Supplementary data: a systematic review of artificial intelligence prediction techniques for stock market. https://doi.org/10.5281/ZENODO.6750696 14. Lefebvre C, Manheimer E, Glanville J (2008) Searching for studies. Cochrane Handb Syst Rev Interv Cochrane B Ser 95–150. https://doi.org/10.1002/9780470712184.ch6 15. Gao B (2021) The use of machine learning combined with data mining technology in financial risk prevention. Comput Econ Nan. https://doi.org/10.1007/s10614-021-10101-0 16. Maragoudakis M, Serpanos D (2016) Exploiting financial news and social media opinions for stock market analysis using MCMC Bayesian inference. Comput Econ 47:589–622. https:// doi.org/10.1007/s10614-015-9492-9 17. Hassen OA, Darwish SM, Abu NA, Abidin ZZ (2020) Application of cloud model in qualitative forecasting for stock market trends. https://doi.org/10.3390/e22090991 18. Chen SM, Chu HP, Sheu TW (2012) TAIEX forecasting using fuzzy time series and automatically generated weights of multiple factors. IEEE Trans Syst Man Cybern Part A Syst Hum 42:1485–1495. https://doi.org/10.1109/TSMCA.2012.2190399 19. Carta SM, Consoli S, Podda AS, Recupero DR, Stanciu MM (2021) Ensembling and dynamic asset selection for risk-controlled statistical arbitrage. IEEE Access 9:29942–29959. https:// doi.org/10.1109/ACCESS.2021.3059187 20. Baranes A, Palas R (2019) Earning movement prediction using machine learning-support vector machines (SVM). J Manag Inf Decis Sci 22:36–53 21. Deng S, Huang X, Wang J, Qin Z, Fu Z, Wang A, Yang T (2021) A decision support system for trading in apple futures market using predictions fusion. IEEE Access 9:1271–1285. https:// doi.org/10.1109/ACCESS.2020.3047138 22. Sedighi M, Jahangirnia H, Gharakhani M, Fard SF (2019) A novel hybrid model for stock price forecasting based on metaheuristics and support vector machine. 
https://doi.org/10.3390/ data4020075 23. Henrique BM, Sobreiro VA, Kimura H (2018) Stock price prediction using support vector regression on daily and up to the minute prices. J Financ Data Sci 4:183–201. https://doi.org/ 10.1016/j.jfds.2018.04.003 24. Chalvatzis C, Hristu-Varsakelis D (2019) High-performance stock index trading: making effective use of a deep LSTM neural network 25. Song Y, Lee J (2020) Importance of event binary features in stock price prediction. https://doi. org/10.3390/app10051597
26. Nayak SC, Misra BB (2019) A chemical-reaction-optimization-based neuro-fuzzy hybrid network for stock closing price prediction. Financ Innov 5:1–34. https://doi.org/10.1186/s40854019-0153-1 27. Shahvaroughi Farahani M, Razavi Hajiagha SH (2021) Forecasting stock price using integrated artificial neural network and metaheuristic algorithms compared to time series models. Soft Comput 25:8483–8513. https://doi.org/10.1007/s00500-021-05775-5 28. Nayak SC, Misra BB, Behera HS (2019) ACFLN: artificial chemical functional link network for prediction of stock market index. Evol Syst 10:567–592. https://doi.org/10.1007/s12530018-9221-4 29. Chong E, Han C, Park FC (2017) Deep learning networks for stock market analysis and prediction: methodology, data representations, and case studies. Expert Syst Appl 83:187–205. https://doi.org/10.1016/j.eswa.2017.04.030 30. Jan CL (2021) Financial information asymmetry: using deep learning algorithms to predict financial distress. Symmetry (Basel) 13:443. https://doi.org/10.3390/sym13030443 31. Gu Y, Shibukawa T, Kondo Y, Nagao S, Kamijo S (2020) Prediction of stock performance using deep neural networks. https://doi.org/10.3390/app10228142 32. Bao W, Yue J, Rao Y (2017) A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS One 12. https://doi.org/10.1371/journal.pone. 0180944 33. Shen J, Shafiq MO (2020) Short-term stock market price trend prediction using a comprehensive deep learning system. J Big Data 7. https://doi.org/10.1186/s40537-020-00333-6 34. Ma Y, Han R, Wang W (2020) Prediction-Based portfolio optimization models using deep neural networks. IEEE Access 8:115393–115405. https://doi.org/10.1109/ACCESS.2020. 3003819 35. Wang W, Li W, Zhang N, Liu K (2020) Portfolio formation with preselection using deep learning from long-term financial data. Expert Syst Appl 143:1. https://doi.org/10.1016/j.eswa.2019. 113042 36. Lee J, Koh H, Choe HJ (2021) Learning to trade in financial time series using high-frequency through wavelet transformation and deep reinforcement learning. Appl Intell 51:6202–6223. https://doi.org/10.1007/s10489-021-02218-4 37. Peng Y, Albuquerque PHM, Kimura H, Saavedra CAPB (2021) Feature selection and deep neural networks for stock price direction forecasting using technical analysis indicators. Mach Learn Appl 100060. https://doi.org/10.1016/j.mlwa.2021.100060 38. Wu JMT, Li Z, Herencsar N, Vo B, Lin JCW (2021) A graph-based CNN-LSTM stock price prediction algorithm with leading indicators. Multimed Syst Nan. https://doi.org/10.1007/s00530021-00758-w 39. Hu Z, Zhao Y, Khushi M (2021) A survey of forex and stock price prediction using deep learning. https://doi.org/10.3390/ASI4010009. http://arxiv.org/abs/2103.09750 40. Sharaf M, Hemdan EED, El-Sayed A, El-Bahnasawy NA (2021) StockPred: a framework for stock price prediction. Multimed Tools Appl 80:17923–17954. https://doi.org/10.1007/ s11042-021-10579-8 41. Mehtab S, Sen J (2020) A time series analysis-based stock price prediction using machine learning and deep learning models. https://doi.org/10.13140/RG.2.2.14022.22085/2 42. Chen Y, Liu K, Xie Y, Hu M (2020) Financial trading strategy system based on machine learning. Math Probl Eng 2020. https://doi.org/10.1155/2020/3589198 43. Serrano W (2021) The random neural network in price predictions. Neural Comput Appl Nan. https://doi.org/10.1007/s00521-021-05903-0 44. Shi C, Zhuang X (2019) A study concerning soft computing approaches for stock price forecasting. 
https://doi.org/10.3390/axioms8040116 45. Hasan A, Kalipsiz O, Akyoku¸s S (2020) Modeling traders’ behavior with deep learning and machine learning methods: evidence from BIST 100 index. Complexity 2020. https://doi.org/ 10.1155/2020/8285149 46. Ndikum P (2020) Machine learning algorithms for financial asset price forecasting. arXiv Prepr 1–16. arXiv2004.01504
47. Dong Z (2019) Dynamic advisor-based ensemble (DYNABE): case study in stock trend prediction of critical metal companies. PLoS One 14. https://doi.org/10.1371/journal.pone.0212487 48. Ta VD, Liu CM, Tadesse DA (2020) Portfolio optimization-based stock prediction using longshort term memory network in quantitative trading. https://doi.org/10.3390/app10020437 49. Lv D, Yuan S, Li M, Xiang Y (2019) An empirical study of machine learning algorithms for stock daily trading strategy. Math Probl Eng 2019:30. https://doi.org/10.1155/2019/7816154 50. Nabipour M, Nayyeri P, Jabani H, Mosavi A, Salwana E, Shahab S (2020) Deep learning for stock market prediction. Entropy 22. https://doi.org/10.3390/E22080840 51. Zheng L, He H (2020) Share price prediction of aerospace relevant companies with recurrent neural networks based on PCA 52. Polamuri SR, Srinivas K, Mohan AK (2020) Multi model-based hybrid prediction algorithm (MM-HPA) for stock market prices prediction framework (SMPPF). Arab J Sci Eng 45:10493– 10509. https://doi.org/10.1007/s13369-020-04782-2 53. Yu X, Li D (2021) Important trading point prediction using a hybrid convolutional recurrent neural network. https://doi.org/10.3390/app11093984 54. Xu W, Pan Y, Chen W, Fu H (2019) Forecasting corporate failure in the Chinese energy sector: a novel integrated model of deep learning and support vector machine. Energies 12. https:// doi.org/10.3390/en12122251 55. Yuan K, Liu G, Wu J, Xiong H (2020) Dancing with trump in the stock market. ACM Trans Intell Syst Technol 11. https://doi.org/10.1145/3403578 56. Khan W, Malik U, Ghazanfar MA, Azam MA, Alyoubi KH, Alfakeeh AS (2020) Predicting stock market trends using machine learning algorithms via public sentiment and political situation analysis. Soft Comput 24:11019–11043. https://doi.org/10.1007/s00500-019-04347y 57. Karlemstrand R, Leckström E (2021) Using Twitter attribute information to predict stock prices 58. Hao Y, Gao Q (2020) Predicting the trend of stock market index using the hybrid neural network based on multiple time scale feature learning. https://doi.org/10.3390/app10113961 59. Khan W, Ghazanfar MA, Azam MA, Karami A, Alyoubi KH, Alfakeeh AS (2020) Stock market prediction using machine learning classifiers and social media, news. J Ambient Intell Humaniz Comput Nan. https://doi.org/10.1007/s12652-020-01839-w 60. Shi L, Teng Z, Wang L, Zhang Y, Binder A (2019) DeepClue: visual interpretation of text-based deep stock prediction. IEEE Trans Knowl Data Eng 31:1094–1108. https://doi.org/10.1109/ TKDE.2018.2854193 61. Eliasy A, Przychodzen J (2020) The role of AI in capital structure to enhance corporate funding strategies. Array 6:100017. https://doi.org/10.1016/j.array.2020.100017 62. Rajab S, Sharma V (2019) An interpretable neuro-fuzzy approach to stock price forecasting. Soft Comput 23:921–936. https://doi.org/10.1007/s00500-017-2800-7 63. Nevasalmi L (2020) Forecasting multinomial stock returns using machine learning methods. J Financ Data Sci 6:86–106. https://doi.org/10.1016/j.jfds.2020.09.001 64. Kumar G, Jain S, Singh UP (2021) Stock market forecasting using computational intelligence: a survey. Arch Comput Methods Eng 28:1069–1101. https://doi.org/10.1007/s11831020-09413-5 65. Ahmed M, Sriram A, Singh S (2020) Short term firm-specific stock forecasting with BDI framework. Comput Econ 55:745–778. https://doi.org/10.1007/s10614-019-09911-0 66. 
Strader TJ, Rozycki JJ, Root TH, Huang Y-H (2020) (John): machine learning stock market prediction studies: review and research directions. J Int Technol Inf Manag 28:63–83
Swarm Intelligence-Based Clustering and Routing Using AISFOA-NGWO for WSN M. Vasim Babu , M. Madhusudhan Reddy, C. N. S. Vinoth Kumar , R. Ramasamy, and B. Aishwarya
Abstract In recent years, energy conservation has become a pressing challenge because IoT connects resource-limited devices. Clustering plays a vital role in providing efficient energy-saving mechanisms in WSN. Major issues in existing clustering algorithms are short network lifetime, unbalanced loads among the sensor nodes in the network, and high end-to-end delays. This paper introduces an integration of a novel artificial intelligence-based sailfish optimization algorithm (AISFOA) with a Novel Gray Wolf Optimization (NGWO) technique. Initially, clusters are formed using the AISFOA approach. The cluster head is elected after network deployment and can be changed dynamically based on network lifetime. Second, the distance between sensor nodes is estimated by the Euclidean distance to avoid data redundancy. Next, the NGWO algorithm is used to select a minimal path for routing. This research work incorporates the merits of both clustering and routing techniques, which leads to a high energy ratio and a prolonged network lifespan. Simulation is performed by using an NS2 simulator. The efficiency of the proposed approach is analyzed against IABCOCT, EPSOCT, and HCCHE. The simulation outcome shows that the proposed approach enhances energy efficiency and network lifetime, and it also reduces node-to-sink delay. Keywords WSN · Clustering · Cluster head · Routing
M. Vasim Babu (B) KPR Institute of Engineering and Technology, Arasur, Coimbatore, Tamil Nadu, India e-mail: [email protected] M. Madhusudhan Reddy K.S.R.M College of Engineering, Kadapa, Andhra Pradesh, India C. N. S. Vinoth Kumar · B. Aishwarya SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India R. Ramasamy Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Tamil Nadu, India B. Aishwarya SRM Institute of Science and Technology, Ramapuram Campus, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_18
1 Introduction Wireless sensor networks face multiple challenges, which include network lifetime, security, connectivity, energy efficiency, and synchronization [1]. Partitioning of the network is one of the major problems in WSN [2]. Clustering has been adopted as a high-grade solution to this issue, and during clustering, the election of the CH plays a major role in enhancing network lifetime [3]. However, for a wide range of applications, multi-hop communication is needed because of the limited transmission range of sensor nodes. Most surveys address the issues of clustering and routing in WSN separately, and very few protocols consider multi-hop routing [4]. Some protocols assume that a certain percentage of nodes is designated as CHs; in most approaches, CHs are elected randomly and can be deployed close to each other [5]. Treating clustering and routing as a single, unified problem is a better solution. Such a solution employs an efficient mechanism for information gathering and CH election [6], and it also minimizes the communication between the CHs, the sensor nodes, and the base station. Most of the existing protocols follow these steps: cluster identification, CH election, synchronization, and data transmission [7]. The data packets are stored in the network queue of the CH, and if the data exceed the buffer size of the CH, packets start to be dropped [8]. In WSN, the sensor nodes (SNs) are distributed over a large area to gather data of interest. Multi-hop communication is required to deliver the collected information to the BS, because the SNs are short-range, battery-powered devices. Clustering is one solution and is applicable to single-hop communication [9], in which every CH in the network transmits information directly to the BS. However, for larger networks, routing combined with clustering is desired, since a CH failure on the routing path leads to a failure in data transmission [10]. In the proposed methodology, clustering and routing are treated as a unified problem in which the cluster structure is considered during CH election. After identifying the clusters and CHs, a routing mechanism is proposed for handling failover scenarios. The literature related to this methodology is surveyed in Sect. 2. The complete workflow of the proposed methodology is described in Sect. 3. The performance analysis is illustrated in Sect. 4, and the work is concluded in Sect. 5.
2 Related Work Mehta and Saxena presented a multi-objective dependency on clustering and SFO target-hunting routing approach. The main purpose of this methodology was to prolong power ratio in networks. However, this optimization strategy focused on multiple objective criteria to be optimized [11]. Bandi et al. proposed an individual accommodate differential activity scheme-enhanced ABC technique that depends on CH election strategy. It was used to prolong network lifetime with high QoS. However, this algorithm has premature convergence in the upcoming search period.
In addition, the optimal value accuracy does not meet the requirements [12, 13]. Kumar et al. introduced nature-inspired sailfish optimization algorithm, which was stimulated by using sailfish groups. It discovers the CH, but routing does not take place in this approach [14]. Satyajit and Pradip developed a ring routing protocol that depends on modified SFO and SCNN for dynamical networks. But exponentially, the shallow network holds a lower number of training parameters and also has some sample complexity [15]. Sathyamoorthy et al. designed a Q-learning approach to deploy sensor nodes in the network. This methodology increases throughput, network lifespan, and PDR, and also, it reduces delay. Major issue in this proposed approach was that it would take more time to reach the optimal value in normal circumstances [16]. Gupta et al. selected the CH based on the BS, cost, node degree, residual energy of nodes, distance, proximity, and node coverage using WMBA approach. In addition, optimal path selection was done by using the QOGA approach. But, in WMBA, time overhead is relatively higher than the other methodology [17]. Uma and Selvaraj proposed a multi-hop routing scheme. It enhanced network lifetime and reliability. However, it loses bandwidth over several hops [18]. Khalid et al. introduced a symmetrical energy alert clustering and routing protocol. Extraordinary performance of this methodology was shown in terms of network lifespan and power usage. However, numerical computations were carried out to elect optimal radius; therefore, power intake of distant CH was decreased [19]. Tanima and Indrajit developed an improved PSO gravitational hunting technique for clustering and routing in networks. The performance of this methodology was clearly shown in terms of remaining energy, communication system lifetime, and occurrence rate, and it outperforms the existing one. However, the tendency of GSA became trapped in local minima [20, 21]. Farsi et al. proposed a congestion alert clustering and routing rules to reduce crowding over the network. However, this rule does not have any GPS support to deduce the power for the intended region between the SNs in the network. And, it was very hard to establish setup and routing tables [22]. Akhilesh and Rajat designed a power-efficient hierarchical clustering and routing by FCM. This methodology achieved better energy efficiency and better coverage. However, in FCM, the Euclidean distance can unequally weight underlying factors [23]. Barzin et al. developed a multi-objective nature-inspired algorithm, which worked under shuffled frog-leaping technique and firefly approach. It acts as an application-specific clustering supported multi-hop routing rule for WSN. But, this SFLA approach had slow convergence, and it was easy to fall into local optima solution [24]. Anand and Sudhakar represented a generic algorithm-based clustering and PSO-based routing in the network. It was mainly used to prolong network lifetime. But, GA holds less data about the issues. Designing objective functions and getting the representation were quite difficult. In addition, it was computationally expensive, so that it created a time complexity problem [25]. Shyjith et al. proposed a hybrid optimization algorithm for CH election to address challenges in CH election. Here, the CH was elected using rider cat swarm optimization, which was the integration of the rider optimization algorithm with cat swarm optimization. However,
the CSO technique had a slow tracking speed and poor tracking accuracy [26]. Mudathir et al. presented a lightweight and efficient dynamic CH election routing protocol. However, this methodology does not address the challenge of an elected CH failing while performing its task, and the mobility of the sensor nodes was not taken into account [27]. This review shows the merits and demerits of some existing methodologies, and this research work overcomes some of the demerits mentioned in the survey. The proposed AISFOA-NGWO approach is explained in the upcoming section.
3 Proposed Approach 3.1 Network Model In WSN, N amount of SNs experiences deployment to examine atmosphere frequency. Sensor nodes in the network have the ability to function in a sensing state for observing physical variables. Or else, in a communication state for forwarding data between every sensor nodes in the network straight to BS, they gather information from cluster manager. Connectivity of each sensor node handles traffic, and also, it allocates an index of its position. After the deployment also, the sensor nodes and BS are stable, which is characteristic for sensor networking applications. Initial energy of every SN in the WSN is better, and also, the network is assumed to be identical. Each node in the network estimates the impermanent state of physical variables. Meanwhile, it transmits data for destination nodes in a periodic manner. Sensor node contains a group of transmitting energy levels and also has the capability to adapt power transmission. The connection between sensor nodes is the same as nature. In addition, the distance between the sensor nodes is estimated by using received signal energy. The perceived information is highly correlated. Therefore, CH collects information classified from clusters to determine length aggregation. In this proposed methodology, CH is elected by using the AISFOA algorithm that is clearly explained in the section below.
3.2 AISFOA-Based Clustering Process AI-based SFOA mimics the sailfish group hunting which effectively changes the attack on schooling sardine prey. It is also used to select the optimal path to sink nodes for packet transmission. Steps involved in AISFOA approach are initialization, elitism, attack alteration mechanism, hunting, and catching prey. Basic workflow of this proposed algorithm is described as follows: • To drive school of smaller fishes. • To encircle the smaller fishes.
• The maneuverability of the small fishes.
• To injure the small fishes.
• To hunt the small fishes.
• To change their body color.
Based on this strategy, the sensor nodes in the network are clustered.
3.2.1
Initialization
In general, SFOA is a population-based metaheuristic approach. Here, sailfish is considered as campaigner substances, and the issues are assumed as viewpoints of sailfish in the hunting region. Based on these criteria, the population among solution spaces is generated in a random manner. Normally, sailfish can hunt in hyperdimensional areas with changeable view point vectors. Another incorporator in this algorithm is the school of sardines. In addition, let as presume that the school of sardines set is swim in the hunting region. The sailfish and the sardines are mainly used to discover the solutions. Here, the sailfish is scattered in the search space, and the sardines selects better solutions in the search space. Meanwhile, sardine can be eaten by sailfish when finding the search space. In addition, sailfish modifies the location to find a better solution.
3.2.2
Elitism
Sometimes, while updating the position, better results can be misplaced, because the updated location is more debilitated than the existing locations unless the elitist election is stated. The main motivation of this elitism is to copy the modified fitness result into upcoming contemporaries. In AISFOA approach, better location of sailfish is rescued for every process that is reasoned as elite. This elitism denotes fittest sailfish, and also, it affects the mobility and alteration of sardines at the time of attack. Moreover, while sailfish hunting, the sardines will be livid by dynamic movement with the sailfish podium. So, the location of injured sardines is also saved to select a better destination for cooperative hunting via sailfish. Highest fitness of iteration is location of elite sailfish and also the blistered sardine.
3.2.3
Attack Alteration Mechanism
Most of the sailfish attacks the prey school because no compatriots are onrush. And, the sailfish promotes the success rate in attacking with impermanent unified attack. In fact, the animal group behavior of sailfish corrects its own location based on the position of other attackers among prey schools. In the prey school, there is no direct coordination between them. The AISFOA approach clearly shows that the sailfish
240
M. Vasim Babu et al.
attacks alteration strategy while hunting in groups. Two ways of position update of sailfish are done around the prey school. First, sailfish can hold secondary onslaught to prey school regarding elite sailfish and livid sardines. Second, sailfish occupies void space close to the prey school. It leads to a high seizure success rate at upcoming stages of collaborative hunting.
3.2.4
Hunting
In the starting stage of grouping hunting, an entire slaughter of sardines is determined to be uncommon. In many cases, the sardines’ bills deed the sardines’ bodies, and then, automatically, the sardines scale will be eliminated. Therefore, a huge number of sardines have wound on their physical structure. Designer recovered the affirmative correlativity between captured success rate and the amount of wounds in prey school. In the starting stage of hunting, the sailfish has high power to get prey. In addition, the sardines are non-wounded and tired. Due to this reason, the sardines hold high escape speed and also have better tactic power. The power of sailfish attack is reduced at hunting time.
3.2.5
Catching Prey
Due to frequent and intense attack, the power stored in the prey is deduced, and also, the capability to detect location directional data of sailfish is also reduced. In the meanwhile, sardines were picked by sailfish, and they broke off from the shoal for rapid capturing. Finally, the wounded sardine should break off from the shoal, which can capture rapidly. In this proposed methodology, let us assume that the communicable prey appears when sardines are healthier than the related sailfish. In this stage, the location of sailfish substitutes with the current location of afraid sardine. It is mainly used to enhance the chance of hunting new prey. It effectively avoids local optima and has high convergence speed. The SNs in the network are clustered by this proposed methodology. In this stage, the distance between every SNs in WSN is estimated using the Euclidean distance formula.
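The AISFOA steps above are described only qualitatively in this paper. Purely as a hedged illustration of how such a sailfish-style clustering loop could be organized, a minimal sketch is given below; the clustering fitness, population sizes, attack-power decay, and position-update rules are assumptions made here for illustration and are not the authors' exact formulation.

```python
# Hedged sketch of a sailfish-optimizer-style clustering loop (the exact AISFOA
# update rules and fitness are not given in the paper and are assumed here).
import numpy as np

rng = np.random.default_rng(0)

def cluster_fitness(ch_positions, nodes):
    # Illustrative clustering objective: mean distance of every node to its
    # nearest candidate cluster head (lower is better).
    d = np.linalg.norm(nodes[:, None, :] - ch_positions[None, :, :], axis=2)
    return d.min(axis=1).mean()

def aisfoa_cluster(nodes, n_clusters=5, n_sailfish=10, n_sardines=30, iters=100):
    area = nodes.max(axis=0)
    # Initialization: each candidate solution is one set of CH coordinates.
    sailfish = rng.uniform(0, area, size=(n_sailfish, n_clusters, 2))
    sardines = rng.uniform(0, area, size=(n_sardines, n_clusters, 2))
    sf_fit = np.array([cluster_fitness(s, nodes) for s in sailfish])
    sd_fit = np.array([cluster_fitness(s, nodes) for s in sardines])
    for t in range(iters):
        elite = sailfish[sf_fit.argmin()]       # elitism: best sailfish so far
        injured = sardines[sd_fit.argmin()]     # best (injured) sardine
        # Attack alteration: sailfish reposition around the elite and the prey.
        lam = rng.uniform(-1, 1, size=sailfish.shape)
        sailfish = elite - lam * (rng.random() * (elite + injured) / 2 - sailfish)
        # Hunting: sardines move toward the elite with decaying attack power.
        attack_power = 2.0 * (1 - t / iters)    # assumed linear decay
        sardines = sardines + rng.random(sardines.shape) * (elite - sardines) * attack_power
        sf_fit = np.array([cluster_fitness(s, nodes) for s in sailfish])
        sd_fit = np.array([cluster_fitness(s, nodes) for s in sardines])
        # Catching prey: a sailfish takes the place of a fitter sardine.
        if sd_fit.min() < sf_fit.max():
            worst, best_sd = sf_fit.argmax(), sd_fit.argmin()
            sailfish[worst] = sardines[best_sd]
            sf_fit[worst] = sd_fit[best_sd]
    return sailfish[sf_fit.argmin()]            # final cluster head coordinates

nodes = rng.uniform(0, 100, size=(100, 2))      # 100 SNs in a 100 x 100 m field
print(aisfoa_cluster(nodes))
```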
3.3 Euclidean Distance Mathematically, distance between two data points in the space denotes the length of line segment between two points. Let us assume that the two points in the 2D plane are (a1 , a2 ) and (b1 , b2 ). Here, the Euclidean distance is estimated by using the following equation:
Distance(d) = √((b1 − a1)² + (b2 − a2)²)  (1)
a1 and a2 denote the coordinates of one sensor node, and b1 and b2 are the coordinates of other SNs in WSN, where d represents distance between two SNs in WSN. By using this estimation process, the distance between every sensor nodes in the network is estimated. It is mainly used to remove the data redundancy generated by adjacent sensor nodes. Cluster head (CH) is elected after the network deployment, and it can be changed dynamically based on network lifetime. After the election of CH is done, routing takes place, which is briefly explained in the upcoming section.
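A small sketch of this distance computation is shown below; the redundancy threshold used to flag adjacent nodes is an illustrative assumption, not a value taken from the paper.

```python
# Sketch of Eq. (1): pairwise Euclidean distances between sensor nodes, used to
# flag node pairs that are close enough to produce redundant readings.
import numpy as np

def pairwise_distances(positions):
    diff = positions[:, None, :] - positions[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def redundant_pairs(positions, threshold=5.0):   # threshold (m) is an assumption
    d = pairwise_distances(positions)
    i, j = np.where(np.triu(d < threshold, k=1))
    return list(zip(i.tolist(), j.tolist()))

positions = np.random.default_rng(1).uniform(0, 100, size=(20, 2))
print(redundant_pairs(positions))
```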
3.4 NGWO Approach Stages in NGWO routing are wolves’ initialization, determination of fitness value, and update the speed and location of wolves. The complete work flow of this NGWO approach is clearly stated below.
3.4.1
Wolves’ Initialization
Each result denotes the coordinating single entrance to the BS. The size of the result is similar to the total amount of gateways (G). The solution provides a route to the BS. Each gateway in the network originates with an arbitrary number:

E(l, m) = Rnd(0, 1), where 1 ≤ m ≤ Na and 1 ≤ l ≤ G  (2)
The above equation is used for initialization. Element l is considered as a gateway number for the specified solution. Simultaneously, it is mapped to the gateway l_z; thereafter, the route toward the BS is found from l_l, which specifies the transmission link between l_m and l_n. The mapping of the searching path is estimated by using the following equation:

l_z = idx(SetNxtL(l_m), n)  (3)

where idx is the indexing function, which effectively returns the nth gateway from SetNxtL(l_m), and l_z is the resulting mapped gateway.
3.4.2
Fitness Function
It estimates the worthiness of the specified solution in terms of the variables associated with it. It is mainly used to update the results of the alpha, beta, and delta wolves in each round.
In this proposed methodology, the fitness function identifies an efficient routing path from each gateway to the BS. The distance traversed through the gateways is represented as follows:

Dis = Σ_{d=1}^{h} dist(l_d, NxtL(l_d))  (4)

The total number of hops across the gateways in the network is evaluated as follows:

M = Σ_{d=1}^{h} NxtLCount(l_d)  (5)
The routing is authorized based on the smallest transmission distance and a small number of hops for communication. The result with the highest fitness measure is the optimum result in the population. The routing fitness is estimated as follows:

Routing fitness = Z_1 / (m_1 · Dis + m_2 · M)  (6)

where m_1 and m_2 belong to (0, 1) such that m_1 + m_2 = 1, and Z_1 indicates a probability constant. This routing fitness balances the total distance and the total number of hops in the network.
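As a concrete reference, the sketch below evaluates the routing fitness of Eq. (6) for one candidate route; the gateway coordinates and the weights m1, m2, Z1 are illustrative values, and the simple hop count is used here as a stand-in for the M term of Eq. (5).

```python
# Sketch of the NGWO routing fitness (Eqs. 4-6): total distance along the
# gateway route and the hop count, combined into a single fitness value.
import math

def route_fitness(route, coords, m1=0.5, m2=0.5, z1=1.0):
    dis = sum(math.dist(coords[route[k]], coords[route[k + 1]])
              for k in range(len(route) - 1))          # Eq. (4): total distance
    hops = len(route) - 1                              # stand-in for Eq. (5)
    return z1 / (m1 * dis + m2 * hops)                 # Eq. (6)

# Illustrative gateway coordinates; the last point plays the role of the BS.
coords = [(10, 20), (35, 40), (60, 55), (80, 80)]
print(route_fitness([0, 1, 2, 3], coords))
```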
3.4.3
Update Wolves’ Position
To reach the prey, each wolf must know its own position, which depends on the locations of the alpha, beta, and delta wolves. In this methodology, the alpha wolf has the broadest influence in the solution area, the beta wolf holds the best solution from the previous process, and the delta wolf holds the best result from the current process. To determine the position of the omega wolves, the standardized positions of the alpha, beta, and delta wolves must be known. Additionally, the positions of the wolves are updated randomly to provide better solutions to the optimization problem. All the results are re-evaluated by the fitness function after the new positions are taken. Using this optimal path selection strategy, the data packets are transmitted effectively between the source and destination nodes in the WSN.
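The paper does not spell out the NGWO update equations. The snippet below follows the standard grey wolf optimizer position update (alpha, beta, and delta guided) and should be read only as a hedged reference point for where the authors' modifications would plug in.

```python
# Standard GWO position update used as a reference sketch; the novel NGWO
# modifications described in this paper are not specified and are not shown.
import numpy as np

rng = np.random.default_rng(2)

def gwo_step(wolves, alpha, beta, delta, a):
    def guided(leader):
        A = 2 * a * rng.random(wolves.shape) - a
        C = 2 * rng.random(wolves.shape)
        return leader - A * np.abs(C * leader - wolves)
    # Each omega wolf moves toward the average of the three leader estimates.
    return (guided(alpha) + guided(beta) + guided(delta)) / 3.0

wolves = rng.random((8, 4))                  # 8 candidate routes, 4 decision variables
alpha, beta, delta = wolves[0], wolves[1], wolves[2]
a = 2.0                                       # conventionally decreased toward 0 over iterations
print(gwo_step(wolves, alpha, beta, delta, a))
```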
4 Results and Discussion The proposed AISFOA-NGWO approach improves energy efficiency and network lifetime, and it reduces packet delay and the total number of dropped packets.
Performance efficiency and precision premise are verified by NetSim simulation. In addition, the comparison is made with existing low-energy adaptive clustering hierarchy (LEACH) protocol and fuzzy-based cluster formation protocol (FBCFP) protocol. Upcoming section introduces computer simulation setup and investigation results.
4.1 Simulation Setup In this setup, 100 SNs are distributed over a 100 × 100 m region using the NetSim simulator. In the proposed methodology, the sensor nodes are separated into five clusters, where each cluster has 20 SNs. Each cluster is attached to a sink node, so the number of sink nodes is 5. Every sink node transfers the gathered information to the BS. Each sink node is located in the central part of its cluster, where it effectively gathers the information of the SNs in that cluster. Using the single-hop model, all the nodes in a cluster are connected to the sink node. Moreover, the BS is placed in the central part of the WSN, and it collects all the required information from the sink nodes. Each SN is placed within a particular area of about 20 × 20 m, because the size of the network is 100 × 100 m; a better architecture design makes the connectivity easier and more flexible. The experimental results are shown in the upcoming section.
4.2 Result Analysis The proposed AISFOA-NGWO approach is evaluated and compared with the existing LEACH and FBCFP protocols regarding energy consumption, network lifetime, mean package delay, and loss ratio. This comparison effectively shows the simplicity and effectiveness of the methodology relative to the most recent studies.
4.2.1
Energy Consumption
It is an estimation of the power used up by every SN in the network at the time of packet transmission to the sink nodes. It is evaluated by using the following equation:

E_c = T_ec / T_rp  (7)

where E_c represents energy consumption, T_ec denotes the total energy consumed by the sensor nodes, and T_rp indicates the total number of received packets.
Fig. 1 Comparative analysis of energy consumption
The proposed AISFOA-NGWO approach reduces the total energy consumption of the SNs by approximately 40% compared with the LEACH protocol and by 10% compared with the FBCFP protocol. Figure 1 clearly shows that the energy consumption of the proposed methodology is lower than that of the existing approaches.
4.2.2
Network Lifetime
It is the time until the first SN in the WSN runs out of power to transmit data packets. It is estimated by using the following equation:

N_L = E_0 / P_r  (8)

where N_L indicates the network lifetime, E_0 represents the initial energy of the battery, and P_r is the power consumption. Figure 2 illustrates the comparative result of network lifetime. Here, the initial energy of the sensor nodes is taken into account. It clearly shows that the proposed AISFOA-NGWO approach has a longer lifetime than the LEACH and FBCFP protocols.
Fig. 2 Comparative analysis of network lifetime
4.2.3
Mean Package Delay
Mean package delay (M_dly) is the summation of the queuing delay (Q_dly), the transmission time (T_tt), and the routing delay (R_dly), and is estimated as follows:

M_dly = Q_dly + T_tt + R_dly  (9)
Figure 3 shows the evaluation result of LEACH, FBCFP, and also the proposed AISFOA-NGWO approach. According to this comparative analysis, the proposed AISFOA-NGWO approach has a lower mean packet delay than the existing LEACH and FBCFP protocols.
4.2.4
Loss Ratio
Loss ratio is also known as the total packet drop loss (TPD). It is the difference between the total number of sent packets (TPS) and the total number of received packets (TPR):

TPD = TPS − TPR  (10)
Figure 4 displays the obtained result of LEACH, FBCFP, and also the proposed AISFOA-NGWO approach. This analysis shows that the proposed AISFOA-NGWO approach has a lower packet drop ratio than the existing LEACH and FBCFP protocols.
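The four evaluation measures of Eqs. (7)–(10) reduce to simple ratios, sums, and differences; a compact sketch with purely illustrative numbers is given below.

```python
# Sketch of the evaluation metrics from Eqs. (7)-(10) with illustrative values.
def energy_consumption(total_energy_consumed, total_received_packets):
    return total_energy_consumed / total_received_packets        # Eq. (7)

def network_lifetime(initial_energy, power):
    return initial_energy / power                                 # Eq. (8)

def mean_packet_delay(queuing_delay, transmission_time, routing_delay):
    return queuing_delay + transmission_time + routing_delay      # Eq. (9)

def packet_drop(total_sent, total_received):
    return total_sent - total_received                            # Eq. (10)

print(energy_consumption(50.0, 1000),      # energy per received packet
      network_lifetime(2.0, 0.001),        # time until the first node dies
      mean_packet_delay(0.02, 0.05, 0.03),
      packet_drop(1000, 960))
```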
Fig. 3 Comparative analysis of mean packet delay
Fig. 4 Comparative analysis of packet loss ratio
5 Conclusion This research work presented an AISFOA-NGWO approach for WSN. The proposed work involves two main stages, which are AISFOA-based clustering and NGWO-based routing. AISFOA is used to form the clusters. Meanwhile, the CH is elected in the network deployment stage, and it can be changed dynamically based on network lifetime. In addition, the Euclidean distance is utilized to estimate the distance
between the SNs to avoid redundancy. After that, routing is carried out by the NGWO approach to choose the optimal path in the network. The proposed methodology carries the merits of both clustering and routing, which achieves a high energy ratio and a prolonged network lifetime. The experimental results of the proposed AISFOA-NGWO approach are reported, and the simulation results are validated with the help of the evaluated parameters, namely energy consumption, network lifetime, mean package delay, and loss ratio. The simulation output shows that the developed AISFOA-NGWO approach offers high energy efficiency and a prolonged network lifetime, and it also reduces the mean package delay and loss ratio in the network.
References 1. Huan X, Kim KS, Lee S, Lim EG, Marshall A (2021) Improving multi-hop time synchronization performance in wireless sensor networks based on packet-relaying gateways with per-hop delay compensation. IEEE Trans Commun 2. Loganathan S, Arumugam J, Chinnababu V (2021) An energy-efficient clustering algorithm with self-diagnosis data fault detection and prediction for wireless sensor networks. Concurrency Comput Pract Experience e6288 3. Famila S, Jawahar A, Sariga A, Shankar K (2020) Improved artificial bee colony optimization based clustering algorithm for SMART sensor environments. Peer-to-Peer Netw Appl 13(4):1071–1079 4. Singh A, Nagaraju A (2020) Low latency and energy efficient routing-aware network codingbased data transmission in multi-hop and multi-sink WSN. Ad Hoc Netw 107:102182 5. Iwendi C, Maddikunta PKR, Gadekallu TR, Lakshmanna K, Bashir AK, Piran MJ (2020) A metaheuristic optimization approach for energy efficiency in the IoT networks. Softw Pract Experience 6. Elhoseny M, Rajan RS, Hammoudeh M, Shankar K, Aldabbas O (2020) Swarm intelligence– based energy efficient clustering with multihop routing protocol for sustainable wireless sensor networks. Int J Distrib Sens Netw 16(9):1550147720949133 7. Amutha J, Sharma S, Sharma SK (2021) Strategies based on various aspects of clustering in wireless sensor networks using classical, optimization and machine learning techniques: review, taxonomy, research findings, challenges and future directions. Comput Sci Rev 40:100376 8. Yadav SL, Ujjwal RL (2020) Sensor data fusion and clustering: a congestion detection and avoidance approach in wireless sensor networks. J Inf Optim Sci 41(7):1673–1688 9. Barik PK, Singhal C, Datta R (2021) An efficient data transmission scheme through 5G D2Denabled relays in wireless sensor networks. Comput Commun 168:102–113 10. Ramluckun N, Bassoo V (2020) Energy-efficient chain-cluster based intelligent routing technique for wireless sensor networks. Appl Comput Inf 11. Mehta D, Saxena S (2020) MCH-EOR: multi-objective cluster head based energy-aware optimized routing algorithm in wireless sensor networks. Sustain Comput Inf Syst 28:100406 12. Bandi R, Ananthula VR, Janakiraman S (2021) Self adapting differential search strategies improved artificial bee colony algorithm-based cluster head selection scheme for WSNs. Wirel Pers Commun 1–22 13. Babu MV et al (2021) An improved IDAF-FIT clustering based ASLPP-RR routing with secure data aggregation in wireless sensor network. Mobile Netw Appl 26(3):1059–1067 14. Kumar, BS, Santhi SG, Narayana S (2021) Sailfish optimizer algorithm (SFO) for optimized clustering in wireless sensor network (WSN). J Eng Des Technol
15. Pattnaik S, Sahu PK (2021) Optimal shortest path selection by MSFO-SCNN for dynamic ring routing protocol in WSN. In: 2021 2nd International conference for emerging technology (INCET). IEEE, pp 1–6 16. Sathyamoorthy M, Kuppusamy S, Dhanaraj RK, Ravi V (2021) Improved K-Means based Q learning algorithm for optimal clustering and node balancing in WSN. Wirel Pers Commun 1–22 17. Gupta SC (2021) Energy-Aware Ch selection and optimized routing algorithm in wireless sensor networks using Wmba and Qoga. Turkish J Comput Math Educ (TURCOMAT) 12(10):6279–6293 18. Durairaj UM, Selvaraj S (2020) Two-level clustering and routing algorithms to prolong the lifetime of wind farm-based WSN. IEEE Sens J 21(1):857–867 19. Darabkh KA, El-Yabroudi MZ, El-Mousa AH (2019) BPA-CRP: a balanced power-aware clustering and routing protocol for wireless sensor networks. Ad Hoc Netw 82:155–171 20. Bhowmik T, Banerjee I (2021) An improved PSOGSA for clustering and routing in WSNs. Wireless Pers Commun 117(2):431–459 21. Vasim Babu M, Vinoth Kumar CNS, Baranidharan B, Madhusudhan Reddy M, Ramasamy R (2022) Energy-Efficient ACO-DA routing protocol based on IoEABC-PSO clustering in WSN”. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Lecture notes on data engineering and communications technologies, vol 114. Springer, Singapore. https://doi.org/10.1007/978-981-16-9416-5_11 22. Farsi M, Badawy M, Moustafa M, Ali HA, Abdulazeem Y (2019) A congestion-aware clustering and routing (CCR) protocol for mitigating congestion in WSN. IEEE Access 7:105402–105419 23. Panchal A, Singh RK (2021) EHCR-FCM: energy efficient hierarchical clustering and routing using fuzzy C-means for wireless sensor networks. Telecommun Syst 76(2):251–263 24. Barzin A, Sadegheih A, Zare HK, Honarvar M (2020) A hybrid swarm intelligence algorithm for clustering-based routing in wireless sensor networks. J Circ Syst Comput 29(10):2050163 25. Anand V, Pandey S (2020) New approach of GA–PSO-based clustering and routing in wireless sensor networks. Int J Commun Syst 33(16):e4571 26. Shyjith MB, Maheswaran CP, Reshma VK (2021) Optimized and dynamic selection of cluster head using energy efficient routing protocol in WSN. Wireless Pers Commun 116(1):577–599 27. Yagoub MFS, Khalifa OO, Abdelmaboud A, Korotaev V, Kozlov SA, Rodrigues JJPC (2021) Lightweight and efficient dynamic cluster head election routing protocol for wireless sensor networks. Sensors 21(15):5206
Sentiment Analysis Using an Improved LSTM Deep Learning Model Dhaval Bhoi, Amit Thakkar, and Ritesh Patel
Abstract Sentiment analysis is one of the subfields of data mining that helps to detect the sentiment of a person. Analyzing and examining textual data quantifies the global society’s feelings or attitudes toward their belief and perception. Along with product review and healthcare domain, sentiment analysis is also widely applied in the hotel industry review domain where online feedback data obtained from stakeholders via social media are extracted, processed, analyzed, and evaluated. Sentiment analysis provides many benefits like utilizing sentiment classification insight information to achieve improved quality and services. Sentiment analysis is performed in our work using machine learning and improved deep learning methods. Deep learning approaches are found to produce an effective solution to various text mining problems such as document clustering, classification, web mining, summarization, and sentiment analysis compared to machine learning approaches. Experiments are performed on the 515 K hotel review dataset. Classification results were compared between deep learning approaches and machine learning-based approaches. Finally, we discuss our proposed improved deep learning approach in this paper. Deep learning is a machine learning and artificial intelligence concept aimed at making robots as proficient at making judgments as humans are. We were able to get superior results using the LSTM model compared to traditional machine learning models and the sequential RNN model, and the empirical results were further strengthened using bidirectional LSTM (BiLSTM) and improved LSTM (ILSTM) and gained a performance improvement 2.73% and 5.67%, respectively. D. Bhoi (B) · R. Patel U & P U. Patel Department of Computer Engineering, Faculty of Technology and Engineering (FTE), Chandubhai S. Patel Institute of Technology (CSPIT), Charotar University of Science and Technology (CHARUSAT), Anand, Gujarat, India e-mail: [email protected] R. Patel e-mail: [email protected] A. Thakkar Department of Computer Science and Engineering, Faculty of Technology and Engineering (FTE), Chandubhai S. Patel Institute of Technology (CSPIT), Charotar University of Science and Technology (CHARUSAT), Anand, Gujarat, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_19
Keywords Sentiment analysis · Review data · Natural language processing · Machine learning · Deep learning
1 Introduction One of the most important and challenging tasks in natural language processing (NLP) is performing sentiment analysis, often known as opinion mining, which involves extracting thoughts, attitudes, judgments, or opinions on a specified issue [1]. Blogs, forums, and online social networks have sprung up as a result of Web 2.0, allowing users to converse any issue and share their beliefs on it [2]. Sentiment analysis [3] is the process of determining a consumer’s attitude toward a subject or entity by detecting and classifying consumers’ feelings from textual data [structured data or unstructured data] into different sentiments—for example, positive, negative, or neutral—or feelings such as joyful, miserable, annoyed, or disgusted [4]. Sentiment analysis and opinion mining systems are gaining popularity in the natural language processing sector as opinion-based or feedback-based applications become more popular. The development of the Internet has transformed people’s lifestyles; they are now more outspoken about their ideas and opinions, and this trend has made it easier for researchers to obtain user-generated content [5]. Sentiment analysis is useful in a variety of disciplines, including the hotel industry, where input from stakeholders is required to evaluate the effectiveness of the services provided. Review classification based on emotion exhibited is crucial for any organization’s services and facilities to improve. The hotel industry generates a wide range of data. The data to be evaluated includes hotel staff, services, and other unstructured data, such as visitor comments expressed on social media via web blogs, discussion forums, and social data networking sites. For target users such as reviewers, hotel workers, and the hotel management organization as a whole, a sentiment study can be used to examine the exponential volume of data. Application of sentiment analysis techniques at the document, sentence, or phase level enables efficient business applications [6], helps to make better decisions, and facilitates early prediction of trends. According to a study, approximately 95% of tourists look at virtual Internet reviews of the hotel before making a reservation choice, and more than a third believe that the opinions stated in reviews are the most essential factor to consider when booking a hotel online [7]. Online reviews of customers, especially with critical comments, may impact its reputation significantly. A sentiment analysis system not only assists the organization’s employees and administration, but it also influences visitors’ judgments.
1.1 Sentiment Analysis Challenges To do sentiment analysis, machine learning methods [such as the Naive Bayes classification algorithm and support vector machine], artificial neural network-based approaches, or deep learning-based methodologies can be employed [8]. Sentiment analysis can be done at many degrees of granularity such as phrase-level sentiment analysis or feature-level sentiment analysis [9]. In this research paper, we have implemented sentence-level sentiment.
1.2 Data Preprocessing in Sentiment Analysis Challenges Many words have no effect on the sentiment orientation when textual data is evaluated. Question words such as how, what, why, and when, for example, do not add to the text's polarity; therefore, removing them lowers the problem's dimensionality, because each word in the textual data is viewed as one dimension [10]. Preprocessing is considered one of the major tasks while working with structured or unstructured data. The following operations are required in order to perform data preprocessing on sentiment data. Tokenization: It splits or chops textual material, or a specified character sequence, into words, fragments, or tokens. Lowercasing: To make it easier to match terms in stakeholders' remarks or comments to words in the sentiment lexicon, all characters in the textual data are transformed to lowercase. Normalization: Shortened material is normalized by mapping commonly used Internet slang phrases to standard words using a lexicon dictionary; for example, the words "gud" and "awsm" are mapped to "good" and "awesome," respectively. Stemming: Stemming aids in the reduction of inflected words to their root or base word; "moving," "moved," and "movement," for example, can all be transformed to "move." Lemmatization: It is a stemming-related strategy for grouping together a word's various inflected forms. Stop word and punctuation removal: To optimize system response speed and usefulness, punctuation marks and stop words that are unrelated and improper for sentiment analysis are deleted. Transliteration: When people communicate, react, or comment in mixed language, the content is transformed using the Google Transliterate application programming interface. The steps of data preprocessing are depicted in Fig. 1. The remainder of the research article is organized as follows: The background and related work using machine learning and deep learning methodologies are described in Sect. 2. Section 3 discusses sentiment classification using machine learning approaches and how the results can be improved utilizing a deep artificial neural network-based approach. In Sect. 4, we further improve the results using an improved long short-term memory (ILSTM) model, and Sect. 5 concludes the work.
Fig. 1 Steps of data preprocessing
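As a rough illustration of the preprocessing steps summarized in Fig. 1, the sketch below uses NLTK; the slang lexicon is a tiny assumed example, and the exact pipeline used in this work may differ.

```python
# Hedged sketch of the preprocessing pipeline described above; the slang
# lexicon is a tiny illustrative example, not the lexicon used in the paper.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

SLANG = {"gud": "good", "awsm": "awesome"}            # assumed normalization lexicon
STOP = set(stopwords.words("english"))
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())      # tokenization + lowercasing
    tokens = [SLANG.get(t, t) for t in tokens]        # normalization of slang
    tokens = [t for t in tokens if t not in STOP]     # stop word / punctuation removal
    return [lemmatizer.lemmatize(stemmer.stem(t)) for t in tokens]  # stemming + lemmatization

print(preprocess("The room was awsm, staff were moving things quickly!"))
```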
2 Background and Related Work 2.1 Literature Review User-generated content has emerged as the primary source of knowledge on the Internet. The primary mode of communication between the people providing service and customers, as well as among service receivers, has been recognized as sentiment, online reviews, or comments [7]. Recent research on sentiment analysis has been covered in the works of SVM [11–13]. SVM, Naive Bayes, KNN, and maximum entropy algorithms can be used to do sentence-level categorization. The authors of this paper have utilized the Internet movie database (IMDb) polarity dataset and the hotels.com dataset. They have proposed contextual lexicon-conceptquality (CLCQ) and contextual lexicon-quality (CLQ) models. Their proposed methodology outperformed the competition. These findings support the proposed models’ ability to accurately identify word of mouth (WOM) quality [14]. Taskoriented, granularity-oriented, and methodology-oriented techniques are all used to approach sentiment analysis [15]. In the research paper [16], Wang and Zhai highlighted two basic techniques to achieve sentiment analysis using a broad approach based on current evidence: rulebased analysis and statistical model-based analysis. Human knowledge is utilized to establish rules (according to the vocabulary of feelings) to determine the feelings behind a text in rule-based analysis. In the work of [17], they offered software that would generate an average number rating of all the unique features of a product, allowing a buyer to gain a comprehensive picture of the product. As a result, it has proven to be a useful tool for reviewing all evaluations and is valuable to purchasers on e-commerce platforms. Keeping this in mind, consumers’ security and privacy should not be jeopardized. Since it assigns a positive or negative orientation to a text, sentiment analysis is classified as a categorizing or classification problem. The outcomes of using support vector machine (SVM) on standard datasets to train a sentiment classifier are
presented in this research paper [18]. According to their findings, using chi-square feature selection can improve classification accuracy substantially. According to the experimental results of this research paper, the LSTM model achieves state-of-the-art accuracy for the feedback given by students’ dataset [19] because approaches like bag of words (BOW), N-gram, Naive Bayes classification algorithm, and SVM models suffer due to their inability to solve difficulties with traditional and they lose the order, context, and information about the words and textual data. Recently, authors of [20] applied a machine learning approach for automated legal text classification. They have applied six different classifiers including multilayer perceptron classifier, decision tree classifier (DTC), K-nearest neighbor classifier, Naïve-Bayes classifier (NBC), support vector machine classifier (SVM), and ensemble classifier for comparison. Similarly, authors of [21] applied machine learning-based approaches for polystore health information systems and produced.
3 Sentiment Classification Using Machine Learning and Deep Neural Network-Based Approach The sentiment analysis plays a major role in a wide range of applications including education domain from student, teachers and education institute perspective, hotel industries, politics, and many more [10]. Machine learning techniques (MLT) and deep learning techniques (DLT) are the most often utilized sentiment classification algorithms. Figure 2 depicts the overall architecture of the sentiment categorization system for deep learning techniques [22].
Fig. 2 Sentiment analysis using deep learning approach
3.1 Sentiment Analysis Generic System Architecture Using Deep Learning Approach Deep learning is one type of machine learning that uses neural networks to adapt to a variety of circumstances. Machine learning algorithms require the problem's distinct feature set to be extracted and presented, whereas a deep learning network model learns these features from the dataset automatically.
3.2 Sentiment Analysis Research Methodologies Extraction of proper features, feature selection techniques, and the choice of classification techniques play a significant role in the proper identification of the sentiment polarity. Among the different term-occurrence and opinion-word features, unigram, bigram, and trigram techniques are applied. Machine learning algorithms require the problem's distinct feature set to be extracted and presented, whereas a deep learning network model learns these features from the dataset without the requirement for manual feature extraction. From among the commonly accessible sentiment classification techniques, such as SVM, logistic regression (LR), NBC, random forest (RF), DTC, and the K-nearest neighbor classification algorithm, the findings of NB, RF, and SVM are examined and contrasted.
3.2.1
Feature Extraction
Proper detection and selection of feature sets help to improve sentiment analysis to a great extent. Extraction of proper features, feature selection techniques, and the choice of classification techniques play a significant role in the proper identification of sentiment polarity. Among the different term-occurrence and opinion-word features, unigram, bigram, and trigram techniques can be applied. Machine learning algorithms require the problem's distinct feature set to be extracted and presented, whereas a deep learning network model learns these features from the dataset without the requirement for manual feature extraction.
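A minimal sketch of unigram/bigram/trigram feature extraction with scikit-learn is shown below; the TF-IDF weighting is an assumption, since the text only names n-gram and term-occurrence features.

```python
# Sketch of n-gram feature extraction (unigrams, bigrams, and trigrams).
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = ["The staff was friendly and the room was clean",
           "Terrible service and a very noisy room"]
vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
features = vectorizer.fit_transform(reviews)
print(features.shape, vectorizer.get_feature_names_out()[:5])
```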
3.2.2
Sentiment Classification Techniques
From the widely available classification techniques, such as support vector machine, logistic regression, Naïve Bayes, random forest, decision tree, or the K-nearest neighbor classification algorithm, classifiers can be applied and their performance can be evaluated, tested, and compared.
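As a hedged illustration of how the classifiers compared in this work could be trained and evaluated, the sketch below uses scikit-learn with a stand-in feature matrix and labels; hyperparameters are illustrative only.

```python
# Sketch of training and comparing the NB, RF, and SVM classifiers.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

X = np.abs(np.random.default_rng(0).random((40, 30)))   # stand-in TF-IDF matrix
y = np.array([0, 1] * 20)                                # stand-in sentiment labels
for clf in (MultinomialNB(), RandomForestClassifier(n_estimators=100), LinearSVC()):
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(type(clf).__name__, scores.mean())
```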
3.3 Sentiment Classification Dataset and Experiment The experiment was carried out on the 515 K hotel review dataset. This dataset is accessible on the Kaggle website: [https://www.kaggle.com/jiashenliu/515khotel-reviews-data-in-europe]. The data were originally gathered from Booking.com, and all of the information in the file is publicly accessible. This database comprises 515 K customer reviews and ratings from 1,493 premium hotels around Europe. In addition, the geographic location of each hotel is provided for further investigation. We have focused on the sentiment classification task and its performance measurement on this dataset. In this corpus, sentiment is expressed as either positive or negative. Following the application of the data preparation stages and feature extraction, the performance of the system was assessed using the NB, RF, and SVM classification algorithms, as shown in Table 1. After conducting the comparison experiments, the assessment measures of accuracy, precision, recall, and F1-score [21, 22] were derived for assessing and comparing performance using Eqs. 1–4, and the results are shown in Fig. 3. The SVM classification algorithm outperformed the others on accuracy, recall, and F1-score, whereas the RF algorithm produced better results than SVM for the precision measurement. The best performing model results are marked bold.

Accuracy = (True Positive + True Negative) / (True Positive + True Negative + False Positive + False Negative)  (1)

Precision = True Positive / (True Positive + False Positive)  (2)

Recall = True Positive / (True Positive + False Negative)  (3)

F1-score = (2 × Precision × Recall) / (Precision + Recall)  (4)

Table 1 Performance of different classification algorithms
Performance measurement    NB       RF       SVM
CA                         81.02    84.86    86.13
P                          78.61    83.39    82.77
R                          73.98    77.84    82.99
F1                         75.66    80.52    82.88
CA accuracy, P precision, R recall, F1 F1-score
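The four measures in Eqs. (1)–(4) can be computed directly from confusion-matrix counts, as in this small sketch; the counts themselves are illustrative, not the counts obtained in this experiment.

```python
# Sketch of Eqs. (1)-(4) computed from confusion-matrix counts.
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # Eq. (1)
    precision = tp / (tp + fp)                            # Eq. (2)
    recall = tp / (tp + fn)                               # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (4)
    return accuracy, precision, recall, f1

print(classification_metrics(tp=820, tn=710, fp=90, fn=80))  # illustrative counts
```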
Fig. 3 Performance comparison using machine learning approach
As our next experiment, we created five deep learning models [DM1, DM2, DM3, DM4, and DM5] with different numbers of nodes in their neural network configurations. Their results are compared with SVM, the best performing machine learning technique. Table 2 summarizes the findings. We employed the Adam optimizer, the sigmoid activation function, and binary cross-entropy as the loss function in each deep learning model. The best performing machine learning model, SVM, and the other deep learning models were surpassed by our deep learning model 3 (DM3) in terms of recall and F1-score, and DM3 produced accuracy that was nearly identical to SVM. The precision results for DM5 showed a significant improvement compared with SVM and the other deep learning models. As shown in Fig. 4, we compared the performance results of the various deep learning models with each other and with the earlier best performing SVM classifier. As a result, in terms of the precision and F1-score evaluation criteria, the DM3 and DM5 deep learning models outperform SVM. However, we required a deep learning architecture capable of consistently outperforming the competition.

Table 2 Performance of different deep learning models with best performing SVM
Performance measurement    DM1      DM2      DM3      DM4      DM5      SVM
CA                         84.70    85.54    86.06    85.73    85.68    86.13
P                          81.31    84.21    81.88    83.55    84.60    82.77
R                          81.57    79.83    84.89    81.35    79.70    82.99
F1                         81.44    81.96    83.36    82.44    82.08    82.88
CA accuracy, P precision, R recall, F1 F1-score
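The exact layer layouts of DM1–DM5 are not listed in the text. Purely as a hedged illustration, the Keras sketch below builds one such model using the stated Adam optimizer, sigmoid output, and binary cross-entropy loss; the dense architecture, embedding size, vocabulary size, and sequence length are assumptions.

```python
# Hedged Keras sketch of one of the dense deep learning models (DM1-DM5); only
# the optimizer, output activation, and loss follow the text, the rest is assumed.
import tensorflow as tf

vocab_size, max_len = 10000, 500
model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,), dtype="int32"),
    tf.keras.layers.Embedding(vocab_size, 32),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```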
Fig. 4 Performance measurement and comparison of deep learning models with SVM
So, in Sect. 4, we propose an improved LSTM (ILSTM) and compare it to RNN, LSTM and its variant bidirectional LSTM (BiLSTM), as well as to the other deep learning models.
4 Sentiment Classification with Improved LSTM [ILSTM] Due to the quick growth of deep learning and the continual enrichment of related algorithms, we used sequential models like the recurrent neural network (RNN) and its variations to perform sentiment categorization. RNN is a deep network design for sequential data processing [23–26]. However, it has the constraint of not being able to maintain long sequential information, which is why RNNs are affected by the vanishing gradient and exploding gradient difficulties. With its gating mechanism, the LSTM model overcomes the challenges of RNNs, so we used LSTM and its variations. LSTM also has limitations and is not able to retain semantic contextual information bidirectionally. As a consequence, we used BiLSTM, which yields superior results to the basic RNN and LSTM models since it can grasp the semantics of text in both directions (left to right and vice versa). Figure 5 depicts a sample LSTM unit. Equations 5–10 illustrate the mathematical calculations for the forget gate, input gate, output gate, cell state, and hidden state. We used a three-layer LSTM structure in order to build an improved variation of LSTM. The sequential models and the number of units used for the improved LSTM are depicted in Table 3. ILSTM is the variant that we applied in our experiment. We experimented with various permutations and combinations of the number of units in LSTM Layer 1, Layer 2, and Layer 3, with configurations (128, 64, 32), (64, 32, 16), and (32, 16, 8), respectively.

fg_t = σ(w_fg · [h_{t−1}, x_t] + b_fg)  (5)
Fig. 5 Sample LSTM unit
Table 3 Configuration of improved LSTM
Layer configuration    Input dimension/number of units
Embedding layer        10,000 and max length = 500
LSTM layer 1           128 with return sequences = TRUE
LSTM layer 2           16 with return sequences = TRUE
LSTM layer 3           8
ig_t = σ(w_ig · [h_{t−1}, x_t] + b_ig)  (6)

og_t = σ(w_og · [h_{t−1}, x_t] + b_og)  (7)

c̃_t = tanh(w_c · [h_{t−1}, x_t] + b_c)  (8)

c_t = fg_t · c_{t−1} + ig_t · c̃_t  (9)

h_t = og_t · σ_c(c_t)  (10)
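For concreteness, the single-time-step computation of Eqs. (5)–(10) can be written out directly, as in the NumPy sketch below; the weights are random placeholders, and tanh is used for the output nonlinearity σ_c.

```python
# NumPy sketch of one LSTM time step implementing Eqs. (5)-(10).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    concat = np.concatenate([h_prev, x_t])
    fg = sigmoid(W["fg"] @ concat + b["fg"])      # forget gate, Eq. (5)
    ig = sigmoid(W["ig"] @ concat + b["ig"])      # input gate, Eq. (6)
    og = sigmoid(W["og"] @ concat + b["og"])      # output gate, Eq. (7)
    c_cand = np.tanh(W["c"] @ concat + b["c"])    # candidate cell state, Eq. (8)
    c_t = fg * c_prev + ig * c_cand               # cell state update, Eq. (9)
    h_t = og * np.tanh(c_t)                       # hidden state, Eq. (10)
    return h_t, c_t

units, features = 8, 4
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((units, units + features)) for k in ("fg", "ig", "og", "c")}
b = {k: np.zeros(units) for k in ("fg", "ig", "og", "c")}
h, c = lstm_step(rng.standard_normal(features), np.zeros(units), np.zeros(units), W, b)
print(h.shape, c.shape)
```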
We conducted experiments with different possible permutation combinations of Layer1, Layer2, and Layer3. We experimented with the number of most frequent words and the length of the review. Table 3 depicts the optimal design layout, configuration, and its descriptions.
4.1 Improving LSTM Performance The sequential models and the number of units used for the improved LSTM are depicted in Table 3. ILSTM is the variant that we applied in our experiment. We experimented with various permutations and combinations of the number of units in LSTM Layer 1, Layer 2, and Layer 3, with configurations (128, 64, 32), (64, 32, 16), and (32, 16, 8), respectively, and we evaluated all of the possible combinations. We also experimented with the number of most frequent words and the length of the review. Table 3 shows the arrangement that produces the best results, and Table 4 lists the training configuration.

Table 4 Model configuration for training
Number of epochs    Batch size    Validation split
20                  128           0.2
4.2 Improved LSTM for Sentiment Classification and Performance In our study, we used ILSTM, a three-layer LSTM architecture, to perform sentiment classification. The configuration and hyperparameter details for training are shown in Table 4, and our model's general architecture is depicted in Fig. 6. ILSTM further improved the results of sentiment classification in comparison with RNN, LSTM, and BiLSTM.
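A hedged Keras sketch of the three-layer ILSTM configuration from Tables 3 and 4 is given below; the sigmoid output layer, Adam optimizer, and binary cross-entropy loss follow the text, while the embedding output dimension and the shape of the training data are assumptions.

```python
# Hedged Keras sketch of the ILSTM configuration from Tables 3 and 4
# (vocabulary 10,000, review length 500, LSTM layers of 128/16/8 units,
# 20 epochs, batch size 128, validation split 0.2); the embedding output
# dimension (64) is an assumption not stated in the text.
import tensorflow as tf

vocab_size, max_len = 10000, 500
ilstm = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,), dtype="int32"),
    tf.keras.layers.Embedding(vocab_size, 64),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(16, return_sequences=True),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
ilstm.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Training call, assuming padded integer sequences X_train and binary labels y_train:
# ilstm.fit(X_train, y_train, epochs=20, batch_size=128, validation_split=0.2)
ilstm.summary()
```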
Fig. 6 Sentiment analysis using proposed DL approach
From the results depicted in Table 5 and Fig. 7, we can clearly say that the proposed improved LSTM model outperforms various deep learning models like the simple RNN, simple LSTM, and BiLSTM models as well. We have also compared our work with the recent work by the authors of [26], where they performed sentiment classification using neural network-based techniques on Twitter data. Our model also outperforms their results, as they applied Naïve Bayes, decision tree, SVM, multilayer perceptron (MLP), recurrent neural network (RNN), and convolutional neural network (CNN), with validation accuracies ranging from 67.88 to 84.06 for the different classification and neural network techniques, while with our approach we could achieve accuracy up to 88.67% and an F1-score of 87.66%.

Table 5 Performance of different deep learning models
Performance measurement    RNN      LSTM     BiLSTM    ILSTM
CA                         71.00    83.00    85.73     88.67
P                          71.53    83.00    85.73     87.23
R                          71.50    83.00    85.73     88.11
F1                         71.00    83.00    85.73     87.66
CA accuracy, P precision, R recall, F1 F1-score
Fig. 7 Performance comparison using the deep learning approaches
5 Conclusion and Future Work We compared deep learning models to machine learning approaches in this research study, which were used to obtain sentiment analysis for hotel industry reviews. The success of feature extraction, which was successfully extracted using the suggested technique, is critical to the efficacy of sentiment analysis. Correct sentiment analysis for hotel review classification will aid in the improvement of the hospitality industry and its services. Bidirectional LSTM (BiLSTM) and improved LSTM (ILSTM) were used to enhance the empirical data, resulting in performance improvements of 2.73% and 5.67%, respectively. We were able to generate improved results utilizing the LSTM model, and the empirical results were further strengthened using ILSTM techniques. In the future, we will employ hybrid approaches to increase sentiment classification accuracy and other performance metrics while lowering computing costs. Acknowledgements The authors are grateful to the Principal and Dean of the Faculty of Technology and Engineering, as well as the Head of the U & P U. Patel Department of Computer Engineering at CSPIT, Charotar University of Science and Technology, Changa, for their constant suggestions, encouragement, guidance, and support in completing this work. We would like to express our gratitude to management in particular for their encouragement and moral support.
Comparative Analysis on Deep Learning Models for Detection of Anomalies and Leaf Disease Prediction in Cotton Plant Data

Nenavath Chander and M. Upendra Kumar
Abstract Around 40% of the world's cotton is produced in India, but the crop is prone to various limitations, mostly in the leaf area, and most of these limitations are recognized as diseases. Such diseases are very difficult to detect with the naked eye. This research focuses on building an application that uses different convolutional networks to improve the process of identifying the health of cotton crops. To prevent major losses in cotton production, crop health must be checked frequently so that the spread of disease is reduced. Cotton crops are damaged mainly by pests, fungi, and bacteria; about 90% of cotton cultivation is prone to being affected by them. The proposed application aims to save cotton crops from diseases and farmers from heavy losses. In the proposed work, we use the ResNet50, ResNet152V2, and Inception V3 models for predicting disease. The dataset, i.e., the images of cotton leaves and plants used for training and testing the models, was captured manually. The proposed approach shows a high classification accuracy with low computation time, which demonstrates the practical importance of this application in real-time disease detection.

Keywords Image · TRIM · Classification · Color · Texture · Disease · Feature · Prediction · Cotton
N. Chander (B) Department of Computer Science and Engineering, Osmania University, Hyderabad, Telangana, India e-mail: [email protected]
M. Upendra Kumar Department of Computer Science and Engineering, Muffakham Jah College of Engineering and Technology, Affiliated to OU, Hyderabad, Telangana, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_20

1 Introduction

Around 40% of the cotton in the world is produced in India. In the year 2019–20, India produced 63 billion kilograms of cotton. In the previous year, India
produced over 53 billion kilograms of cotton. From the latest analysis, we can clearly see that India exceeds China's cotton production, which is nearly 600 billion kilograms. Although the statistics show huge numbers, the crop production per unit hectare remains quite low. Cotton agriculture in India has been gradually declining over the years due to pest attacks and infection-causing bacteria. Another reason is charcoal rot, which has a huge impact on cotton plantations; because of it, cultivators suffer heavy losses in production and income. The above-mentioned problem can be addressed if cultivators have adequate knowledge of plant diseases and the infections that cause them. Once a farmer knows the information regarding a disease, the diseased and infected plants can be cut out so that the infection does not spread across the farm. With adequate knowledge, farmers can also apply fertilizers to treat the crops in the early stage of infection and can use drones to sprinkle treatments frequently to avoid plant infections and diseases. This project helps farmers and cultivators identify the health of their cotton crops by simply taking pictures of the plants with a mobile phone and uploading them to the application, so that they immediately know the health condition of the plants. Further, this web application can be converted into a mobile app so that farmers can directly capture and upload images to get the results instantly. The green pixels are removed by using a specific threshold value obtained with Otsu's method [1]. The noise present in the pictures is eliminated by using diffusion techniques such as the anisotropic method, and green pixels are separated using a threshold value. The health condition of the cotton crop should be identified quickly and accurately to limit the growth of diseases on cotton crops [2], and the SGDM method offers a way to compute texture statistics to detect disease on the leaf. It takes a huge amount of time and rigorous knowledge for farmers to manually check all the crops in the field, which is a highly unfeasible and difficult task [3]. As a result, diseases that spread slowly cannot be recognized early, and their spread increases drastically. To solve these issues, we are developing an application whose objective is to identify the health conditions of crops, whose deterioration most often results in production losses. A way to discover Foliar (a major fungal disease) using the HPCCDD algorithm has been proposed [4]. Methods like color and texture analysis and K-means clustering have been used to successfully detect diseases in Malus domestica (apple) [5]. Machine learning-based outlier detection techniques for IoT data analysis have been comprehensively surveyed [6]. Chander et al.
[7] proposed metaheuristic feature selection with a deep learning enabled cascaded recurrent neural network for anomaly detection in the Industrial IoT environment. Related earlier works proposed dependable solutions design by agile modeled layered security architectures [8], designing dependable web services security architecture solutions [9], designing dependable business intelligence solutions using agile web services mining architectures [10], automatic water level detection using IoT [11], and the Taj-Shanvi framework for image fusion using guided filters [12]. Figure 1 shows diseased plants and leaves affected by the bacteria, insects, and fungi that occur on cotton leaves.
Fig. 1 a Fresh cotton plant, b fresh cotton leaf, c diseased cotton plant, and d diseased cotton leaf
The remainder of the paper is organized as follows. Section 2 presents the related works, i.e., the literature survey. Section 3 describes the proposed model and discusses the machine learning models we used, i.e., Inception V3, ResNet50, and ResNet152V2. Section 4 presents the experimental validation (including information on the training and testing data), and the further sections give the conclusion and the references.
2 Related Works Neural networks are used in identification and classification of diseases in grape leaves [2]. The raw grape image with noisy and disturbance background is taken as input, and then it is scaled into 300 × 300 pixels. To maintain the infected section information, the background in the image is removed by using the masking technique for green pixels, then noise is removed using a technique called anisotropic diffusion, and the process is iterated up to five times. Then, using K-means clustering, disease segmentation is performed, and from the diseased part of the leaf, the associated textural information is created by using gray-level co-occurrence matrix (GLCM) for nine features. Feed-forward BPNN uses the extracted characteristics to classify the data. The RGB is converted to another color space, and the K-means clustering approach is used to identify the majority of green color pixels [1]. Now the thresholding technique like Otsu’s thresholding is utilized to mask these green color pixels. Then, using the color co-occurrence matrix (CCM) approach, the diseased part of the RGB image is converted into the HIS color space, and then texture features are observed. Finally, a pre-trained neural network is used to conduct feature recognition for the extracted features. A variety of stages are involved in detecting illness on a leaf [3]. To begin, the original true-color image is converted to HSV and a color descriptor structure in which the hue component is employed for subsequent analysis. Green color pixel masking is therefore done by setting a background value of zero or some other shade of green pixel that is not part of our interest. As a result of the segmentation, meaningful segments containing a large quantity of information are obtained.
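As a rough, hedged sketch of the green-pixel masking step described in these works (OpenCV and NumPy assumed; "cotton_leaf.jpg" is a hypothetical input path, and the excess-green index is used here as a stand-in for the exact channel the cited works threshold):

```python
import cv2
import numpy as np

img = cv2.imread("cotton_leaf.jpg")                    # hypothetical input path (BGR)
b, g, r = cv2.split(img.astype(np.float32))

# "Excess green" index highlights healthy green tissue.
exg = 2 * g - r - b
exg = cv2.normalize(exg, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# Otsu's method picks the threshold separating green from non-green pixels automatically.
_, green_mask = cv2.threshold(exg, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Zero out the green (healthy) pixels so only candidate diseased regions remain.
diseased_only = cv2.bitwise_and(img, img, mask=cv2.bitwise_not(green_mask))
cv2.imwrite("diseased_candidates.jpg", diseased_only)
```

The retained non-green regions can then be passed to the segmentation and texture-feature stages described above.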
The color co-occurrence approach is used to analyze texture features by computing parameters of spatial gray-level dependence matrices (SGDM) such as correlation for hue content, energy, local homogeneity, and contrast. First the acquired image is improved, and then color image segmentation is performed utilizing edge detection algorithms such as Sobel and Canny. The HPCCDD method is used to classify images and identify sick spots [4]. Farmers directly send the input collected photographs, and after detecting the disease, the best pesticides and fertilizers are recommended in three languages. During the preprocessing, the images with true color are transformed into images of high intensity [5]. During the thresholding process, if the pixel values are larger than the threshold value, then that pixel is treated as an object. This thresholded image’s histogram is obtained, and after that, the histogram is equalized. The color co-occurrence matrix (CCM) is now employed to extract textural information, and the K-means clustering algorithm is used to diagnose illness. This approach is proposed to apply regularization and to extract the eigen features from images of the cotton leaves [13]. Scatter matrices are generated using 100 sample photos to extract the eigen feature. The scatter matrix is produced within a class type, and it is split into distinct subspaces connected to various diseases. This is done by taking into account pixel value variations. The final stage involves feature extraction and dimensionality reduction. The disease is identified once the features are compared. The system has three stages: (i) leaf segmentation, (ii) disease segmentation, and (iii) disease classification [14]. In leaf segmentation phase, the LAB color space is used. The self-organizing feature map (SOFM) is used to cluster the image, and using the back propagation neural network (BPNN), the color features are retrieved. Disease segmentation for the grape leaves is now carried out utilizing a modified self-organizing feature map. Now for better optimization, the genetic algorithms are used, and classification is carried out by using a support vector machine. The Gabor filter is also used to more effectively examine leaf disease color features. Here, a filter like low-pass filter is used to smoothen or enhance the images and a Gaussian filter after they are captured with a digital camera [15]. For picture segmentation, contour models which are active are used. To train the classifier, seven moments of hues are retrieved as features. The rotation, size, and object translation of those hue’s moments are all variable. Finally, the feed-forward back propagation network method is utilized to classify diseases using this feature vector. Using the multiclass support vector machine (SVM) classification method, the author recognized and categorized plant leaf infections and proposed three steps in this process [16]: Step 1: input image acquisition, Step 2: preprocessing of images, and Step 3: segmentation for identifying the affected region from the leaf images by implementing the K-means clustering algorithm, and then gray-level co-occurrence matrix (GLCM) features are mined for classification. Currently, deep learning algorithms and unmanned aerial vehicles (UAVs) are used in care agriculture to achieve agriculture applications, such as the detection and treatment of plant diseases [17]. 
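The texture-plus-classifier pipeline recurring in these works can be illustrated with a minimal sketch, assuming scikit-image (0.19 or later) and scikit-learn; the patches and labels below are random placeholders, not real leaf data:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(gray_patch):
    """Contrast, homogeneity, energy, and correlation from a grey-level patch."""
    glcm = graycomatrix(gray_patch, distances=[1],
                        angles=[0, np.pi / 2], levels=256,
                        symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("contrast", "homogeneity", "energy", "correlation")])

# Placeholder data: random 64x64 grey patches with arbitrary labels; in practice
# these would be segmented diseased-leaf regions and their disease classes.
rng = np.random.default_rng(0)
X_patches = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(20)]
y = rng.integers(0, 2, 20)

X = np.array([glcm_features(p) for p in X_patches])
clf = SVC(kernel="rbf").fit(X, y)            # the GLCM features feed a (multiclass) SVM
```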
One of the most significant agricultural applications that has attracted growing interest from researchers in recent years is plant illness monitoring utilizing UAV platforms. Memon et al. [18] presented numerous diseases that influence yield via the
leaf. Early disease detection protects crops from additional harm. Numerous diseases, such as leaf spot, target spot, bacterial blight, nutrient deficit, powdery mildew, and leaf curl, can affect cotton. For appropriate action to be taken, disease detection must be done accurately, and deep learning is crucial for correctly diagnosing plant diseases. Singh et al. [19] presented a genetic method to identify and categorize leaf disease. On the leaf samples photographed for the study with a camera, the authors used a variety of image processing techniques, including thresholding for green pixels, deleting masked cells, and removing unwanted distortion using a smoothing filter. The samples are applied to the population, and the top samples are chosen in each cycle. After the clustering process, features such as the image's color and texture are extracted, and an SVM classifier is used to classify the feature dataset. In the study by Santos et al. [20], instance segmentation employing mask R-CNN was suggested to recognize different grape varietals. Because instance detection relies on the annotated grape dataset, the proposed method can only be used with grape, which is a limitation of the work. Deep learning-based plant disease identification aids the accurate diagnosis of diseases; however, it can be challenging to recognize leaf diseases based only on domain understanding. Chander et al. [21] observed that data originating in data repositories carries more information and accuracy than raw sensor data in many fields such as health care, and that existing data mining techniques are not capable of comprehending such data efficiently.
3 Proposed Model

The objective of the proposed system is to build an efficient application that segregates infected and disease-free crops from an input image of the leaf or plant, thereby minimizing cotton crop wastage and reducing financial loss. We compare results from three different machine learning models to achieve faster computation and better accuracy and to decrease the computation time of the system compared with the existing system. The main problem with the existing systems is that they require more computation time; if the response time is not up to the mark, the cotton crop will be lost, and such systems are not reliable in producing appropriate output. Existing approaches such as "agricultural plant leaf disease detection using image processing" attained an accuracy of 90.9% using image processing, while "grape leaf disease detection from color imagery using hybrid intelligent system" measured 87% accuracy on test images but is limited in extracting ambiguous color pixels from the background. The problem can be addressed with many other approaches using different algorithms and accuracies, but each has its own limitations.
3.1 Scope

The proposed project was developed using convolutional neural networks and helps to identify the health condition of the cotton crop. The model performs well on images that contain a single leaf or plant against a clean background. This project helps farmers solve the problem of determining whether a leaf is healthy or infected by a disease. The major characteristics required of plant disease detection with deep learning models are speed and accuracy, so there is considerable scope for developing fast and efficient algorithms. Such algorithms can be used by farmers for detecting disease at an early stage; there is also scope for evaluating the severity of the detected disease.
3.2 Architecture Diagram In Fig. 2, firstly, import the required libraries and import the dataset; preprocess the data by resizing; import inception library and add preprocessing layers in front of inception layers; create a model object and train the model; save the model in h5 file format, and the same steps are repeated for both ResNet50 and ResNet152V2 models. Now load these models into the backend server. Load the web application. Now the user can upload either a single image or an entire folder containing images. Save the loaded image into an “uploads” file which acts as the database for this application. Retain image path. Preprocess the image by scaling and resizing and feeding the preprocessed image to the models and parsing the results then displaying the result.
Fig. 2 System architecture
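A minimal sketch of this transfer-learning pipeline, assuming TensorFlow/Keras and a hypothetical "data/train" directory with one sub-folder per class (the authors' exact preprocessing layers and hyperparameters are not specified here):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = (299, 299)                                   # Inception V3 input size
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=32)   # hypothetical dataset directory

# Apply the Inception V3 preprocessing in front of the frozen base model.
preprocess = tf.keras.applications.inception_v3.preprocess_input
train_ds = train_ds.map(lambda x, y: (preprocess(x), y))

base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
base.trainable = False                                  # reuse the ImageNet features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(4, activation="softmax"),              # 4 cotton health classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)
model.save("inception_v3_cotton.h5")                    # repeated for ResNet50 / ResNet152V2
```

The saved .h5 files would then be loaded by the backend server that serves the web application's predictions.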
Inception V3 is built using convolutional blocks backed by convolutional neural networks for image classification. Compared with Inception V1 and Inception V2, Inception V3 is a deeper network; at the same time, it does not compromise speed and efficiency and is computationally economical. Auxiliary classifiers are used as regularizers in Inception V3. The important modifications made in the Inception V3 model are factorization into smaller convolutions, spatial factorization into asymmetric convolutions, the use of auxiliary classifiers, and efficient grid size reduction. Figure 3 shows the Inception V3 architecture; the input image size is 299 × 299 × 3. It is an architecture of 48 layers which includes convolutional layers, max pooling layers, and average pooling layers. The concept of the "Inception A" block is factorization into smaller convolutions, which reduces the number of parameters by 28%. In the "Inception B" block, factorized 7 × 7 convolutions are used. The concept of the "Inception C" block is factorization into asymmetric convolutions, which reduces the number of parameters by 33%. An auxiliary classifier is introduced between the layers to mitigate the vanishing gradient problem, and the loss of the auxiliary classifier is added to the main classification loss to improve the results. Figure 4 shows the ResNet50 architecture. ResNet50 is built using convolutional blocks backed by a convolutional neural network. More than one million images from the ImageNet database are used to train ResNet50. With its 50 layers, ResNet50 can classify images into 1000 different categories. The network takes an image input size of 224 × 224 pixels. Hence, the network has learnt rich features that identify a wide range of images.
Fig. 3 Inception V3 architecture
Fig. 4 ResNet50 architecture
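As a quick, hedged check of the default input resolutions and sizes of the backbones discussed above (TensorFlow/Keras assumed; weights=None is passed so no pretrained weights are downloaded):

```python
import tensorflow as tf

for net in (tf.keras.applications.InceptionV3,
            tf.keras.applications.ResNet50,
            tf.keras.applications.ResNet152V2):
    m = net(weights=None)   # randomly initialized; only the architecture is inspected
    print(m.name, "input shape:", m.input_shape, "parameters:", m.count_params())
# Inception V3 expects 299 x 299 x 3 inputs; ResNet50 and ResNet152V2 expect 224 x 224 x 3.
```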
4 Experimental Validation

Inception model: One of the best features of the Inception model is that it extracts the vital information of the images at different scales with multiple convolutional kernels and finally combines all the collected information for a better image representation. The Inception V3 model has a recall of 80.08%, a precision of 80.07%, and an accuracy of 81.63%. ResNet model: The residual network is one of the most widely used networks at present; for instance, this deep learning network has been used to identify COVID-19 from CT scan images. The main difference between the ResNet50 and ResNet152 models is the accuracy and the number of layers. The accuracy of ResNet50 is 80.95%, and the accuracy of ResNet152V2 is 88.36%. The experimental validation of TRIM (the three models) is carried out using 10% of the total dataset. The dataset includes four categories, i.e., diseased cotton leaves and plants and fresh cotton leaves and plants. Each category has 50 images for validation, giving 200 images in total with different lighting conditions and different ages of the crops. Approximately 2000 images are used for training the models. Figure 5 represents the confusion matrix for our multiclass classifier. For the multiclass classifier, we use Eqs. (1) and (2) to find the values of precision and recall:

Precision_i = Value_ii / Σ_j Value_ji    (1)

Recall_i = Value_ii / Σ_j Value_ij    (2)
Fig. 5 Confusion matrix
Table 1 Classification report

                        Precision   Recall   F1-Score   Support
Diseased cotton leaf    0.98        0.98     0.98       50
Diseased cotton plant   0.92        0.98     0.95       50
Fresh cotton leaf       1.00        0.94     0.97       50
Fresh cotton plant      0.96        0.96     0.96       50
Accuracy                0.98        0.96     0.96       200
Macro average           0.97        0.96     0.97       200
Weighted average        0.97        0.96     0.97       200
In the above equations, "i" indexes the row (the specific category) and "j" runs over the columns (all the categories of the classification) of the confusion matrix. Table 1 shows the precision and recall values for all categories of the classification. The classification report also shows that the overall accuracy is 96%.
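As an illustration of Eqs. (1) and (2), the per-class precision and recall can be computed directly from a confusion matrix; the 4 × 4 matrix below is an arbitrary example in the spirit of Fig. 5, not the authors' actual counts (NumPy assumed):

```python
import numpy as np

# cm[i, j] counts samples of true class i predicted as class j (Value_ij above).
cm = np.array([[49, 1, 0, 0],
               [1, 49, 0, 0],
               [0, 2, 47, 1],
               [0, 1, 1, 48]])

precision = np.diag(cm) / cm.sum(axis=0)   # Value_ii / sum over j of Value_ji  (Eq. 1)
recall    = np.diag(cm) / cm.sum(axis=1)   # Value_ii / sum over j of Value_ij  (Eq. 2)
f1 = 2 * precision * recall / (precision + recall)
print(precision.round(2), recall.round(2), f1.round(2))
```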
5 Conclusion

In this study, we implemented deep learning-based techniques, mainly using Python and packages such as Keras, and we used Google Colab for building the application. We conducted many experiments during the research study in order to arrive at an efficient approach, and we also experimented with customizing various parameters.
Some of the customized parameters are the regularization method, the number of epochs, the color of the images in the dataset, etc. We observed that an accuracy of up to 96% was reached for identifying the health of each type of cotton leaf and plant. By using the symptoms of the leaves and plants, automated devices can be developed to assist farmers in easily identifying diseased crops. The obtained results show that this approach can be used to develop systems that help farmers reduce crop damage and increase cotton production.
References 1. Al-Hiary H et al (2011) Fast and accurate detection and classification of plant diseases. Int J Comput Appl 17(1):31–38 2. Sannakki SS et al (2013) Diagnosis and classification of grape leaf diseases using neural networks. In: 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT). IEEE 3. Dhaygude SB, Kumbhar NP (2013) Agricultural plant leaf disease detection using image processing. Int J Adv Res Electr Electron Instrum Eng 2(1):599–602 4. Revathi P, Hemalatha M (2012) Classification of cotton leaf spot diseases using image processing edge detection techniques. In: 2012 International Conference on Emerging Trends in Science, Engineering and Technology (INCOSET). IEEE 5. Bashir S, Sharma N (2012) Remote area plant disease detection using image processing. IOSR J Electron Commun Eng (IOSRJECE) 2(6):31–34 6. Nenavath C, Upendra Kumar M (2020) Machine learning based outlier detection techniques for IoT data analysis: a comprehensive survey 7. Chander N, Upendra Kumar M (2022) Metaheuristic feature selection with deep learning enabled cascaded recurrent neural network for anomaly detection in Industrial Internet of Things environment. Clust Comput, 1–19 8. Upendra Kumar M et al (2012) Dependable solutions design by agile modeled layered security architectures. In: International Conference on Computer Science and Information Technology. Springer, Berlin, Heidelberg 9. Shravani D et al (2011) Designing dependable web services security architecture solutions. In: International Conference on Network Security and Applications. Springer, Berlin, Heidelberg 10. Krishna Prasad AV et al (2011) Designing dependable business intelligence solutions using agile web services mining architectures. In: International Conference on Advances in Information Technology and Mobile Communication. Springer, Berlin, Heidelberg 11. Mahalakshmi CVSS, Mridula B, Shravani D (2020) Automatic water level detection using IoT. In: Satapathy S, Raju K, Shyamala K, Krishna D, Favorskaya M (eds) Advances in decision sciences, image processing, security and computer vision. Learning and analytics in intelligent systems, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-030-24318-0_76 12. Padala A, Shravani D (2021) Image processing: human facial expression identification using convolutional neural networks. Turk Online J Qual Inquiry 12(6) 13. Gurjar AA Gulhane VA (2012) Disease detection on cotton leaves by eigenfeature regularization and extraction technique. Int J Electron Commun Soft Comput Sci Eng (IJECSCSE) 1(1):1 14. Meunkaewjinda A et al (2008) Grape leaf disease detection from color imagery using hybrid intelligent system. in: 2008 5th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, vol. 1. IEEE 15. Rothe PR, Kshirsagar RV (2015) Cotton leaf disease identification using pattern recognition techniques. In: 2015 International Conference on Pervasive Computing (ICPC). IEEE 16. Rajiv K et al (2021) Accuracy evaluation of plant leaf disease detection and classification using GLCM and multiclass SVM classifier. In: Congress on intelligent systems. Springer, Singapore
17. Bouguettaya A et al (2022) A survey on deep learning-based identification of plant and crop diseases from UAV-based aerial images. Cluster Comput, 1–21 18. Chander N, Upendra Kumar M (2022) Metaheuristic feature selection with deep learning enabled cascaded recurrent neural network for anomaly detection in Industrial Internet of Things environment. Cluster Comput, 1–19 19. Memon MS, Kumar P, Iqbal R (2022) Meta deep learn leaf disease identification model for cotton crop. Computers 11(7):102 20. Singh V, Misra AK (2017) Detection of plant leaf diseases using image segmentation and soft computing techniques. Inf Process Agric 4(1):41–49 21. Santos TT et al (2020) Grape detection, segmentation, and tracking using deep neural networks and three-dimensional association. Comput Electron Agric 170:105247
A Model for Prediction of Understandability and Modifiability of Object-Oriented Software

Sumit Babu and Raghuraj Singh
Abstract Software quality measurement at an early stage helps in improving the product and process of the development. A good-quality software product is easy to maintain. Maintainability of software refers to the ease with which software can be understood, repaired, and improved. The longevity of software depends on the ability of developers to meet new customer requirements and address the problems faced by them. Software maintenance is the most expensive phase of development and consumes more than half of the development budget. This work is carried out to find the early factors of object-oriented software (OOS) that help developers to refine the quality. The proposed model predicts two major attributes of maintainability, namely understandability and modifiability, of OOS through UML class diagrams on the basis of size and structural complexity metrics. The model is validated by establishing a high correlation with the existing prediction model for both the factors. In comparison with an existing model, the proposed model gives a correlation of 0.9603 and 0.9424 for understandability and modifiability, respectively. Keywords Object-oriented software · Software quality · Size metric · Structural complexity metric · Neural network · Understandability · Modifiability · Maintainability
S. Babu (B) · R. Singh Harcourt Butler Technical University, Kanpur 208002, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_21

1 Introduction

In recent times, software quality measurement has received special attention from researchers, as it provides a way to improve software quality [1] before the software is delivered to target users. The quality of a product is analyzed by measuring the factors related to that product, and software quality is likewise measured by considering the major factors that affect it [2]. Early measurement of quality helps developers improve the design during the initial development phase. Measurement of software quality is generally done either at the design level through some of the design attributes or
through code when the product is ready. These two types of software quality are usually referred to as design quality and product quality, respectively. Many prediction models are available that measure the different factors like reusability, maintainability, reliability, portability, usability, flexibility, etc. for the assessment of quality of OOS. In the software quality assessment process, all factors with high impact have to be considered as they cover all aspects related to requirements and grossly affect the software quality. Software quality prediction models vary on the basis of use of factors and techniques to obtain the results. Prediction of such factors is very important during the development of software for better planning and improved quality of development. As the design phase of software gives a sound and stable platform for the development, it is very crucial for the subsequent phases of development. If the developer identifies the main factors and focuses on them, the quality of design and development of software will be improved. Due to high demand of OOS in the industry and organizations, the objectoriented programming languages and design tools such as UML are of prime focus. These tools help to find these early factors which contribute to the software quality. Design and code quality of software has a direct bearing on the maintainability defined as the ease with which software can be understood, repaired, and improved. Maintainability is taken care of by continuously adapting software to meet new customer requirements, fixing bugs, optimizing existing functionality, and adjusting code to prevent future issues. Active life of software depends on the ability of developers to meet new customer requirements and address the problems faced by them. Being the most expensive phase of development and consuming more than half of the development budget, utmost care has to be taken to settle the maintainability issues. Indicators of maintainability of object-oriented software (OOS) can be developed by measuring the factors like understandability and modifiability using UML class diagrams. The selection of these two factors has been done because these are the major factors affecting the quality of OOS [3]. Work has been investigated through existing metrics that can be found from the design of OOS and that have direct impact on these factors. The rest of the paper is organized as follows: In the next section, related work is presented. Proposed neural network model for understandability and modifiability is described in Sect. 3. Section 4 contains the results and discussion, and finally, Sect. 5 presents the conclusion and future scope.
2 Related Work Quality assessment of OOS is generally done through an aggregation process of factors, attributes, and metrics values available at different hierarchies in the hierarchical models. Several models and methods for prediction and estimation of factors that have impact on the quality of OOS as well as many metric sets based on size and structural complexity metrics of OOS are available in the literature. Marchesi’s metric set [4] is known for the measurement of complexity of class diagrams. This metric set
contains seven metrics related to classes such as class count, inheritance hierarchy count, number of dependency of classes, and responsibilities of classes. Genero [5, 6] also proposed a group of metrics which is used to measure the complexity of class diagrams. This metric set contains fourteen metrics related to design of OOS and focuses on classes, attributes, methods, aggregation, generalization, dependency, several hierarchies, maximum depth, and maximum HAgg value of the classes. In’s metric [7], Rufai’s metric [8], Zhou’s metric [9], Kang’s metric [10, 11] are other most common metrical sets for the measurement of complexity of class diagrams. Metric sets like Chidamber and Kemerer metrics suite [12], MOOD metric set for OO design [13], design metric set for testing [14], and product metric set for object-oriented design [15] are used for evaluating the design of OOS. Several studies have been carried out on identification of the quality factors for OOS. Diwaker and Tomar [16] identified factors and techniques which are used to develop reliability models. Jha et al. [17] proposed a model for maintainability prediction using deep learning. Alsolai and Roper [18] discussed various studies related to maintainability prediction which used a machine learning approach. Yaghoobi [19] proposed an algorithm for optimization of software reliability model parameters. Baig et al. [20] proposed a package stability metric which is based on the changes among the contents of package connections between packages. Jayalath and Thelijjagoda [21] proposed a complexity metric to improve the readability of object-oriented software. Mishra and Sharma [22] proposed a model to predict maintainability of object-oriented software using fuzzy systems. Kumar and Rath [23] focused on maintainability of software using object-oriented metrics. Hybrid approach of neural network is used to design prediction models. They used two feature selection techniques to find the best set of metrics. They found better prediction results for maintainability in comparison to other existing models. In this prediction model, three artificial intelligence techniques are used. A set of selected existing size and structural complexity metrics which can be used for the assessment of software quality is described in Table 1. A review of various models for prediction of quality factors such as understandability, maintainability, reusability, modifiability, functionality, and reliability related to OOS is given in Table 2. From the review of related work, it is concluded that several metrics and models that predict the quality factors related to OOS are available, but making appropriate selection parameters and metrics that affect the quality factors is very important to attain the desired goal. Existing models that estimate the quality factors by using different parameters from the code are plenty in number, but the models which measure these factors at the design level are less. Assessment of these factors at the design level helps developers to make changes in the design as per the requirements. According to McCall’s model [32], Boehm’s model [33], Dromey’s model [34], ISO 2001 [35], and ISO 25000 [36], the selected factors for quality assessment should have high priority and impact on the quality of software. In this study, two highpriority factors named understandability and modifiability are selected along with certain metrics that affect the process to assess the design quality of OOS. 
For this assessment, a set of size and structural complexity metrics of OO design as described in Table 1 has been considered.
Table 1 Selected quality metrics for the model development

Sr. No.  Name of the metric                            Description
1        Number of Classes (NC)                        Sum of all classes in an UML class diagram
2        Number of Attributes (NA)                     Sum of all attributes in an UML class diagram
3        Number of Methods (NM)                        Sum of all methods in an UML class diagram
4        Number of Associations (NAssoc)               Sum of all associations in an UML class diagram
5        Number of Aggregation (NAgg)                  Sum of all aggregation in an UML class diagram
6        Number of Dependency (NDep)                   Sum of all dependency in an UML class diagram
7        Number of Generalization (NGen)               Sum of all generalization in an UML class diagram
8        Number of Aggregation Hierarchies (NAggH)     Sum of all aggregation hierarchies in an UML class diagram
9        Number of Generalization Hierarchies (NGenH)  Sum of all generalization hierarchies in an UML class diagram
10       Maximum DIT (MaxDIT)                          Highest DIT value of class hierarchy of the class diagram
11       Maximum HAgg (MaxHAgg)                        Highest HAgg value of class hierarchy of the class diagram
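As a purely illustrative sketch, size metrics of the kind listed in Table 1 could be computed from a simple in-memory representation of a class diagram; the UMLClass structure below is hypothetical and not part of the authors' tooling:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UMLClass:
    name: str
    attributes: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    parent: Optional["UMLClass"] = None        # generalization (inheritance) link

def dit(c: UMLClass) -> int:
    """Depth of inheritance tree of a single class."""
    return 0 if c.parent is None else 1 + dit(c.parent)

def size_metrics(classes: List[UMLClass]) -> dict:
    return {
        "NC": len(classes),
        "NA": sum(len(c.attributes) for c in classes),
        "NM": sum(len(c.methods) for c in classes),
        "MaxDIT": max(dit(c) for c in classes),
    }

base = UMLClass("Account", ["id"], ["open", "close"])
child = UMLClass("SavingsAccount", ["rate"], ["add_interest"], parent=base)
print(size_metrics([base, child]))   # {'NC': 2, 'NA': 2, 'NM': 3, 'MaxDIT': 1}
```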
3 Neural Network-Based Prediction Model The neural network-based model for the assessment of understandability and modifiability proposed here first collects the metrics for design, size, and structural complexity of OOS and then selects metrics which are used to predict the two target quality factors, i.e., understandability and modifiability.
3.1 Model Structure and Data Collection Structure of the proposed model is hierarchical in nature having two-level hierarchies. At the top, there are factors which are assessed, and at the next level, there are selected metrics through which these factors are assessed. Weights of the metrics used in the aggregation process for the assessment of the quality factors are adjusted through a multistage feedback neural network. The neural network is built using a set of metrics shown in Table 1.
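A minimal sketch of such a metric-based neural network regressor, using scikit-learn's MLPRegressor as a stand-in for the authors' multistage feedback network; the data below are synthetic placeholders, not the values of Table 5:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Placeholder data: in the paper, X holds the selected metric values
# (NC, NA, NM, NAssoc, NAgg, NDep, NGen, MaxHAgg) for the 24 UML diagrams and
# y holds the corresponding understandability values from the Genero [24] model.
X = rng.integers(1, 80, size=(24, 8)).astype(float)
y = 1.075 * np.log10(X[:, 2] + 1) + 0.384 * np.log10(X[:, 7] + 1)   # synthetic stand-in

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(8, 4),
                                   max_iter=5000, random_state=0))
model.fit(X, y)
print("R2 on the training diagrams:", round(r2_score(y, model.predict(X)), 4))
```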
Table 2 Existing models for quality factors (author and reference no., factor addressed, objective)

Genero et al. [24], Understandability and modifiability: To find factors of UML diagrams using size and structural metrics. Computed factors in three groups while considering different sets of metrics.
Padhy et al. [25], Reusability: Measured reusability using CK's metric set. In this model, the algorithms and their primary constructions are considered for finding the metrics from the class diagrams.
Papamichail et al. [26], Reusability: This model predicts reusability from complexity, cohesion, coupling, inheritance, documentation, and size.
Padhy et al. [27], Reusability: This model predicts reusability for OO design-based web of services software systems. CK's metric set is used for estimation.
Bajeh et al. [28], Modifiability: Used two inheritance metrics for modifiability prediction of OOS inheritance hierarchies.
Yadav et al. [29], Functionality: Proposed a functionality prediction model of OOS using a multi-criteria decision-making approach. This model uses expandability, fault tolerance, modularity, and operability as subcriteria for model construction.
Sharma et al. [30], Functionality: Proposed a fuzzy model to evaluate functionality of a software system. This model uses CK's metric set for assessment and considers inheritance, cohesion, coupling, and complexity properties of object-oriented software.
Diwaker et al. [31], Reliability: Proposed a model that estimates the reliability of component-based software using a serial reliability model and a parallel reliability model. The proposed model is computed using two soft computing techniques, fuzzy logic and PSO.
Table 3 Proposed model for understandability and modifiability

Factor             Neural network-based model                         R2
Understandability  (NC, NA, NM, NAssoc, NAgg, NDep, NGen, MaxHAgg)    0.9603
Modifiability      (NC, NA, NM, NAssoc, NDep, NAggH, NGenH, MaxDIT)   0.9424
3.2 Algorithm for the Model Development

Step 1: Select the most frequently used metrics for OOS which affect the quality factors.
Step 2: Take a data set of UML diagrams of OOS projects of different complexity.
Step 3: Find the metric values of the selected metrics for all UML diagrams.
Step 4: Compute the values of the quality factors using the Genero [24] model.
Step 5: Assess the quality factors using the proposed neural network-based model.
Step 6: Compare the results and validate the proposed model against the Genero [24] model.

A set of selected size and structural complexity metrics which are generally used for the assessment of various software quality factors is given in Table 1. Metrics selected for prediction of the chosen quality factors, i.e., understandability and modifiability, are given in Table 3. The dataset consisting of 24 UML diagrams of OOS of three levels of complexity (low level, medium level, and high level) is taken from Genero [24]. Values of the various metrics for these 24 UML diagrams are given in Table 5.
3.3 Model Validation

The Genero [24] model, which uses linear regression and the formulation defined in Table 4, has been used for comparison and validation of the results of the proposed model. Table 3 shows the correlation with the existing model shown in Table 4. The models with group B-Lg for understandability and group A-Lg for modifiability give the best prediction results, and these best results of the Genero [24] model have been used for comparison and validation.

Table 4 Existing model for understandability and modifiability

Factors            Group       Model equation                          R2
Understandability  Group B-Lg  1.075 * Lg(NM) + 0.384 * Lg(MaxHAgg)    0.981
Modifiability      Group A-Lg  251.325 * Lg(NA)                        0.847
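As a rough illustration of how the Table 4 baseline equations can be evaluated and compared with the neural network's predictions (NumPy and scikit-learn assumed; the metric values below are arbitrary placeholders rather than the paper's data, and Lg is read here as the base-10 logarithm):

```python
import numpy as np
from sklearn.metrics import r2_score

# Arbitrary illustrative metric values for a few projects (not the paper's data).
nm = np.array([24, 39, 60, 85])        # NM values
max_hagg = np.array([1, 2, 2, 3])      # MaxHAgg values
na = np.array([17, 19, 28, 31])        # NA values

understandability = 1.075 * np.log10(nm) + 0.384 * np.log10(max_hagg)
modifiability = 251.325 * np.log10(na)

# The proposed model is then judged by how closely its predictions track these
# baseline values, e.g. via the coefficient of determination:
nn_pred = understandability + np.random.default_rng(0).normal(0, 0.02, size=nm.shape)
print("R2 (understandability):", round(r2_score(understandability, nn_pred), 4))
```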
Table 5 Values of metrics and factors for the 24 OOS projects: the metric values NC, NA, NM, NAssoc, NAgg, NDep, NGen, NAggH, NGenH, MaxHAgg, and MaxDIT, together with the actual and predicted understandability and the actual and predicted modifiability of each project
Fig. 1 Comparison of understandability values
4 Results and Discussion The neural network approach is used to predict the understandability and modifiability values through UML diagrams of the selected OOS projects. The aggregation of metrics values to predict the values of two factors is done by considering the appropriate weights of the metrics finalized by the hidden layers of the neural network. The values of the metric set are obtained from the UML diagrams, and values of the quality factors predicted through Genero [24] model and the proposed model referred to as actual values and predicted values of factors are shown in Table 5. The values of understandability and modifiability for 24 OOS projects are shown in Figs. 1 and 2. R2 value, which is the coefficient of determination, computed for the two factors understandability and modifiability is found as 0.9603 and 0.9424, respectively. This indicates that the selected metric set is a good predictor of both the factors.
5 Conclusions and Future Scope

In this work, an attempt has been made to predict the understandability and modifiability of OOS using a neural network-based model. The results are validated on 24 UML diagrams of three different levels of complexity (low-level, medium-level, and high-level design) by comparing them with the results of an existing model. The R2 values of 0.9603 and 0.9424 for understandability and modifiability, respectively, are very close to 1, indicating that the selected metric set is a good predictor of both factors.
Fig. 2 Comparison of modifiability values
In the future, values of other quality factors will be predicted by selecting suitable metric sets and using other techniques like swarm optimization, fuzzy logic, and soft computing. Furthermore, the proposed model will be used to assess the quality factors for more complex and bigger projects.
References 1. Gorla N, Lin SC (2010) Determinants of software quality: a survey of information systems project managers. Inf Softw Technol 52(6):602–610, ISSN 0950-5849. https://doi.org/10.1016/ j.infsof.2009.11.012 2. Hoyer RW, Hoyer BY (2001) What is quality? Qual Prog 34(7):52–62 3. Sadeghzadeh HM, Rashidi H (2018) Software quality models: a comprehensive review and analysis. J Electr Comput Eng Innovations 6(1):59–76. https://doi.org/10.22061/JECEI.2019. 1076 4. Marchesi M (1998) OOA metrics for the unified modeling languages. In: Proceedings of 2nd Euromicro conference on software maintenance and reengineering (CSMR’98) Palazzo degli Affari, Italy, March 1998, pp 67–73: Li W (1998) Another metric suite for object oriented programming. J Syst Softw 44(2):155–162 5. Genero M (2002) Defining and validating metrics for conceptual models. Ph.D. Thesis, University of Castilla-La Mancha 6. Genero M, Piattini M, Calero C (2000) Early measures for UML class diagrams. L´Objet Hermes Sci Publ 6(4):489–515 7. In P, Kim S, Barry M (2003) UML-based object-oriented metrics for architecture complexity analysis. Department of computer science, Texas A&M University 8. Rufai R (2003) New structure similarity metrics for UML models [Master Thesis]. Computer Science, King Fahd University of Petroleum & Minerals 9. Zhou Y et al (2003) Measuring structure complexity of UML class diagrams. J Electron 20(3):227–231
10. Kang D et al (2004) A structural complexity measure for UML class diagrams. In: International conference on computational science 2004 (ICCS 2004), Krakow Poland, June 2004 pp 431– 435 11. Kang, D et al (2004) A complexity measure for ontology based on UML. In: IEEE 10th International workshop on future trends in distributed computing systems (FTDCS 2004), Suzhou, China, May 2004, pp 222–228 12. Chidamber SR, Kermer CF (1994) A metric suite for object oriented design. IEEE Trans Softw Eng 20:476–493 13. Abreu FB (1995) The mood metrics set. In: Proceedings of ECOOP’95, workshop on metrics 14. Purao S, Vaishnavi V (2003) Product metrics for object-oriented systems. ACM Comput Surveys 35(2):191–221 15. Henderson-Sellers B, Constantine LL, Graham IM (1996) Coupling and cohesion (towards a valid metrics suite for object-oriented analysis and design). Object Oriented Syst 3(3):143–158 16. Diwaker C, Tomar P (2017) Identification of factors and techniques to design and develop component based reliability model. Int J Sci Res Comput Sci Eng 5(3):107114 17. Jha S et al (2019) Deep learning approach for software maintainability metrics prediction. IEEE Access 7:61840–61855. https://doi.org/10.1109/ACCESS.2019.2913349 18. Alsolai H, Marc R (2020) A systematic literature review of machine learning techniques for software maintainability prediction. Inf Softw Technol 119:106214. ISSN 0950-5849. https:// doi.org/10.1016/j.infsof.2019.106214 19. Yaghoobi T (2020) Parameter optimization of software reliability models using improved differential evolution algorithm. Math Comput Simul 177:46–62. ISSN 0378-4754. https:// doi.org/10.1016/j.matcom.2020.04.003 20. Baig JJA, Mahmood S, Alshayeb M, Niazi M (2019) Package-Level stability evaluation of object-oriented systems. Inf Softw Technol 116:106172. ISSN 0950-5849. https://doi.org/10. 1016/j.infsof.2019.08.004 21. Jayalath T, Thelijjagoda S (2020) A modified cognitive complexity metric to improve the readability of object-oriented software. In: International research conference on smart computing and systems engineering (SCSE), pp 37–44. https://doi.org/10.1109/SCSE49731.2020.931 3049 22. Mishra S, Sharma A (2015) Maintainability prediction of OOS by using adaptive network based fuzzy system technique. Int J Comput Appl 119(9):24–27 23. Kumar L, Rath SK (2016) Hybrid functional link artificial neural network approach for predicting maintainability of object-oriented software. J Syst Softw 121:170–190. ISSN 0164-1212. https://doi.org/10.1016/j.jss.2016.01.003 24. Genero M, Piatini M, Manso E (2004) Finding early indicators of UML class diagrams understandability and modifiability. In: Proceedings of 2004 international symposium on empirical software engineering, ISESE 04 2004, pp 207–216. https://doi.org/10.1109/ISESE.2004.133 4908 25. Padhy N, Singh RP, Chandra SS (2018) Software reusability metrics estimation: algorithms, models and optimization techniques. Comput Electr Eng 69:653–668. ISSN 0045-7906. https:// doi.org/10.1016/j.compeleceng.2017.11.022 26. Papamichail MD, Diamantopoulos T, Symeonidis AL (2019) Measuring the reusability of software components using static analysis metrics and reuse rate information. J Syst Softw 158. https://doi.org/10.1016/j.jss.2019.110423 27. Padhy N, Singh RP, Satapathy SC (2019) Enhanced evolutionary computing based artificial intelligence model for web-solutions software reusability estimation. Cluster Comput 22:9787– 9804. https://doi.org/10.1007/s10586-017-1558-0 28. 
Bajeh AO, Basri S, Jung LT, Almomani MA (2014) Empirical validation of object-oriented inheritance hierarchy modifiability metrics. In: Proceedings of the 6th international conference on information technology and multimedia, pp 189–194. https://doi.org/10.1109/ICIMU.2014. 7066628 29. Yadav N, Saraswat P, Tripathi RP (2017) Estimating the functionality of object oriented system using MCDM approach. In: Fourth international conference on image information processing (ICIIP), pp 1–6. https://doi.org/10.1109/ICIIP.2017.8313683
30. Sharma K, Dubey SK, Gaurav P, Prachi (2020) Functionality assessment of software system using fuzzy approach In: 2020 8th International conference on reliability, infocom technologies and optimization (trends and future directions) (ICRITO), pp 1206–1209. https://doi.org/10. 1109/ICRITO48877.2020.9197795 31. Diwaker C et al (2019) A new model for predicting component-based software reliability using soft computing. IEEE Access 7:147191–147203. https://doi.org/10.1109/ACCESS.2019.294 6862 32. McCall JA (1977) Factors in software quality. US Rome Air development center reports 33. Boehm BW (1978) Characteristics of software quality 1. North-Holland, Amsterdam, p 169 34. Dromey RG (1995) A model for software product quality. IEEE Trans Softw Eng 21(2):146– 162. https://doi.org/10.1109/32.345830 35. ISO/IEC 9126-1 (2001) Software engineering—product quality–part 1: quality model. International Organization for Standardization, Geneva, Switzerland 36. ISO/IEC 25000 (2014)System and software engineering—systems and software quality requirements and evaluation (SQuaRE). International Organization for Standardization, Geneva, Switzerland
Agricultural Insect Pest’s Recognition System Using Deep Learning Model Sapna Dewari, Meenu Gupta, and Rakesh Kumar
Abstract In India, around 70% of the population depends on farming, as the country has a large amount of arable land, while some other countries depend on seafood. Farming in India not only provides food but also serves as a source of employment, with agriculture contributing around 20% of the GDP. A main concern of agriculture is pests, which harm the crops at an enormous rate, and the average productivity of many crops in India is quite low. In past decades, farmers used many pesticides to address these issues, which harm the crop and the land. Early detection and classification of pests may therefore significantly decrease pest-related losses. Owing to differences in photo-collection direction, position, and pest size, and to challenging image backgrounds, pest recognition in outdoor situations is among the most essential and difficult aspects of pest management. To address this, a model based on the EfficientNetB4 deep CNN is proposed. The suggested model is tested on the IP102 dataset (102 species) using approaches such as data preprocessing, data balancing, and feature extraction, and it achieved a classification accuracy of 95%. All of the results indicate that the suggested approach provides a reliable option for recognizing insect pests in the field and enabling targeted plant treatment in agricultural production.

Keywords Insect recognition · Pest detection · CNN · Deep learning · EfficientNetB4
S. Dewari · M. Gupta (B) · R. Kumar Department of Computer Science and Engineering, Chandigarh University, Mohali, Punjab, India e-mail: [email protected] S. Dewari e-mail: [email protected] R. Kumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_22
1 Introduction

India is a growing country in which the majority of the population resides in rural regions and relies on agriculture for its livelihood. India does farming not only for food but also for jobs, since agriculture accounts for around 20% of the country's GDP [1]. However, these rural residents have been farming the same way they did many years ago, and as a result, many crops in India have relatively low average production. This is due to the numerous issues that farmers currently face, one of the most fundamental being the pests that destroy the majority of crops. As most agricultural workers lack academic exposure to the extensive variety of pests, it is difficult for them to accurately match the relevant type of pesticide. Plant pests come in several forms, and each of these can cause economic, social, and ecological harm. Quick and accurate pest identification at an early stage is critical for reducing losses in the productivity and quantity of agricultural goods. Previously, professionals such as botanists and agricultural engineers carried out pest identification by visual examination and later in a laboratory environment [1], which was time consuming and unproductive. Now, models are being built using machine learning and image processing to identify pests against simple backgrounds rather than the complex backgrounds observed in the field [2, 3]. Scale, varied poses, the complexity of visual backgrounds, and lighting are the key problems in pest localization and recognition in practice, and these older systems usually have low accuracy and inadequate robustness in real pest monitoring with complex backgrounds. Automatic pest or disease localization and detection have become a research hotspot in recent years due to the rapid development of computer vision technology [4]. Advancements in deep learning approaches have resulted in highly promising improvements in object localization and recognition under natural conditions [5, 6], using networks such as CNN, region-based CNN (RCNN), Faster-RCNN, You Only Look Once (YOLO), GoogleNet, and other extended variants of these networks. The rest of this paper is organized as follows: Sect. 2 examines many scholars' perspectives on the early detection of pests; Sect. 3 discusses the data collection and model formulation and also addresses several metrics and methods; Sect. 4 presents the evaluation of outcomes; and Sect. 5 concludes this study by discussing its potential scope.
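As a rough illustration of the kind of EfficientNetB4 transfer-learning pipeline proposed in this work (TensorFlow/Keras assumed; "ip102/train" is a hypothetical directory with one sub-folder per insect class, and the authors' exact preprocessing, balancing, and fine-tuning settings are not reproduced here):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = (380, 380)                       # default EfficientNetB4 resolution
train_ds = tf.keras.utils.image_dataset_from_directory(
    "ip102/train", image_size=IMG_SIZE, batch_size=16)

base = tf.keras.applications.EfficientNetB4(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
base.trainable = False                      # feature-extraction stage

model = models.Sequential([
    base,                                   # Keras EfficientNet expects raw 0-255 inputs
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(102, activation="softmax"),   # 102 insect classes as in IP102
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```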
2 Related Work The application of deep learning approaches to gathering critical crop information for identification has been extensively studied, in order to provide correct information for future spray management and thereby improve the survival rate and production of vegetables, fruits, and field crops.
Monis et al. [7] suggested an effective model known as ExquisiteNet to identify invasive species. ExquisiteNet is composed of two distinct blocks: the first is a double-fusion squeeze-and-excitation-bottleneck block, while the second is a max feature expansion block. The effectiveness of the model was evaluated on the standard pest dataset IP102, where it outperformed ResNet101, ShuffleNetV2, MobileNetV3-large, and others with an accuracy of 52% without any data augmentation.
Xin and Wang [8] used the EfficientNetB0 network to enhance the categorization and detection of insects, so that early-stage insect attacks can be quickly and successfully countered with proper responses. Deng et al. [9] located the region of interest (RoI) with a saliency model, extracted invariant properties describing the pest image, and trained an SVM classifier, achieving an 85.5% identification rate. Xie et al. [10] created a dictionary matrix and employed sparse decomposition to perform species categorization, obtaining good results on 24 common insect species.
Souza et al. [11] proposed a CNN classification technique based on transfer learning to detect leaf features of soybean diseases and insect pests, reducing model complexity and alleviating the overfitting caused by insufficient data. Experiments on soybean pests show that the technique is highly precise and robust. Yang et al. [12] focused on identifying insect pests in maize fields. Their work includes a unique dataset of field-based photos for major as well as secondary insect pests, containing both original and augmented versions of the photographs so that supervised classification may be performed on them. Additionally, they proposed an adjusted residual Inception-V3 model, which made the learning process faster while also improving reliability. The suggested Inception-V3* model obtained the highest prediction performance of 97.0% when using cross-validation averaged over all classes of insect pests.
Khan et al. [13] proposed a CNN model for pest detection and identification that merges spatial and channel attention mechanisms. The architecture first uses a spatial transformer network (STN) module, which provides picture cropping as well as scale normalization of the suitable area, making categorization easier; the second component uses enhanced split attention networks to distribute attention among feature maps. Classification accuracies of 78%, 96.50%, and 73.29% were achieved on three distinct datasets: Li's dataset with ten species, the suggested dataset with 58 species, and the IP102 dataset with 102 species. Wu et al. [14] described an insect pest recognition system based on deep transfer learning models using the IP102 dataset. Before training, fine-grained saliency was utilized to extract regions of interest from a crowded environment. The Inception-V3 and VGG19 models exhibited the highest accuracies of 81.7% and 80%, respectively. The comparison between the proposed technique and existing methods is given in Table 1.
Table 1 Comparison of proposed technique performance with existing methods

Paper Refs. | Dataset used | Techniques | Performance
Monis et al. [7] | IP102 | ExquisiteNet | 52.32% accuracy
Souza et al. [11] | Soybean diseases and insect pests selected from Plant Village | Transfer learning and SVM | The experiment shows that transfer learning works better than the CNN model by 5%
Yang et al. [12] | Li's dataset, suggested dataset, IP102 data | Merging spatial and channel attention mechanisms using the CNN model | Accuracy of 78% on Li's dataset, 96.50% on the suggested dataset, 73.29% on IP102 data
Wu et al. [14] | IP102 | Inception-V3, VGG19 | Accuracy of 81.7% by Inception-V3 and 80% using VGG19
Proposed method | IP102 | EfficientNetB4 | 95% accuracy
3 Proposed Methodology This section describes the dataset utilized for model formulation along with the suggested framework. The IP102 insect dataset was used in this study to detect pests. Several data preparation methods are applied to eliminate bad pictures, balance the dataset, and remove duplicate images, and data augmentation and feature extraction approaches are also used. The data is then fed to an EfficientNetB4 deep CNN-based model, and the model is trained as shown in Fig. 1. Finally, the model is evaluated on the test set, and its performance is quantified in terms of accuracy, precision, recall, and F1-score.
3.1 Dataset Used The present research used the IP102 insect picture collection. The IP102 dataset, taken from [15], includes 102 different kinds of agricultural crop pests and 75,222 photos.
Fig. 1 Proposed model (pipeline: IP102 dataset → data preprocessing → data balancing → data augmentation → feature extraction → EfficientNetB4 model)
Fig. 2 Plotting of images
Furthermore, 19,000 of the photos are annotated with bounding boxes for object detection. Figure 2 shows some insect images from the IP102 dataset.
3.2 Data Preprocessing Data preprocessing is used to convert raw data into a format that the ML model can handle, as well as to enhance the model's quality. The following methods are used for data preparation (a minimal code sketch of these steps is given after the list):
• Dropping bad images
• Balancing the data
• Removing duplicate images
• Data augmentation
• Feature extraction.
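The listing below is a minimal sketch, not the authors' actual script, of the first three preparation steps: it drops unreadable images and removes byte-identical duplicates using a hash. The folder layout (one sub-directory per IP102 class, JPEG files) is an assumption.

import hashlib
from pathlib import Path
from PIL import Image

def clean_dataset(root: str) -> None:
    """Drop unreadable images and byte-identical duplicates under root/<class>/<image>.jpg."""
    seen_hashes = set()
    for img_path in Path(root).rglob("*.jpg"):
        try:
            with Image.open(img_path) as img:
                img.verify()                      # raises if the file is corrupt ("bad image")
        except Exception:
            img_path.unlink()                     # drop bad images
            continue
        digest = hashlib.md5(img_path.read_bytes()).hexdigest()
        if digest in seen_hashes:                 # remove duplicate images
            img_path.unlink()
        else:
            seen_hashes.add(digest)

# clean_dataset("ip102/train")   # class balancing and augmentation would follow these steps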
3.3 Proposed Algorithms (EfficientNetB4) This paper provides a strategy based on the EfficientNetB4 CNN model [16]. This specific member of the EfficientNet family was chosen because it strikes a healthy balance between the amount of processing resources required and the level of precision desired [17]. CNN models are typically created with either a large depth, a high resolution, or both. Increasing these characteristics initially makes the model better, but once saturation is reached, it only adds parameters to an ineffective model. In EfficientNet, depth, width, and resolution are grown progressively and scaled more carefully. The important layers used in this model are explained below.
• Input layer: The input layer feeds the convolutional layers. A variety of adjustments, such as feature scaling, mean subtraction, and efficient data augmentation, may be applied here.
• Convolution layer: The convolutional filters, pooling functions, and activation functions are the three operational stages responsible for the vast bulk of the computation.
• Dropout layer: Dropout makes the network less likely to over-fit and allows it to learn more trustworthy features.
• Pooling layer: In this layer, neurons are aggregated in clusters. This operation focuses on the most vital components of the incoming data and discards superfluous details that might be misleading. Pooling may be carried out in several ways, typically either "average pooling" or "max pooling", the latter selecting the greatest neuron value in each cluster.
• Fully connected layers: The input of each layer is processed into a single output vector that is used to construct the final result.
• Dense layer: Here the layers are connected such that every neuron of one layer is linked to every neuron of the next layer. This is the layer most commonly used in ANN networks.
• Rectified linear unit (ReLU): A ReLU layer follows each convolution layer. It replaces negative convolution outputs with zero, which allows the training process to complete much more quickly.
• SoftMax layer: This is the last layer applied before the output layer. It assigns each class a probability, expressed as a decimal between zero and one.
As a base network, EfficientNet utilizes convolutional neural networks that have been pre-trained on image-related tasks (refer to Fig. 3). These base networks can learn from a diverse collection of datasets, allowing more specific models to be created more expediently from constrained training sets. Such networks are helpful for tasks like picture categorization and COVID detection, which gives advantages for high-usage scenarios as well as for the development of more accurate and effective models [17]. A summary of the EfficientNetB4 model's training parameters is given below (a configuration sketch follows the list):
• Epochs = 40
• Optimizer = AdamX
• Loss = categorical cross-entropy
• Learning rate = 1e−5
• Patience = 1
• Stop patience = 3
• Threshold = 0.9
• Factor = 0.5
• Dwell = True
• Freeze = False.
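The snippet below is a minimal sketch, assuming a tf.keras setup, of how these hyperparameters could be wired together. It is not the authors' code: the 380×380 input resolution is an assumption, "AdamX" is read here as the Adamax optimizer, and the custom "threshold" and "dwell" options of their training loop are omitted.

import tensorflow as tf

NUM_CLASSES = 102          # IP102 has 102 insect categories
IMG_SIZE = (380, 380)      # assumed input resolution for EfficientNetB4

base = tf.keras.applications.EfficientNetB4(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,), pooling="max")
base.trainable = True      # "Freeze = False": the backbone is fine-tuned

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.4),                              # dropout layer described above
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # SoftMax output over 102 classes
])

model.compile(
    optimizer=tf.keras.optimizers.Adamax(learning_rate=1e-5),  # "AdamX" read as Adamax, lr = 1e-5
    loss="categorical_crossentropy",
    metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=1),                 # factor = 0.5, patience = 1
    tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),      # stop patience = 3
]
# model.fit(train_ds, validation_data=val_ds, epochs=40, callbacks=callbacks)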
Fig. 3 EfficientNetB4 architecture [16]
Loss Function (Categorical Cross-entropy Loss) A loss function known as categorical cross-entropy is used for classification. Equation (1) depicts the loss function:

L = -\sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log \hat{y}_{ij}   (1)

where N is the number of samples, C is the number of classes, y_{ij} is the true label, and \hat{y}_{ij} is the predicted probability.
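As an illustration only (not taken from the paper), Eq. (1) can be computed directly in NumPy for a small batch of one-hot labels and predicted probabilities:

import numpy as np

# y_true: one-hot labels, y_pred: softmax outputs (toy values, 3 samples x 4 classes)
y_true = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1]], dtype=float)
y_pred = np.array([[0.80, 0.10, 0.05, 0.05],
                   [0.10, 0.70, 0.10, 0.10],
                   [0.05, 0.05, 0.10, 0.80]])

# Eq. (1): L = -sum_i sum_j y_ij * log(yhat_ij)
loss = -np.sum(y_true * np.log(y_pred))
print(loss)                 # total loss over the batch
print(loss / len(y_true))   # mean per-sample loss, as reported by most frameworks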
Adam Optimizer Adaptive moment estimation (Adam) is a technique for optimizing gradient descent. The method is very useful for large problems with a large amount of data or many parameters, as it uses less memory and is more efficient. The flowchart in Fig. 4 shows the full steps of the research methodology.
4 Results This section presents the full experimental material, including the outcomes of each test as well as the interpretation of these data. The experiments were carried out using the Python programming language in the Jupyter notebook environment, and the results are presented in a variety of graphs, metrics, and tables.
Fig. 4 Flow diagram of proposed work
4.1 Evaluation Metrics One of the most important tasks when putting machine learning into practice is assessing the effectiveness of the algorithms. The data is initially partitioned into training, validation, and testing sets. The training dataset is used to create a model, which then predicts the labels of the test dataset. The model's performance is then evaluated by comparing the predicted labels to the actual labels using a variety of assessment metrics. The confusion matrix's four major components are True Positive (TP), True Negative (TN), False Negative (FN), and False Positive (FP). The accuracy is the fraction of total predictions that are correctly classified:

\mathrm{Acc} = \frac{TP + TN}{TP + TN + FN + FP}   (2)
Precision, also known as positive predictive value, is the proportion of instances predicted as positive that are truly positive:

\mathrm{Precision} = \frac{TP}{TP + FP}   (3)
Recall, also named sensitivity or true positive rate, is the fraction of actual positive cases that are classified as positive:

\mathrm{Recall} = \frac{TP}{TP + FN}   (4)
The F1-score combines the two measures above: it is the harmonic mean of precision and recall.

F_1 = \frac{2}{\frac{1}{\mathrm{Precision}} + \frac{1}{\mathrm{Recall}}} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}   (5)
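A minimal sketch (assumed, not the authors' evaluation script) of computing Eqs. (2)-(5) with scikit-learn from the test-set predictions; the small label arrays are placeholders for the real IP102 ground truth and model outputs.

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# placeholder labels; in practice y_true are the test labels and y_pred the argmax of the softmax outputs
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))  # macro-average over classes
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))   # per-class TP/FP/FN counts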
The model's accuracy during training and validation over epochs 0–10 is shown in Fig. 5. The proposed model is trained on the training data and afterwards examined on both the training data and the validation data (the accuracy measure is used for evaluation).
Fig. 5 Comparison between training and validation accuracy
Fig. 6 Comparison between training and validation loss
The training and validation losses of the model over epochs 0–10 are shown in Fig. 6. The loss decreases steadily with the number of epochs, with the training loss approaching roughly 0.1. The training loss reflects how well the model fits the data used for training, while the validation loss reflects how well the model fits fresh data. The outcomes of visualizing the predictions are shown in Fig. 7. This figure demonstrates that the proposed CNN-based model can concentrate on the target successfully and that the categorization depends on the insect itself rather than on the backdrop it is observed in. The majority of the photos in the IP102 dataset have rather low image quality, including hazy targets and advertising text, among other things. The parameter performance of the EfficientNetB4 model is given in Table 2 and Fig. 8. In Fig. 8, the x-axis shows the parameters, namely accuracy, precision, recall, and F1-score, and the y-axis shows the model percentage. The model achieves approximately 95% accuracy, 95% precision, 95% recall, and 95% F1-score.
Fig. 7 Predicted image
Table 2 Parameter performance

Model | Accuracy | Precision | Recall | F1-Score
Proposed EfficientNetB4 | 0.95 | 0.95 | 0.95 | 0.95
Fig. 8 Bar graph of model performance
5 Conclusion and Future Work Traditional approaches based on manual identification and classical machine learning for crop disease and insect pest detection need sophisticated data preparation, and the fitting
degree of the model varies widely owing to the strengths and weaknesses of the handcrafted features, resulting in poor recognition. This study therefore provides a deep learning-based detection model for agricultural diseases and insect pests from the standpoint of ecological environment protection. The CNN-based EfficientNetB4 model was used in this research to develop a pest identification framework, and the public IP102 dataset was used to test the suggested approach. The testing results showed that the proposed technique surpassed current conventional models, achieving an accuracy of 95% on the IP102 dataset. Furthermore, after multiple trials on the IP102 dataset with varying picture sizes, an appropriate image resolution was identified. According to the findings, the proposed model has high potential for pest identification in the agricultural environment. In future work, more emphasis should be placed on fine-grained insect detection as well as on the impact of long-tail distributions in insect datasets. Many aspects of agricultural pest identification remain unexplored, such as how to swiftly compute the affected area and estimate the severity of diseases and insect pests in a region, in order to carry out effective treatment and avoid large-scale economic losses. These are still pressing issues that must be addressed in pest control, and they form the next significant research subject.
References
1. Pavela R (2016) History, presence and perspective of using plant extracts as commercial botanical insecticides and farm products for protection against insects–a review. Plant Prot Sci 52(4):229–241
2. Ebrahimi MA, Khoshtaghaza MH, Minaei S, Jamshidi B (2017) Vision-based pest detection based on SVM classification method. Comput Electron Agric 137:52–58
3. Durgabai RPL, Bhargavi P (2018) Pest management using machine learning algorithms: a review. Int J Comput Sci Eng Inform Technol Res (IJCSEITR) 8(1):13–22
4. Lima MCF, de Almeida Leandro MED, Valero C, Coronel LCP, Bazzo COG (2020) Automatic detection and monitoring of insect pests–a review. Agriculture 10(5):161
5. Fuentes A, Yoon S, Kim SC, Park DS (2017) A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sensors 17(9):2022
6. Ngugi LC, Abelwahab M, Abo-Zahhad M (2021) Recent advances in image processing techniques for automated leaf pest and disease recognition–a review. Inf Process Agric 8(1):27–51
7. Monis JB, Sarkar R, Nagavarun SN, Bhadra J (2022) Efficient Net: identification of crop insects using convolutional neural networks. In: 2022 International conference on advances in computing, communication and applied informatics (ACCAI). IEEE, pp 1–7
8. Xin M, Wang Y (2020) An image recognition algorithm of soybean diseases and insect pests based on migration learning and deep convolution network. In: 2020 International wireless communications and mobile computing (IWCMC). IEEE, pp 1977–1980
9. Souza WS, Alves AN, Borges DL (2019) A deep learning model for recognition of pest insects in maize plantations. In: 2019 IEEE international conference on systems, man and cybernetics (SMC). IEEE, pp 2285–2290
10. Deng L, Wang Y, Han Z, Yu R (2018) Research on insect pest image detection and recognition based on bio-inspired methods. Biosyst Eng 169:139–148
11. Alves AN, Souza WS, Borges DL (2020) Cotton pests classification in field-based images using deep residual networks. Comput Electron Agric 174:105488
12. Yang X, Luo Y, Li M, Yang Z, Sun C, Li W (2021) Recognizing pests in field-based images by combining spatial and channel attention mechanism. IEEE Access 9:162448–162458
13. Khan MK, Ullah MO (2022) Deep transfer learning inspired automatic insect pest recognition. In: Proceedings of the 3rd international conference on computational sciences and technologies. Mehran University of Engineering and Technology, Jamshoro, Pakistan, pp 17–19
14. Wu X, Zhan C, Lai YK, Cheng MM, Yang J (2019) IP102: a large-scale benchmark dataset for insect pest recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8787–8796
15. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press
16. Heravi EJ, Aghdam HH, Puig D (2016) Classification of foods using spatial pyramid convolutional neural network. In: CCIA, pp 163–168
17. Chung DTP, Van Tai D (2019) A fruits recognition system based on a modern deep learning technique. J Phys Conf Ser 1327(1):012050
Seq2Code: Transformer-Based Encoder-Decoder Model for Python Source Code Generation Naveen Kumar Laskari, K. Adi Narayana Reddy, and M. Indrasena Reddy
Abstract Deep learning for Natural Language Processing has advanced at breakneck speed. Deep learning techniques have recently demonstrated outstanding results in machine translation, question answering, and computer vision applications. Transformer-based architectures such as BERT and GPT produce state-of-the-art results across NLP tasks; BERT uses an encoder transformer, while GPT uses a decoder transformer architecture for different kinds of NLP tasks. This paper introduces Seq2Code, a transformer-based model that generates Python source code for a problem statement given in natural language. We generate source code corresponding to the input statement using an encoder-decoder transformer model with multi-head attention. We have collected more than 10,000 pairs of problem statements and code snippets. Because special characters play a significant role in code snippets, we have trained embeddings for all the special characters. The model performance is evaluated using perplexity, and the proposed Seq2Code outperformed the existing models. Keywords Source code · Python · Encoder · Decoder · Attention
1 Introduction Computational language models have gained significant attention in recent times due to their widespread applications. Language modeling applications help minimize human effort in writing, improve the quality of writing, and correct faults in writing. In addition, a language model can be used to generate computer program code from a natural language description. Automatic source code generation for a specific requirement will increase productivity in software development: a properly constructed automated code generation system can generate well-structured, modularized, object-oriented, and functional code that requires less manual work to maintain.
N. K. Laskari (B) · K. A. N. Reddy · M. Indrasena Reddy BVRIT HYDERABAD College of Engineering for Women, Bachupally, Hyderabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_23
With the immense growth in computing power and data availability, neural network-based models are showing remarkable results across domains and applications, and deep neural network architectures are used across all kinds of data and environments. For example, RNNs are popularly utilized for sequence prediction tasks and are commonly used in encoder-decoder architectures. During encoding, the recurrent structure of an RNN processes the input sequentially, which makes it inefficient because the input cannot be processed in parallel. To overcome this limitation and achieve parallelism, Vaswani et al. [1] proposed a transformer-based architecture with attention mechanisms, and it has been shown that transformer-based models can significantly improve performance on various NLP tasks [2, 3]. Two kinds of attention mechanisms are embedded in the transformer architecture. First, "self-attention is the significant component that relates the tokens and their positions within the same input sequence" [1]. On the other hand, "multi-head attention splits the input into fixed-size segments and computes scaled-dot product attention over each component in parallel" [1]. This paper presents a model with a multi-headed attention mechanism for source code generation. The rest of the paper is organized as follows: in the remainder of this section, we define the problem, and Sect. 2 presents the related work on source code generation. In Sect. 3, we present the Seq2Code model and the methodology used. Finally, Sect. 4 elaborates on results and discussion, followed by the conclusion and future scope.
1.1 Problem Definition The seq2seq problem is described as Natural Language Generation, a sub-field of NLP. Along similar lines, given a natural language problem description as a sequence of words, the task is to generate the corresponding Python source code. Model construction is defined as finding a suitable model that translates the input sequence into a target sequence representing the source code. For example, a sequence of n words, each with a specified vector, can be written as

X_{1:n} = \{X_1, \ldots, X_n\}   (1)

The seq2code task is solved by finding a mapping f from an input sequence of n vectors to a sequence of m target word vectors, where m, the length of the target, depends on the input sequence:

f : X_{1:n} \rightarrow Y_{1:m}   (2)
We are building a model that converts the sequence of natural language text into target source code.
2 Related Work Deep learning algorithms have shown remarkable results in numerous NLP tasks. Machine translation, question answering, and text summarization are popular NLP applications solved using deep learning [4–8]. Source code generation from a natural language description is another NLP task. The source code for the input description is created using a variety of methodologies, including sequence-to-sequence encoder-decoder models and transformer-based models [9–17].
Phan et al. [12] implemented a system named CoText, a multi-task learning model used both for source-code-to-description generation and for description-to-source-code generation. The authors used the encoder-decoder model to achieve functionalities such as defect detection and code debugging. Svyatkovskiy et al. [18] implemented a transformer architecture for code completion. Their code completion model is capable of predicting a sequence of code tokens and generating entire lines of syntactically correct code fragments. The authors built and tested the models on the Java language with the CONCORDE dataset, with input and output lengths of 256.
Devlin et al. [3] defined the problem of dealing with natural language and programming language together as PLUG (Program and Language Understanding and Generation). The authors built PLBART (Programming Language BART), a bi-directional and autoregressive transformer pre-trained on unlabeled data across PL and NL to learn multilingual representations, and concluded that CodeBERT and GraphCodeBERT performed well on code understanding and code generation tasks.
A "novel tree-based neural network architecture titled TreeGen for code generation" was proposed by Sun et al. [16]. The model utilizes the attention mechanisms of the transformer to capture long-term dependencies and introduces an "AST reader to incorporate grammar rules and AST structures into the network" [16]. The authors reported evaluations on various benchmarks demonstrating that the model outperformed prior approaches by a reasonable margin.
Perez et al. [13] built a machine learning model that generates Python source code using the CodeSearchNet dataset, which consists of 2 million pairs of code snippets in languages ranging from Python to JavaScript, PHP, Java, Go, and Ruby. The model devised by the authors used the Python code snippets with a pre-trained language model and was fine-tuned from the small version of GPT-2 with 117 million parameters.
3 Methodology To implement the transformer-based Seq2Code model, the procedure shown in Fig. 1 is followed.
Fig. 1 System architecture
3.1 Dataset and Preparation Deep learning models are data-intensive and data-driven, consuming a lot of data. The sequence-to-sequence model requires source-target pairs, where the source is a natural language description and the target is Python source code. Therefore, we prepared 10,000 description-source code pairs for training and testing the model. A sample of two observations is presented in Table 1. It is observed that the problem descriptions in the dataset contain numbers, special symbols, and extra spaces. The source code also has comment lines for some parts of the code, causing ambiguity when splitting the data on the character #, so the comment lines across all the programs are removed to resolve this ambiguity. Furthermore, the descriptions and source code contain extra new lines and empty lines; the pre-processing step removes these characters. Finally, the embedding vectors are given as input to the encoder stack. The word2vec embeddings have been generated using Gensim with a 256-dimensional vector and fed into the model. The target source code contains the special character set usually found in programming languages, and embeddings are also generated for the special characters that appear in the dataset.
Table 1 Examples of description-code pairs from the dataset

Example 1
Description: # write a Python program to add two numbers
num1 = 1.0
num2 = 5.0
sum = num1 + num2
print(f'sum{sum}')

Example 2
Description: # write a Python program to find the largest among three numbers
num1 = 10
num2 = 50
num3 = 14
if (num1 >= num2) and (num1 >= num3):
    largest = num1
elif (num2 >= num1) and (num2 >= num3):
    largest = num2
else:
    largest = num3
print(largest)
Architecture. Attention has become a required component in neural network architectures for most tasks. In machine translation, an attention mechanism was proposed to align and predict the output word based on the input. Transformer-based models using attention mechanisms have been shown to outperform sequence-based models [1]. The "transformer-based architecture consists of encoder and decoder layers to process input and generate the output depending on the kinds of tasks we deal with" [1]. The stack of encoder layers generates the context vector from the source input. The "decoder stack converts the context vector representation into target output" [1].
Special Character Embeddings. In other NLP tasks, special characters such as hashtags, semicolons, curly brackets, square brackets, new lines, and tab spaces are generally removed during data pre-processing. In source code generation, however, these special characters are vital for generating code at human-level accuracy. Therefore, to boost the model's performance and generate better code, we also generated customized embeddings for all the special characters. The customized embeddings are generated using the skip-gram approach with a window size of 3, an embedding dimension of 256, and a minimum count of 2 (a Gensim sketch of this step is given below).
Encoder-Decoder. In essence, the encoder reads the input sentence, makes sense of it, and passes the context vector to the decoder. The encoder of the transformer consists of multiple blocks of the same layers. The bottom layer takes the position embedding and the input embedding. The concatenated representation is fed into the multi-head attention and normalization layer, followed by the feed-forward layer. A total of four stacked layers are included in the encoder of the resulting model. The output of the last transformer block is taken as the context vector, which summarizes the input sequence. Similar to the encoder block, the decoder consists of multiple blocks of the same layers. Each layer has masked multi-head attention, multi-head attention, and feed-forward sub-layers; the output of the last decoder layer is fed into the softmax layer to produce the target output.
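A minimal sketch, assuming Gensim 4.x and a toy tokenized corpus, of how skip-gram embeddings with the parameters above could be produced; it is not the authors' exact script.

from gensim.models import Word2Vec

# each training sample is a token list; special characters such as '#', '=', '+' are kept as tokens
corpus = [
    ["#", "write", "a", "python", "program", "to", "add", "two", "numbers"],
    ["num1", "=", "1.0", "num2", "=", "5.0", "sum", "=", "num1", "+", "num2"],
]

w2v = Word2Vec(
    sentences=corpus,
    vector_size=256,   # embedding dimension used in the paper
    window=3,          # window size of 3
    min_count=2,       # minimum count of 2
    sg=1,              # skip-gram approach
)
vector = w2v.wv["num1"]   # 256-dimensional embedding for a token
print(vector.shape)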
Multi-head Attention Mechanism. "An attention function can be described as mapping a query and a set of key-value pairs to an output, where query, keys, values, and output are all vectors. The concept of attention is demonstrated in Fig. 2. The output is computed as a weighted sum of the values, where a compatibility function of the query computes the weight assigned to each value with the corresponding key" [1]. "The basic attention is performed parallelly with different dimensions. The results are concatenated and projected again. We compute the multi-head attention for the given query Q and key-value pair (K, V)" [1, 19] as follows:

\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V   (3)

\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V})   (4)

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_n)   (5)
where W_i^{Q}, W_i^{K}, W_i^{V} are learnable parameters, and d_k is the dimension of the key K [1]. For each time step, the attention distribution over the query and keys is computed as

a_t = \mathrm{Softmax}\left(\frac{q_t K^{T}}{\sqrt{d_k}}\right)   (6)
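The listing below is a minimal PyTorch sketch of Eqs. (3)-(5): scaled dot-product attention split across several heads. It is an illustration written for this text, not the authors' implementation; the learned projection matrices W_i^{Q}, W_i^{K}, W_i^{V} of Eq. (4) are omitted for brevity, and torch.nn.MultiheadAttention provides the full functionality in practice.

import math
import torch
import torch.nn.functional as F

def multi_head_attention(q, k, v, n_heads):
    """q, k, v: (batch, seq_len, hid_dim); hid_dim must be divisible by n_heads."""
    batch, seq_len, hid_dim = q.shape
    d_k = hid_dim // n_heads

    # split the hidden dimension into heads: (batch, n_heads, seq_len, d_k)
    def split(x):
        return x.view(batch, -1, n_heads, d_k).transpose(1, 2)

    q, k, v = split(q), split(k), split(v)

    # Eq. (3): scaled dot-product attention, computed in parallel for every head
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    attn = F.softmax(scores, dim=-1)
    heads = torch.matmul(attn, v)      # one attention result per head (projections of Eq. (4) omitted)

    # Eq. (5): concatenate the heads back into (batch, seq_len, hid_dim)
    return heads.transpose(1, 2).contiguous().view(batch, seq_len, hid_dim)

x = torch.randn(2, 10, 256)                       # batch of 2, sequence length 10, hid_dim 256
out = multi_head_attention(x, x, x, n_heads=8)    # self-attention with 8 heads
print(out.shape)                                  # torch.Size([2, 10, 256])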
Fig. 2 Multi-head attention
3.2 Evaluation Metrics The model's performance was evaluated by comparing machine-generated Python source code against ground-truth Python source code. BLEU, ROUGE, and perplexity are the popular evaluation measures used for language models. Perplexity is a metric for determining how effectively a probability distribution or probability model predicts a given sample. For example, the model predicts a word of the sequence {w_1, w_2, \ldots, w_m} of source code from the learned vocabulary, and the probability of w_i depends on the previous tokens {w_1, w_2, \ldots, w_{i-1}}. The inherent per-word entropy of an information source measures the average amount of non-redundant information delivered by each new word, defined in bits as

\hat{H} = -\lim_{m \to \infty} \frac{1}{m} \sum_{w_1, w_2, \ldots, w_m} P(w_1, w_2, \ldots, w_m) \log_2 P(w_1, w_2, \ldots, w_m)   (7)
The summation is over all word sequences. If the source is ergodic, the summation over all potential word sequences can be omitted, and the equation simplifies to

\hat{H} = -\lim_{m \to \infty} \frac{1}{m} \log_2 P(w_1, w_2, \ldots, w_m)   (8)
It is plausible to presume ergodicity based on the fact that humans can use language without knowing all of the words that have ever been uttered or written, and we can disambiguate words based on only the most recent bits of a conversation or piece of text. It is approximated as

\hat{H} = -\frac{1}{m} \log_2 P(w_1, w_2, \ldots, w_m)   (9)
This last estimate is feasible to evaluate, thus providing the basis for a metric suitable for assessing the performance of a language model. "Considering a language model as an information source, it follows that a language model which took advantage of all possible features of language to predict words would also achieve a per-word entropy" [20]. Perplexity is one such measure that is in standard use, defined such that:

PP = 2^{\hat{H}}   (10)

PP = \hat{P}(w_1, w_2, \ldots, w_m)^{-\frac{1}{m}}   (11)
Perplexity can be considered as a measure of how many different equally most probable words can follow any given word. Lower perplexities represent better language models.
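As an illustration (not taken from the paper), perplexity can be obtained directly from the mean cross-entropy loss that the model reports during training; the sketch below assumes a PyTorch loss computed with the natural logarithm and toy tensor sizes.

import math
import torch
import torch.nn.functional as F

# toy decoder outputs: logits over a vocabulary of 50 tokens for 8 target positions (assumed sizes)
logits = torch.randn(8, 50)
targets = torch.randint(0, 50, (8,))

loss = F.cross_entropy(logits, targets)   # mean negative log-likelihood (natural log)
perplexity = math.exp(loss.item())        # PP = exp(H); equals 2**H when H is measured in bits
print(loss.item(), perplexity)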
4 Result Analysis 4.1 Experimental Setup A transformer model with four-layer encoder and four-layer decoder stacks has been set up to generate Python source code for the input sequence. PyTorch 1.0 is the framework used to build the neural network model, create the data loaders, and train the model. In the encoder and decoder layers, the number of heads used is 8. The hidden dimension is set to 256, and the encoder-decoder position-wise feed-forward (pf) dimension is set to 512. A dropout of 0.1 was added to the model to avoid overfitting.
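A minimal sketch of instantiating a transformer with these dimensions using PyTorch's built-in module; the authors describe their own encoder-decoder implementation, so this is only an assumed equivalent configuration.

import torch
import torch.nn as nn

model = nn.Transformer(
    d_model=256,             # hidden dimension
    nhead=8,                 # attention heads per layer
    num_encoder_layers=4,    # four-layer encoder stack
    num_decoder_layers=4,    # four-layer decoder stack
    dim_feedforward=512,     # position-wise feed-forward (pf) dimension
    dropout=0.1,
)

src = torch.randn(32, 2, 256)   # (source length, batch, d_model) embeddings
tgt = torch.randn(40, 2, 256)   # (target length, batch, d_model) embeddings
out = model(src, tgt)
print(out.shape)                # torch.Size([40, 2, 256]); a final linear + softmax maps to the vocabulary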
4.2 Result Discussion The model was trained for 50 epochs while varying the input, namely with and without the special-character embeddings. After 50 epochs of training, the best perplexity on the training and validation data is 1.098 and 155.2, respectively. The model was trained with cross-entropy as the loss function; after 50 epochs, the training and testing losses are 0.092 and 5.234, respectively.
5 Conclusion and Future Scope This paper presents Seq2Code, a transformer-based encoder-decoder model that generates Python source code for a given sequence of words as input. The 10,000 manually annotated sequence and source code pairs were used as the dataset for model training and evaluation. Data pre-processing and tokenization were applied, initializing unique tokens for keywords, indentation, and new-line characters. In addition to creating embeddings for all the words in the text, embeddings were also created for the special characters. The transformer-based encoder-decoder model could generate Python source code for the given sequence of tokens, with the self-attention and multi-head attention parts of the encoder and decoder layers driving the code generation. The performance of the model was evaluated using perplexity.
References
1. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
2. Dehghani M, Gouws S, Vinyals O, Uszkoreit J, Kaiser Ł (2019) Universal transformers. In: 7th International conference on learning representations, ICLR 2019, pp 1–23
3. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL HLT 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, vol 1, pp 4171–4186
4. Goyal S, Choudhary AR, Chakaravarthy V, ManishRaje S, Sabharwal Y, Verma A (2020) PoWER-BERT: accelerating BERT inference for classification tasks. In: ICML 2020, pp 1–14. Available: http://arxiv.org/abs/2001.08950
5. Liu Z, Huang H, Lu C, Lyu S (2020) Multichannel CNN with attention for text classification. Available: http://arxiv.org/abs/2006.16174
6. Sun Q, Wang Z, Zhu Q, Zhou G. Stance detection with hierarchical attention network
7. Mohammadi A, Shaverizade A (2021) Ensemble deep learning for aspect-based sentiment analysis. Int J Nonlinear Anal Appl 12(Special Issue):29–38. https://doi.org/10.22075/IJNAA.2021.4769
8. Zhang S, Xu X, Pang Y, Han J (2020) Multi-layer attention based CNN for target-dependent sentiment classification. Neural Process Lett 51(3):2089–2103. https://doi.org/10.1007/s11063-019-10017-9
9. Mastropaolo A et al (2021) Studying the usage of text-to-text transfer transformer to support code-related tasks. In: Proceedings of the international conference on software engineering, pp 336–347. https://doi.org/10.1109/ICSE43902.2021.00041
10. Cruz-Benito J, Vishwakarma S, Martin-Fernandez F, Faro I (2021) Automated source code generation and auto-completion using deep learning: comparing and discussing current language model-related approaches. AI 2(1):1–16. https://doi.org/10.3390/ai2010001
11. Yang G, Zhou Y, Chen X, Yu C (2021) Fine-grained pseudo-code generation method via code feature extraction and transformer. Available: http://arxiv.org/abs/2102.06360
12. Phan L et al (2021) CoTexT: multi-task learning with code-text transformer, pp 40–47. https://doi.org/10.18653/v1/2021.nlp4prog-1.5
13. Perez L, Ottens L, Viswanathan S (2021) Automatic code generation using pre-trained language models. Available: http://arxiv.org/abs/2102.10535
14. Le THM, Chen H, Babar MA (2020) Deep learning for source code modeling and generation: models, applications, and challenges. ACM Comput Surv 53(3). https://doi.org/10.1145/3383458
15. Dowdell T, Zhang H (2020) Language modelling for source code with transformer-XL, pp 1–5. Available: http://arxiv.org/abs/2007.15813
16. Sun Z, Zhu Q, Xiong Y, Sun Y, Mou L, Zhang L (2020) TreeGen: a tree-based transformer architecture for code generation. In: AAAI 2020, 34th AAAI conference on artificial intelligence, pp 8984–8991. https://doi.org/10.1609/aaai.v34i05.6430
17. Sellik H (2019) Natural language processing techniques for code generation, pp 1–13
18. Svyatkovskiy A, Deng SK, Fu S, Sundaresan N (2020) IntelliCode compose: code generation using transformer. In: ESEC/FSE 2020, proceedings of the 28th ACM joint meeting European software engineering conference and symposium on the foundations of software engineering, pp 1433–1443. https://doi.org/10.1145/3368089.3417058
19. Xiao L et al (2020) Multi-head self-attention based gated graph convolutional networks for aspect-based sentiment classification. Multimedia Tools Appl. https://doi.org/10.1007/s11042-020-10107-0
20. Klabunde R (2002) Daniel Jurafsky/James H. Martin, Speech and language processing. Zeitschrift für Sprachwissenschaft 21(1):134–135
Security Using Blockchain in IoT-Based System Suman
Abstract Devices containing sensors, computing power, software, and other technologies are known as the "Internet of Things" (IoT). IoT devices may connect to and share data with one another over the Internet or another communication network. An organization can secure its Internet-connected devices, as well as the networks they access, against cyberattacks and breaches by implementing an IoT security solution. IoT security is the discipline that keeps IoT systems safe: protecting IoT against attacks and breaches is one of its primary functions, and it safeguards the availability, integrity, and confidentiality of an IoT solution. For educational institutions, mobile devices and IoT make it feasible to strengthen campus security, monitor crucial resources, and give students more information, and teachers can use this new technology to create smart lesson plans rather than the dreary old ones. According to this study, one way to protect educational institutions from cyber-attacks is via the use of IoT. Security, reliability, and performance qualities are evaluated by comparing existing studies with the proposed approach. Keywords IoT · Blockchain · Online education system · Security
1 Introduction This course covers both the fundamentals and the more sophisticated aspects of IoT [1]. Our Internet of Things lesson is suitable for both novices and professionals alike. An IoT gadget is connected to the Internet and may be accessed and controlled through the Internet [2]. Our course covers all you need to know about IoT, from the fundamentals to more complex issues like biometrics and security cameras to the devices themselves [3–5]. The Internet of Things (IoT), which connects the real and digital worlds, is boosting human understanding, from smart school boards to medical devices that can detect Parkinson’s illness [6]. The Internet of Things isn’t Suman (B) Department of Computer Science and Engineering, UIET, MDU, Rohtak, Haryana, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_24
only being utilized in the home; it is being used in practically every other aspect of life. Students may obtain new skills and information via online education, using the Internet and electronic devices such as laptops, mobile phones, and other devices [7]. Through online education, teachers and mentors can reach all students and teach them the skills they need more rapidly; before its advent, students who were working or living at home and unable to attend class had no such option. During a pandemic such as Covid-19, the security and performance of a cloud-based system become especially important. According to past studies, there is a need to increase the accuracy and performance of cloud-delivered digital content categorization, and cloud computing may help increase the security of digital data. In the suggested model of remote education, digital data are recognized and protected, categorization is used to organize textual and visual material, and graphical material is compressed before training and evaluation.
1.1 Blockchain A blockchain is a distributed, public, and decentralized digital ledger that records transactions across several computers in such a way that the ledger cannot be edited retrospectively without altering all following blocks and obtaining the consensus of the network. A blockchain is a shared database that is unique in that it stores data in blocks which are linked together cryptographically. A new block is created for new incoming information; when a block is filled with data it is chained to the previous block, forming a chronological chain of data. Using blockchain, digital data may be recorded and shared without the ability to alter it, so immutable ledgers or transaction records that cannot be altered, deleted, or destroyed are built on the foundation of a blockchain. Blockchains are also referred to as distributed ledger technology (DLT). A distributed ledger, near real-time updates, chronological and time-stamped records, cryptographic sealing, and programmable contracts are some of the core elements of blockchain technology [8, 9] (a minimal hash-chain sketch is given below).
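To make the "blocks linked together cryptographically" idea concrete, the following is a minimal illustrative Python sketch (not a production blockchain and not part of the original paper): each block stores the hash of the previous block, so editing an earlier block invalidates every later one.

import hashlib
import json
import time

def make_block(data, prev_hash):
    """Create a block whose hash covers its data, timestamp, and the previous block's hash."""
    block = {"data": data, "timestamp": time.time(), "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

# build a small chronological chain
chain = [make_block("genesis", "0" * 64)]
chain.append(make_block("device-42 reading: 21.5C", chain[-1]["hash"]))
chain.append(make_block("device-42 reading: 22.1C", chain[-1]["hash"]))

# verification: every block must reference the hash of its predecessor
for prev, curr in zip(chain, chain[1:]):
    assert curr["prev_hash"] == prev["hash"]
print("chain of", len(chain), "blocks verified")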
1.2 Internet of Things (IoT) The Internet of Things describes physical objects with sensors, processing ability, software, and other technologies that connect and exchange data with other devices and systems over the Internet or other communication networks. Users of IoT technologies can automate, analyze, and integrate their systems at a deeper level, extending the range and increasing the precision of these areas [10, 11]. Sensors, networks, and robots are all part of the IoT [12, 13], which uses both established and new technologies. Modern attitudes toward technology, as well as recent improvements in software and hardware, all contribute to the success of IoT [13, 14]. There is a huge shift in the delivery of products, goods, and services as a result of its new and advanced
elements. IoT is a technology that has been quietly changing our future for some time now. People are curious and want to lead easier and more connected lives; the IoT was created to help achieve this goal [15]. It is for this reason that devices are made smart and take care of the things that help us be more productive. By connecting devices to each other and to the Internet, machine learning and neural networks (complex mechanisms) have been used to make precise and informed decisions [16, 17]. In IoT, artificial intelligence, connectivity, sensors, active involvement, and compact device use are the most significant elements [18].
2 Literature Review An overview of the present state of IoT research and practice is provided by Wang et al. [1]; their venue presents research results on IoT architecture, design, implementation, and evaluation [1].
In a "smart city," according to Harmon et al. [2], innovation in IT-enabled services will flourish. The city's quality of life is enhanced as a consequence of the use of information technology by municipal service providers and their clients. The IoT concept is critical for the development of smart cities. A cloud-oriented architecture that combines networks, software, sensors, user interfaces, and data analytics is necessary for value generation. IoT-enabled smart gadgets and the services they deliver will be critical to the emergence of smart cities. Their article investigates IoT systems in a smart city scenario and suggests a methodology for formulating strategy [2].
IoT-connected devices and services are becoming more prevalent, according to Abomhara and Køien [3]. IoT security concerns and attacks have not gone unnoticed. The IoT is not new to cyberattacks, but as IoT becomes a more fundamental part of our everyday lives, cyber-defense must be taken more seriously. The need for IoT security necessitates a full understanding of the threats and attacks on IoT systems. One of the major aims of their study is to classify threat categories and to analyze and characterize intrusions and attacks on IoT devices and services [3].
People's concerns about IoT security were addressed in the research and analysis by Mahmoud et al. [4]. The framework's purpose is to connect anything to the Internet of Things (IoT). Perception, network, and application layers are all integrated in a typical IoT architecture, and several security rules should be enforced at each level of an IoT deployment to ensure its safety. In order to ensure the IoT's long-term viability, the security issues inherent in its design must first be confronted and resolved. IoT layers and devices have unique security risks that have prompted a number of researchers to look for solutions. Their article provides an overview of IoT security concepts, technology and security challenges, potential remedies, and future views [4].
The Internet of Things (IoT) was presented by Nurse et al. [6] as one of the most disruptive technology developments. More than 25 billion connected devices are expected to be in use by 2020, according to Gartner. Many benefits come with
IoT, but there are concerns regarding the security and privacy of this massive network of interconnected things. Previous research has focused on how IoT may worsen an already-existing problem: insider threat. The authors look at a range of new and updated attack paths to see how IoT can worsen the insider-threat situation for corporations, since employees bring their own (personal) devices into the office and use them there. As a first step toward overcoming these issues, a broad study program is recommended [6].
Mandula et al. [7] demonstrated how IoT, an Arduino board, and an Android mobile app may be used to perform smart home automation. A Bluetooth-based automation prototype is detailed for the indoor setting, while an Ethernet-based prototype is described in an outdoor context. Connecting and managing devices over the Internet through IP addresses is possible with the Internet of Things (IoT), a promising technology. Smart government, smart agriculture, and smart health care might all benefit from IoT by providing services more efficiently without the need for human involvement [7].
The IoT has attracted a lot of interest since it was first proposed to connect various devices using a variety of different technologies, yet security concerns were not taken into consideration throughout IoT's rapid development over the past decades. The authors of [19] explore the IoT's security and privacy objectives before constructing new classification methods for the different threats against these objectives. They also examine how to better handle public safety concerns about IoT networks going forward, as well as future security approaches and problems [19].
Wang et al. [15] noted that, according to Tewari and Gupta, RFID tag mutual authentication in IoT settings may be achieved using an ultra-lightweight protocol that also aims to minimize storage and processing costs. In their study, they exploit a vulnerability in the technique through which an attacker may obtain the secret key that links a database server to a tag. Consideration is also given to the possibility of implementing additional features [15].
Azmoodeh et al. [18] observed that ransomware authors in an Internet of Things (IoT) architecture are more likely to target Android devices with bigger processing capabilities (e.g., storage capacity). Using the machine learning approach outlined in their paper, Android smartphones may be used to detect ransomware threats by monitoring the energy use patterns of multiple processes. The reported precision, recall, and F-measure accuracies beat other techniques such as k-nearest neighbors and neural networks [18].
Awan et al. [20] reviewed how cloud computing has made it possible for resource-constrained client devices to conduct computationally intensive activities, with remote data centers often used to offload data and computationally intensive software from smart mobile devices. Since there are new worries about privacy and security in the cloud, the original AES algorithm needs to be enhanced to cope with the rising number of security hazards. Data owners benefit significantly from this study because of the improved security and privacy it provides. Double round key encryption is used to increase the encryption speed to 1000 blocks per second using AES-128. Even though 800 blocks a second is considered the norm, a single round key has typically been
utilized. Reduced power usage, improved network load balancing, and increased network trust and resource management are all goals of the proposed approach. The proposed framework calls for the use of AES with key sizes of 16, 32, 64, and 128 bits of plaintext. Simulated results are offered to illustrate the algorithm's suitability while retaining a particular degree of quality. Increased security and reduced resource consumption are achieved as a consequence of the proposed design [20].
Gundu et al. [21] noted that cloud computing is a prominent topic right now, offered to both IT experts and non-IT people. Pay-per-use services are the foundation of this system, and the cloud can handle both service-oriented delivery methods and deployment-oriented infrastructure. There are no clouds without data centers, and demand on cloud servers has risen as a result of increased public participation, so it is always important to allocate resources in an efficient manner. A high level of service quality must be provided in compliance with the service-level agreement, and virtualization is a major factor in cloud computing success. Direct connections between many clouds are now possible thanks to multi-cloud exchanges, which allow businesses to expand their multi-cloud capabilities without sacrificing security. Using exchanges also eliminates the time-consuming and tedious procedure of setting up and establishing a public Internet connection. Multi-cloud exchanges allow businesses to use an Ethernet switching platform to connect to several cloud providers simultaneously [21].
Internet-based education has grown rapidly in recent years, as Jiang and Li [22] explained in their paper. Many schools and teachers are increasingly attempting to use online teaching methods, and big data and the mobile Internet have been brought into online teaching in recent years as next-generation information technology. To achieve teaching reform that incorporates computer technology and multimedia network technology, widespread use of an online teaching platform is necessary, and such platforms are now being utilized. The use of mobile Internet technologies in higher education will grow significantly in the next few years. An online network teaching platform based on sophisticated speech recognition is shown as a case study; to assure the quality of service, the identification tool is used for both speaker and sound analysis [22].
Bhargav and Manhar [23] observed that the key reason for using the cloud is that it allows users to store and access their data from anywhere at any time while also providing all of its services at a reasonable cost. Nonetheless, because the information kept on the cloud is not directly managed by the client, security has always been a major problem with cloud computing. When a user uploads or saves data to a cloud computing service, the data owners are unlikely to be aware of the journey their information is taking; the user has no idea whether or not their data are being collected, processed, or accessed by a third party. Various cryptography algorithms have been proposed to address these security concerns. Their paper explored several cryptographic techniques utilized in prior work and concentrated on the fundamentals of cloud computing [23].
Alabady et al. [24] looked at network security flaws, threats, attacks, and hazards in switches, firewalls, and routers, as well as a policy to mitigate those risks. The essentials of a secure networking system were covered
in this paper, including firewalls, routers, AAA servers, and VLAN technology. It presents a unique security model for safeguarding the network against internal and external attacks and threats in the IoT era. A test was used to explore the proposed paradigm, and the findings revealed adequate security and good network performance [24].
Ghani et al. [25] described how this industry is changing the globe, from household goods to large-scale businesses. They showed how ICT is being used more widely across a variety of industries (such as business, education, and health care), with new technologies (such as cloud computing, fog computing, the Internet of Things, artificial intelligence, and blockchain) acting as seeds, resulting in a huge amount of data being collected. More than 175 ZB of data will be processed annually by 75 billion devices by 2025, according to current estimates. Big data will flood the Internet as a result of 5G technology (mobile communication technology), which will allow users to upload high-definition videos in real time [25].
Atlam et al. [26] reviewed whether the utilization of Internet of Things (IoT) devices and services is completely risk-free. IoT security must be kept in mind in order to prevent an unacceptable risk of injury or physical damage to the IoT system and its components, as well as to take into consideration social behavior and the ethical use of IoT technology. Their chapter covered IoT security, privacy, safety, and ethics, beginning with an introduction to the IoT system's architecture and essential characteristics. They then discussed how to protect IoT devices, including the issues, requirements, and best practices. The topic of IoT privacy is also addressed, with a focus on the various threats to IoT privacy and the available solutions, and they discussed IoT safety, ethics, the need for ethical design, and the problems being faced. An examination of numerous security dangers and solutions in smart cities was conducted as a case study [26].
Mabodi et al. [8] proposed MTISS-IoT, a system for grey hole attack mitigation via check node information based on the AODV routing protocol. Their study proposes a cryptographic authentication-based hybrid approach: MTISS-IoT uses a four-step strategy of verifying node trust in the IoT, testing routes, discovering grey hole attacks, and eradicating malicious attacks. The approach is tested through extensive simulations in the NS-3 software environment. The MTISS-IoT technique had a false-positive rate of 14.104%, a false-negative rate of 17.49%, and a detection rate of 94.5% when a grey hole attack was undertaken [8].
Dhanda et al. [27] focused on the lightweight cryptographic primitives available until 2019, aiming to provide an in-depth and current overview. 54 cryptographic primitives, comprising 21 block ciphers, 19 stream ciphers, 9 hash functions, and 5 versions of elliptic curve cryptography (ECC), were evaluated in this research. Comparisons were made between the ciphers in terms of chip size, power consumption, hardware efficiency, throughput, latency, and figure of merit (FoM). AES and ECC are the best lightweight cryptographic primitives, according to the research, and lightweight cryptography still has several unsolved problems [27].
Safara et al. [10] proposed using energy efficiently through a priority-based routing system (PriNergy). The solution was built on the RPL model, the routing protocol for low-power and lossy networks, which determines routing through contents. Each network slot employs timing patterns that take network traffic, audio, and picture data into consideration while sending data to the destination. Congestion was reduced as a consequence of this technique’s increased routing protocol robustness. Research shows that PriNergy may reduce overhead, latency, and energy usage in a mesh network. This method also outperformed the quality-of-service RPL technique (QRPL), one of the most widely used IoT routing methods. Priority-based routing in the Internet of Things (IoT) helps remove the latency overhead in the network [10]. Tun et al. [11] provided an in-depth look at IoT and wearable technology applications in geriatric health care, including the sorts of data gathered and the devices accessible. Robotics and integrated applications are two emerging areas of IoT/wearable application research highlighted in the article, alongside established areas of IoT/wearable applications. Findings from this research might help healthcare solution designers and developers design future healthcare strategies that better serve the elderly and enhance their quality of life [11]. Bhatt and Ragiri [5] presented a comprehensive analysis of IoT security risks, including identification of existing IoT system risks, innovative security protocols, and recent security efforts. Next-generation IoT systems will benefit from this paper’s revised analysis of IoT architecture in terms of protocols and standards. Protocol, standard, and supplied security model comparisons are given in accordance with IoT safety criteria. The research, which exposes the hardware, software, and data to a variety of threats and assaults, prompts the need for standardization in communication and data auditing. The authors found that strategies that can cope with a variety of dangers are essential. An overview of current security research trends is provided in the article, which will assist in the development of Internet of Things security. Security characteristics of IoT-based devices might be used to improve the research community’s understanding of IoT security issues [5]. Kaur et al. [9] stated that, for cloud computing purposes, the aim is to create an infrastructure that can be relied upon to be reliable, secure, stable, long-lasting, and expandable. Resources are limited and added only as the service’s demand grows. The authors explain how cloud computing works and how it could be used to protect data. One of the most important roles cloud computing will play in the future of 5G services, mobile networks, and cyber-physical and social computing is to reorganize and distribute diverse resources to users according to their demands. As one of the most critical cloud services, storing data in the cloud frees up users’ local storage and enables instantaneous access [9]. Parthiban et al. [28] examined and reviewed several online learning platforms, teaching material delivery technologies, and new technologies that assist students in their studies. Channels for creating a private environment may be used to avoid copying in Internet tests, which was the focus of this article.
An online education strategy has several obstacles, including students’ opinions of e-learning
as more stressful and detrimental to general health and social connections. There are now ways to improve the online classroom teaching experience, making it as good as or better than in-person instruction of a group of people. Aiming for a meaningful and stress-free solution, this study examined everyday teaching techniques that integrate online learning with a machine teaching approach [28]. Barrot et al. [29] used a mixed-methods technique to examine the kinds and levels of online learning difficulties experienced by college students. The students were having the most trouble because of the learning environment at home, not because of their technical expertise. Furthermore, the researchers found that the COVID-19 outbreak had the greatest effect on students’ psychological well-being and academic performance. Students’ most popular coping strategies were resource management and utilization, help-seeking, technical aptitude upgrading, time management, and environmental control. Educators, policymakers, and researchers should take note [29]. Maatuk et al. [30] focused on the adoption and deployment of e-learning technologies at a public institution during the COVID-19 outbreak from the viewpoints of students and instructors, in order to better assess the possible challenges encountered in learning activities. Study participants include students and faculty members from the University of Benghazi’s Information Technology (IT) department. Results were examined using descriptive-analytical and statistical methods. The student questionnaire and the instructor questionnaire were produced and distributed independently of each other. The benefits, drawbacks, and obstacles of creating e-learning in the IT faculty during the COVID-19 outbreak were emphasized in order to accomplish the intended goals. The analysis found that e-learning systems may be a boon or a curse when it comes to higher education and emergency circumstances [30].
3 Blockchain and Internet of Things There is an explosion of blockchain applications that incorporate IoT [31, 32]. As the number of IoT devices grows, hackers will have more opportunities to steal personal information from everything from an Amazon Alexa to a smart thermostat. The transparency and virtual incorruptibility that blockchain brings to IoT technology [33] are used to safeguard “smart” objects from being hacked [34, 35]. By storing data in transactions and verifying those transactions with nodes, the blockchain can ensure safe connectivity between IoT devices [36–38]. As a result, sensor data can be traced back to its source (a toy sketch of this hash-linking idea appears after the list below). Using a blockchain to secure data can also improve present IoT protocols [39–41]. Blockchain technology makes IoT solutions more secure and private by: 1. Preventing the alteration or removal of transactions.
2. Keeping a permanent record of previous steps. 3. Allowing customers to track every step of their transactions in real time. How IoT is integrated with blockchain technology depends on the needs of the IoT network [42, 43]. A hybrid approach may be the best option if a blockchain consensus process that requires mining must be deployed. Fog computing, which includes devices that are less computationally constrained, such as gateways, can be used for mining in this architecture.
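To make the hash-linking idea above concrete, the following is a minimal sketch in Python, assuming a simple append-only ledger held by a single gateway; the SensorChain class and its method names are illustrative, not part of any reviewed system. Each block stores the hash of its predecessor, so altering an earlier sensor reading breaks the chain and is detected.

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    """Hash the block's canonical JSON form with SHA-256."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

class SensorChain:
    """Toy hash-linked ledger for IoT sensor readings (illustrative only)."""
    def __init__(self):
        genesis = {"index": 0, "timestamp": time.time(),
                   "reading": None, "prev_hash": "0" * 64}
        self.blocks = [genesis]

    def append_reading(self, device_id: str, value: float):
        prev = self.blocks[-1]
        block = {"index": prev["index"] + 1,
                 "timestamp": time.time(),
                 "reading": {"device": device_id, "value": value},
                 "prev_hash": block_hash(prev)}
        self.blocks.append(block)

    def verify(self) -> bool:
        """Any modification of an earlier block breaks the hash links."""
        return all(self.blocks[i]["prev_hash"] == block_hash(self.blocks[i - 1])
                   for i in range(1, len(self.blocks)))

chain = SensorChain()
chain.append_reading("thermostat-01", 22.5)
chain.append_reading("thermostat-01", 22.7)
print(chain.verify())                        # True
chain.blocks[1]["reading"]["value"] = 99.0   # tamper with a stored reading
print(chain.verify())                        # False: tampering is detected
```

A full blockchain also adds consensus and distribution across nodes; the sketch only illustrates why stored transactions become hard to alter or remove.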
4 Online Education System On December 31, 2019, Chinese officials in Wuhan City, Hubei Province, China, alerted the WHO of pneumonia cases [28, 29]. The disease, first called 2019-nCoV before being renamed COVID-19, began as a mystery. A total of 44 suspected cases of the mysterious ailment were recorded in China beginning in January 2020. Several people with pneumonia-like sickness were linked to the Huanan seafood market, which has since been closed down by authorities [30]. The outbreak was traced to a novel strain of coronavirus discovered in China. Coronaviruses range from those causing the ordinary cold to the most serious and fatal infections. Humans can contract and transmit several of these diseases, which are usually found in animals [44]. The COVID-19 epidemic had a direct impact on every aspect of the economy. Students were unable to attend a typical classroom due to the lockdown, which had an adverse effect on their academic performance. As a result, the Indian educational establishment has chosen to give students the opportunity to complete their education entirely online. Because of this, there will be an increase in the popularity of distance learning options, and the significance of online education will be better understood [45]. Online courses are being offered by a growing number of colleges, universities, and educational institutions. Students are able to learn from the comfort of their own homes thanks to online courses, which they can access at any time of day or night because they are available round-the-clock. Students can study from virtually anywhere at any time because of the remarkable advancements in equipment and technology (such as mobile phones). This level of learning adaptability is impossible to achieve in a regular classroom setting, where students are restricted by the four walls of the classroom. As a result of the COVID-19 pandemic, schools and institutions all across the world have had to adapt to online programs [46]. The online education system (Fig. 1) includes audio, video, animation, discussions with teachers or mentors, and virtual instruction delivered by teachers to pupils [22]. These are methods of teaching the skills and knowledge students need to become professionals in their fields. Teachers and students are able to communicate with one another using a variety of Internet resources, including Google Meet, WhatsApp, Zoom, and other social media services [11].
Fig. 1 Online education system [22] (components: online university, e-learning research, webinar, video lesson, online test, audio books)
Teachers can reach a large number of pupils at once by utilizing these resources [26].
4.1 Role of Security in Online Education System Security education is a learning endeavor focused on lowering the overall number of security flaws that are likely to arise as a result of a lack of staff understanding. Employee orientation frequently includes such training to explain each employee’s responsibility in preserving information security. The importance of security education and awareness initiatives in enforcing security awareness among firm employees is particularly noteworthy [10, 27]. In order to secure digital content, present research makes use of encryption mechanisms [44, 46]. Data are encrypted when they are scrambled so that only those with the proper permissions may decipher them. The result is technically known as ciphertext: the conversion of human-readable plaintext into incomprehensible text, as shown in Fig. 2. To put it another way, encryption is the process of making otherwise readable data appear random. For an encrypted message to be decrypted, both the sender and recipient must agree on a set of mathematical values called a cryptographic key [47, 48]. However, while encrypted data appear to be a complete mystery, they can be decrypted by anyone who receives them and holds the correct key.
Fig. 2 Encryption mechanisms: plaintext “Hello” is converted into ciphertext “SNifgNiuk”
True security in encryption relies on keys that are so difficult to guess that a third party can’t decrypt or break the ciphertext via brute force.
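As a minimal illustration of the plaintext-to-ciphertext conversion in Fig. 2, the sketch below uses the third-party Python cryptography package (Fernet symmetric encryption). This is only an example of the general mechanism, not the specific algorithm used in the reviewed works, and the ciphertext string shown in Fig. 2 is purely illustrative.

```python
# Minimal illustration of symmetric encryption (requires: pip install cryptography).
# Without the key, the ciphertext is unreadable; with it, the plaintext is recovered.
from cryptography.fernet import Fernet, InvalidToken

key = Fernet.generate_key()          # shared secret agreed by sender and receiver
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"Hello")        # plaintext -> unreadable token
print(ciphertext)                            # looks like random bytes

print(cipher.decrypt(ciphertext))            # b'Hello': the correct key recovers it

third_party = Fernet(Fernet.generate_key())  # an attacker holding a different key
try:
    third_party.decrypt(ciphertext)
except InvalidToken:
    print("decryption fails without the correct key")
```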
4.2 Influencing Factors Several factors influence the performance and security of cloud-based education systems.
4.3 Security Factors In a cloud environment, there have been security breaches due to malware and external threats [25]. As a result, there is the potential for instructional information to be hacked over the network. Hackers attempt to access data without authentication, while crackers attempt to decrypt the material. To ensure security, encryption methods and firewalls are widely utilized [23, 24].
4.4 Performance Factors In order to increase the system’s security, encryption methods must be utilized; however, this slows down the system in the process. The cloud environment’s performance is influenced by a variety of variables, including the following [24]. 1. The system’s performance is influenced by the type of transmission media used, whether wired or wireless. When compared to wired media, wireless technologies are typically slower. Wireless and wired systems come in a variety of configurations. 2. The amount of data that can be transferred in a given length of time is referred to as bandwidth. More bandwidth allows more data to be transported in less time.
3. Protocols are the rules that govern data transmission over a network. The user datagram protocol is much faster than the transmission control protocol because it is connectionless and does not use acknowledgments. 4. Because the applied security mechanism spends considerable time determining whether or not the communication is authentic, it may occasionally reduce the cloud network’s performance. 5. Performance also depends on how far apart the transmitter and receiver nodes are. As the distance between transmitter and receiver grows, the transmission time increases and performance suffers; reducing the distance reduces the transmission time. A signal also loses energy as it travels from one location to another; this attenuation depends on the distance and the transmission medium, so a signal regenerator is necessary to overcome the problems it presents. A rough numerical sketch of these delay contributions is given below.
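The back-of-the-envelope sketch below illustrates points 2 and 5: total delay is modeled as transmission time (file size divided by bandwidth) plus propagation time (distance divided by signal speed). The file size, link speed, and distance are assumed values for illustration only.

```python
# Toy delay model: transmission delay (size / bandwidth) + propagation delay
# (distance / signal speed). All figures are illustrative assumptions.
def total_delay_s(file_size_bits: float, bandwidth_bps: float,
                  distance_m: float, signal_speed_mps: float = 2e8) -> float:
    transmission = file_size_bits / bandwidth_bps   # point 2: more bandwidth, less time
    propagation = distance_m / signal_speed_mps     # point 5: longer distance, more time
    return transmission + propagation

# Example: a 10 MB lecture file over a 50 Mbit/s link, server 1,000 km away
print(total_delay_s(10 * 8e6, 50e6, 1_000_000))     # ~1.6 s, dominated by bandwidth
```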
5 Problem Statement Traditional cryptography procedures were used in a number of studies on the Internet of Things, blockchain, security, and education systems, and there are also a number of investigations of the educational system itself. However, a realistic solution was not found in these studies. A comprehensive picture of blockchain, IoT, and the educational system was missing from prior investigations; nevertheless, IoT may help enhance the security of the education system by introducing a new system that incorporates this technology [27].
6 Need of Research There is a need to integrate IoT into the educational system. A high-performance and safe solution may be provided by an advanced cryptography method that employs compression techniques. Compression techniques are used in the proposed work in order to lower the file size while still providing content security. For IoT data security, the suggested study aims to provide a more secure and adaptable solution [49]. The idea is to combine encryption and compression in IoT, and comparative data security and performance analyses have been completed.
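A minimal sketch of the compress-then-encrypt idea follows, assuming a standard symmetric cipher (Fernet from the Python cryptography package) and zlib compression; the proposed work’s exact algorithms are not specified here, so this is only an illustration of the pipeline, not the author’s implementation.

```python
# Compress first (smaller file), then encrypt (content security).
# Requires: pip install cryptography
import zlib
from cryptography.fernet import Fernet

key = Fernet.generate_key()
cipher = Fernet(key)

payload = b"IoT sensor log: temperature=22.5;" * 100   # repetitive, so it compresses well

compressed = zlib.compress(payload, level=9)   # shrink the data first
protected = cipher.encrypt(compressed)         # then protect its confidentiality

# Receiving side: decrypt, then decompress
restored = zlib.decompress(cipher.decrypt(protected))
assert restored == payload
print(len(payload), len(compressed), len(protected))   # original vs. compressed vs. protected sizes
```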
7 Conclusion Blockchain technology is one of the exciting research topics right now and could be applied for the general public in IoT situations. Some principal motives for using blockchain in education systems are its capabilities such as decentralization, immutability, protection, privacy, and transparency. It is concluded that an IoT-based, blockchain-enabled education system provides more security than conventional approaches. More work is needed on blockchain in order to make it more applicable in real-life applications. The probability of attack is reduced in the case of hybrid approaches that make use of encryption and blockchain.
8 Scope of Research Such studies are expected to have a substantial impact on IoT-based healthcare, education, and commercial applications, all of which need high levels of data security. As a matter of fact, many real-time systems are capable of solving real-world issues [50]. Machine learning and optimization approaches may be used to enhance these systems.
References 1. Wang P, Valerdi R, Zhou S, Li L (2015) Introduction: advances in IoT research and applications. Inf Syst Front 17(2):239–241. https://doi.org/10.1007/s10796-015-9549-2 2. Harmon RR, Castro-Leon EG, Bhide S (2015) Smart cities and the internet of things. Portl Int Conf Manag Eng Technol 2015-Sep:485–494. https://doi.org/10.1109/PICMET.2015.727 3174 3. Abomhara M, Køien GM (2015) Cyber security and the internet of things: vulnerabilities, threats, intruders and attacks. J Cyber Secur Mobil 4(1):65–88. https://doi.org/10.13052/jcs m2245-1439.414 4. Mahmoud R, Yousuf T, Aloul F, Zualkernan I (2016) Internet of things (IoT) security: current status, challenges and prospective measures. In: 2015 10th international conference for internet technology and secured transactions, ICITST 2015, pp 336–341. https://doi.org/10.1109/ICI TST.2015.7412116 5. Bhatt S, Ragiri PR (2021) Security trends in internet of things: a survey. SN Appl Sci 3(1):1–14. https://doi.org/10.1007/s42452-021-04156-9 6. Nurse JRC, Erola A, Agrafiotis I, Goldsmith M, Creese S (2016) Smart insiders: exploring the threat from insiders using the internet-of-things. In: Proceedings—2015 international workshop on secure internet of things, SIoT 2015, pp 5–14. https://doi.org/10.1109/SIOT.2015.10 7. Mandula K, Parupalli R, Murty CHAS, Magesh E, Lunagariya R (2016) Mobile based home automation using internet of things (IoT). In: 2015 International conference on control, instrumentation, communication and computational technologies ICCICCT, pp 340–343. https://doi. org/10.1109/ICCICCT.2015.7475301
8. Mabodi K, Yusefi M, Zandiyan S, Irankhah L, Fotohi R (2020) Multi-level trust-based intelligence schema for securing of internet of things (IoT) against security threats using cryptographic authentication. J Supercomput 76(9):7081–7106. https://doi.org/10.1007/s11227-01903137-5 9. Kaur S, Kumar Shukla A, Kaur M (2022) Cloud cryptography: security aspect. 10(05):38–42. https://doi.org/10.1109/smart52563.2021.9676300 10. Safara F, Souri A, Baker T, Al Ridhawi I, Aloqaily M (2020) PriNergy: a priority-based energyefficient routing method for IoT systems. J Supercomput 76(11):8609–8626. https://doi.org/ 10.1007/s11227-020-03147-8 11. Tun SYY, Madanian S, Mirza F (2021) Internet of things (IoT) applications for elderly care: a reflective review. Aging Clin Exp Res 33(4):855–867. https://doi.org/10.1007/s40520-02001545-9 12. Khan AA, Rehmani MH, Rachedi A (2017) Cognitive-radio-based internet of things: applications, architectures, spectrum related functionalities, and future research directions. IEEE Wirel Commun 24(3):17–25. https://doi.org/10.1109/MWC.2017.1600404 13. Akhtar F, Rehmani MH, Reisslein M (2016) White space: definitional perspectives and their role in exploiting spectrum opportunities. Telecommun Policy 40(4):319–331. https://doi.org/ 10.1016/j.telpol.2016.01.003 14. Alaba FA, Othman M, Hashem IAT, Alotaibi F (2017) Internet of things security: a survey. J Netw Comput Appl 88(Suppl. C):10–28. https://doi.org/10.1016/j.jnca.2017.04.002 15. Wang KH, Chen CM, Fang W, Wu TY (2018) On the security of a new ultra-lightweight authentication protocol in IoT environment for RFID tags. J Supercomput 74(1):65–70. https:// doi.org/10.1007/s11227-017-2105-8 16. Granjal J, Monteiro E, Silva JS (2015) Security for the internet of things: a survey of existing protocols and open research issues. IEEE Commun Surv Tutor 17(3):1294–1312. https://doi. org/10.1109/COMST.2015.2388550 17. Roman R, Alcaraz C, Lopez J, Sklavos N (2011) Key management systems for sensor networks in the context of the internet of things. Comput Electr Eng 37(2):147–159. Modern trends in applied security: architectures, implementations and applications 18. Azmoodeh A, Dehghantanha A, Conti M, Choo KKR (2018) Detecting crypto-ransomware in IoT networks based on energy consumption footprint. J Ambient Intell Humaniz Comput 9(4):1141–1152. https://doi.org/10.1007/s12652-017-0558-5 19. Andrea I, Chrysostomou C, Hadjichristofi G (2016) Internet of things: security vulnerabilities and challenges. In: Proceedings—IEEE symposium on computers and communication, vol 2016-Feb, pp 180–187. https://doi.org/10.1109/ISCC.2015.7405513 20. Awan IA, Shiraz M, Hashmi MU, Shaheen Q, Akhtar R, Ditta A (2020) Secure framework enhancing AES algorithm in cloud computing. Secur Commun Netw 2020. https://doi.org/10. 1155/2020/8863345 21. Gundu SR, Panem CA, Thimmapuram A (2020) Hybrid IT and multi cloud an emerging trend and improved performance in cloud computing. SN Comput Sci 1(5):1–6. https://doi.org/10. 1007/s42979-020-00277-x 22. Jiang Y, Li X (2020) Intelligent online education system based on speech recognition with specialized analysis on quality of service. Int J Speech Technol 23(3):489–497. https://doi.org/ 10.1007/s10772-020-09723-w 23. Bhargav AJS, Manhar A (2020) A review on cryptography in cloud computing. Int J Sci Res Comput Sci Eng Inf Technol February:225–230. https://doi.org/10.32628/cseit206639 24. Alabady SA, Al-Turjman F, Din S (2020) A Novel security model for cooperative virtual networks in the IoT era. 
Int J Parallel Program 48(2):280–295. https://doi.org/10.1007/s10 766-018-0580-z 25. Jan SU, Ghani DA, Alshdadi AA, Daud A (2020) Issues and challenges in cloud storage architecture: a survey. SSRN Electron J 1(1):50–65. https://doi.org/10.2139/ssrn.3630761 26. Atlam HF, Wills GB (2020) IoT security, privacy, safety and ethics. Springer International Publishing
27. Dhanda SS, Singh B, Jindal P (2020) Lightweight cryptography: a solution to secure IoT, vol 112, no 3. Springer US 28. Parthiban K, Pandey D, Pandey BK (2021) Impact of SARS-CoV-2 in online education, predicting and contrasting mental stress of young students: a machine learning approach. Augment Hum Res 6(1). https://doi.org/10.1007/s41133-021-00048-0 29. Barrot S, Llenares II, del Rosario LS (2021) Students’ online learning challenges during the pandemic and how they cope with them: the case of the Philippines. Educ Inf Technol 26(6):7321–7338. https://doi.org/10.1007/s10639-021-10589-x 30. Maatuk AM, Elberkawi EK, Aljawarneh S, Rashaideh H, Alharbi H (2022) The COVID-19 pandemic and e-learning: challenges and opportunities from the perspective of students and instructors. J Comput High Educ 34(1):21–38. https://doi.org/10.1007/s12528-021-09274-2 31. Leible S, Schlager S, Schubotz M, Gipp B (2019) A review on blockchain technology and blockchain projects fostering open science. Front Blockchain 32. Petersz GW, Panayiy E (2015) Understanding modern banking ledgers through blockchain technologies: future of transaction processing and smart contracts on internet of money 33. Dorri A, Kanhere SS, Jurdak R, Gauravaram P (2017) Blockchain for IOT security and privacy: the case study of a smart home. In: 2017 IEEE international conference on pervasive computing and communications workshops (PerCom workshops). IEEE, pp 618–623 34. Learning A (2014) Storing and querying bitcoin blockchain using SQL databases. Inf Syst Educ J 12(4):6 35. Elisa N, Yang L, Chao F, Cao Y (2020) A framework of blockchain-based secure and privacypreserving E-government system. Wirel Netw 0. https://doi.org/10.1007/s11276-018-1883-0 36. Minoli D, Occhiogrosso B (2018) Blockchain mechanisms for IOT security. Internet Things 1:1–13 37. Khan MA, Salah K (2018) IOT security: review, blockchain solutions, and open challenges. Futur Gener Comput Syst 82:395–411 38. Atzori L, Iera A, Morabito G (2010) The internet of things: a survey. Comput Netw 54(15):2787–2805 39. Yli-Huumo1 J, Ko D (2016) Where is current research on blockchain technology?—A systematic review 40. Zikratov I, Kuzmin A, Akimenko V, Niculichev V, Yalansky L (2017) Ensuring data integrity using blockchain technology. In: Proceedings of 20th conference of Fruct association 41. Gaetani E, Aniello L, Baldoni R, Lombardi F (2017) Blockchain-based database to ensure data integrity in cloud computing environments 42. Park JH, Park JY, Huh EN (2017) BlockChain based data logging and integrity management system for cloud forensics 43. Zheng Z, Xie S, Dai H, Chen X, Wang H (2017) An overview of blockchain technology: architecture, consensus, and future trends. IEEE 44. Purwanti S, Nugraha B, Alaydrus M (2018) Enhancing security on e-health private data using SHA-512. In: 2017 International conference on broadband communication, wireless sensors and powering, BCWSP 2017, vol 2018-January, pp 1–4 45. Huh S, Cho S, Kim S (2017) Managing IOT devices using blockchain platform. In: 2017 19th international conference on advanced communication technology (ICACT). IEEE, pp 464–467 46. Dorri A, Kanhere SS, Jurdak R, Gauravaram P (2019) LSB: a lightweight scalable blockchain for IOT security and anonymity. J Parallel Distrib Comput 134:180–197 47. Abimbola O, Zhangfang C (2020) Prevention of SQL injection attack using blockchain key pair based on stellar. Eur Sci J ESJ 16(36). https://doi.org/10.19044/esj.2020.v16n36p92 48. 
Shafay M, Ahmad RW, Salah K, Yaqoob I, Jayaraman R, Omar M (2022) Blockchain for deep learning: review and open challenges. Cluster Comput 0–32. https://doi.org/10.1007/s10586022-03582-7
49. Chen W, Xu Z, Shi S, Zhao Y, Zhao J (2018, December) A survey of blockchain applications in different domains. In: Proceedings of the 2018 International Conference on Blockchain Technology and Application, pp 17–21. https://doi.org/10.1145/3301403.3301407 50. Rouse M, Wigmore I (2016) Internet of things. http://internetofthingsagenda.techtarget.com/ definition/Internet-of-Things-IOT
Machine Learning, Wearable, and Smartphones for Student’s Mental Health Analysis Deivanai Gurusamy, Prasun Chakrabarti, Midhunchakkaravarthy, Tulika Chakrabarti, and Xue-bo Jin
Abstract Students are the backbone of civilization, and the world’s future is in their minds, but many students nowadays suffer from stress, depression, and anxiety. As computing technology progresses, there has been research on managing mental disorders in pupils efficiently, cost-effectively, and quickly. It is unknown whether these technologies have given acceptable remedies for student mental illness or whether gaps remain. This research reviews articles on student mental health utilizing machine learning, wearables, and smartphones. The paper examines how these technologies are employed and which have the most potential to provide solutions, as well as which causes of mental illness have been addressed and which have not. This review shows researchers how to improve student mental health services. Keywords Mental health · Students · Machine learning · Wearable · Smartphone
1 Introduction Stress degrades mental health, which is crucial to human wellbeing. Stress and depression are common from adolescence through old age and are more prevalent among students. Academic pressure and other factors cause student mental illness. In many countries, stress and depression are suicide causes. Given the gravity of the situation, the World Health Organization has created a guide [1] called “Doing What Matters in Times of Stress”, which can be highly beneficial to stressed individuals. Researchers have used computational discoveries to regulate stress. Machine
learning is one of the fields that helps health care by analyzing large volumes of healthcare data [2]. On the other hand, wearable devices and sensors monitor physiology and behavior in health care. Further, smartphone apps also boost health and their use in health care is unavoidable today. Hence, the researcher is interested in machine learning, wearables, and smartphone apps. This research evaluates studies that used all three technologies for stress detection in student mental health management. The main contributions are to research machine learning, wearable gadgets, and smartphones for stress/depression management in students, to determine if these technologies provide adequate stress management, and to identify student stress sources already found so researchers can focus on other causes. Section 2 addresses similar work, Sect. 3 describes the research methodology, Sect. 4 illustrates the role of technology usage in student’s mental health, Sect. 5 discusses the study outcomes, and Sect. 6 concludes.
2 Related Work With stress and depression on the rise, numerous academics have developed technology to detect them. This has led to reviews on stress detection; [3] is about stress detection devices, [4] is about stress induction tactics, and [2, 5–8] are about stress detection and depression prevention in real life. Yogeswaran and Morr [9] assessed just three studies on mental health interventions for medical students, and [10, 11] cover self-guided therapies for college students. No review paper has looked at combined mental health treatment utilizing machine learning, wearable devices, and smartphones, especially for student stress.
3 Methodology Only English-language papers published between 2006 and 2022 were evaluated. The study used ACM, IEEE Xplore, Science Direct, and Springer. “AND” and “OR” operators were used to find relevant documents with the keywords student, mental health, and stress detection. After deleting duplicates, titles and abstracts were screened to identify relevant articles. Papers whose full text was unavailable, out-of-scope papers, and review papers were omitted, as were papers that did not address students. Sixty-two papers passed all these tests. Figure 1 shows the method.
Fig. 1 Flowchart of systematic review
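The screening steps above amount to a simple filter pipeline, sketched below for illustration only; the record fields and the helper function are hypothetical stand-ins, not part of the actual review protocol.

```python
# Illustrative sketch of the screening steps: de-duplication, publication window,
# and exclusion rules. Field names below are hypothetical.
records = [
    {"title": "Stress detection in students", "year": 2019, "full_text": True,
     "is_review": False, "about_students": True},
    {"title": "Stress detection in students", "year": 2019, "full_text": True,
     "is_review": False, "about_students": True},        # duplicate entry
    {"title": "Stress at the workplace", "year": 2020, "full_text": True,
     "is_review": False, "about_students": False},       # not about students
]

def screen(records):
    seen, kept = set(), []
    for r in records:
        key = (r["title"].lower(), r["year"])
        if key in seen:                      # remove duplicates
            continue
        seen.add(key)
        if not (2006 <= r["year"] <= 2022):  # publication window
            continue
        if not r["full_text"] or r["is_review"] or not r["about_students"]:
            continue                         # exclusion criteria
        kept.append(r)
    return kept

print(len(screen(records)))   # 1 paper survives this toy screening
```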
4 Student’s Mental Health and Technology 4.1 Machine Learning Contribution This section reviews publications on student mental health using machine learning. It explores machine learning algorithms that can identify stress, how they are used, and how stress is categorized. In [12], a brain-computer interface is built to extract EEG features of students during and after examinations, and support vector machines (SVM) and k-nearest neighbors (k-NN) are used to classify stress accurately. The smartphone data set collected for the StudentLife study [13] is used in [14] to categorize stress as not stressed, lightly stressed, and stressed. The authors ran the data through various machine learning classifiers and models, such as generalized, clustering, and person-specific models.
According to their findings, the person-specific model performs better. However, the machine learning algorithms do not contribute much to accurately classifying stress levels. Further, the authors’ primary focus is on the ML algorithms rather than analyzing the students’ stress. Also, the data set relies on students’ self-evaluation of their stress levels, which is ineffective for stress detection. Castaldo et al. [15] examine oral exam stress. Student data is gathered via ECG signals, and advanced data-mining algorithms analyze the signals in 3-min segments. Nonlinear properties classify students’ stress and relaxation levels, with C4.5 performing best. Xie et al. [16] assess the stress of a master’s project defense one week beforehand. Wearable devices record blood volume pulse to measure stress. Time-domain properties are extracted, and the extreme learning machine (ELM) groups students by stress level. Egilmez et al. [17] employ passive sensing to measure students’ tension during normal activity. The researchers monitored heart rate and skin reactivity with wrist, chest, and finger devices and reported that any of the equipment could detect stress. Furthermore, they discovered that the stress level of the students increases during the singing stress-inducing method. Random forest (RF) produced the best F-measure (88.8%) from the collected features. The stress experienced during an exam is examined using various classifiers by Carneiro et al. [18]. The authors make data collection transparent to the students; that is, the students do nothing extra to allow the authors to assess their behavioral stress. Instead, features are extracted from their mouse dynamics while they take the exam. Long clicks, more mouse movement, and a longer time to complete the exam are all considered indicators of a student’s poor performance. Different subsets of features are fed into different classifiers and neural networks (NN), and RF produces good results for each. This approach analyzes student behavior, but it does not detect stress. Patel et al. [19] built a therapeutic chatbot that extracts users’ emotions from data input by the user using convolutional neural networks (CNN), recurrent neural networks (RNN), and hierarchical attention networks (HAN). It categorizes stress levels as normal, strained, or depressed. Ahuja and Banga [27] use SVM, RF, Naïve Bayes (NB), and k-NN to categorize 206 students’ stress levels. The machine learning system is trained using the perceived stress scale (PSS) and Internet activity a week before the exam. However, data acquisition is not covered, and only the exam duration taken by the individual indicates stress. Mounika [20] uses a recurrent neural network and Twitter data to detect student stress; the author reports that two-thirds of students are stressed. Wu et al. [21] collect data before and throughout training to detect stress. Students used the Empatica E4 to measure heart rate, skin temperature, electrodermal activity, and blood pressure. Segmented, filtered, and extracted features are input into the K-means unsupervised clustering algorithm. The silhouette score analyzes cluster assignment, and the ground truth table examines accuracy, precision, recall, and F1-score. For two clusters, the approach is 50% accurate. The study captures real-time data only during general training, which may not be adequate for clustering students. Ding et al. [22] use deep neural networks to classify stress based on the students’ social network data. In Jain et al.
[23], based on the features extracted from questionnaire data collected from students, depression has been classified into five different
levels. Further, Twitter data was classified with high accuracy using logistic regression. Exam stress was detected by Coutts et al. [24] using an exam monitoring approach. A wrist device measures heart rate variability (HRV) and compares the results to stress markers. Deep learning (DL) categorizes the HRV data, and the recommended technique achieves 80% accuracy within minutes. The study categorizes students to assist them in overcoming mental health issues but does not discuss what causes stress. Rodríguez-Arce et al. [25] monitor pupils’ respiration, temperature, GSR, and pulse while they take a timed arithmetic test. Twenty-one features are extracted from those signals and fed into various machine learning classifiers, with SVM accuracy exceeding 90%. This study measures student stress via customized wearable technology. However, stress stimulation is an artificial process, and the results cannot be compared to real-world stress scenarios. Kumar et al. [26] extract facial characteristics that are video-captured and Gabor-filtered, and SVM labels the subject as scornful, disgusted, or happy. Altaf et al. [28] use ECG and machine learning to detect student stress induced by math problems and soft music, preprocessing the data with a discrete wavelet transform. Though the authors state that this work is for students, student-specific stress issues are not addressed. Katerine et al. [29] claim to have made the first attempt at detecting academic stress during the COVID pandemic. They use galvanic skin response and an electronic nose (a gas sensor) to detect students’ stress levels during exams. SISCO, a self-administered stress evaluation tool, is used to validate the collected values. To categorize the students, linear discriminant analysis (LDA), k-NN, and SVM algorithms were used. The authors claim that the electronic nose achieves 96% accuracy and the galvanic skin response achieves 100% accuracy. Hence, this section has reviewed the contribution of machine learning algorithms and identified gaps. Table 1 shows which ML algorithm has more potential than the others in each approach, for quick reference.
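Most of the studies above share a common workflow: extract features from physiological or behavioral signals, train several classifiers, and compare their accuracy. The sketch below illustrates that generic workflow with scikit-learn on synthetic stand-in data; it is not a reconstruction of any specific reviewed pipeline, and the feature and label names are assumptions.

```python
# Generic stress-classification workflow: features -> several classifiers -> accuracy.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))            # stand-in for HRV / EDA / EEG-derived features
y = rng.integers(0, 3, size=300)         # 0 = not stressed, 1 = lightly stressed, 2 = stressed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))   # random labels here, so roughly 0.33
```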
4.2 Wearable Contribution This section reviews papers on the mental health of students using wearables. The purpose of Milosevic et al. [30] is to track the stress levels of nursing students during their training simulation. A wearable chest belt is used to collect physiological signals from nursing students, and calls are made to them during training to see how the disruption during work hours affects their stress levels. This work is a stress monitoring process, and it concludes that there will be increased stress if work is interrupted. Biological sensors such as ECG, EEG, and galvanic skin response (GSR) are used by Tivatansakul and Michiko [31] to detect people’s emotions. With the help of augmented reality, people are provided with amusing, relaxing, and exciting services based on their level of emotion. Using an ECG, Ramteke and Thool [32] classify the students as normal or stressed. This study detected student stress by measuring HRV before the examination and seminar presentation. The features extracted from the signals were subjected to Poincare plot analysis, which assisted in categorizing the students.
Table 1 Machine learning in mental health

References | Algorithms | Max. performance
[12] | k-NN, SVM | SVM with above 80%
[14] | SVM, j48, Bagging, and RF | RF with 60% for the person-specific model
[15] | NB, SVM, MLP, AB, DT using C4.5 | C4.5 with 79%
[16] | ELM | 91.3%
[17] | NB, SVM, LogR, RF | RF with F1-measure 88.8
[18] | RF, LogR, NN, GP | RF and NN
[19] | CNN, RNN | CNN up to 75%
[20] | RNN, CNN | RNN almost 80%
[21] | K-means | 70% precision
[22] | DISVM | 86%
[23] | XGBoost | 83.87%
[24] | DL (LSMN) | 83%
[25] | SVM, k-NN, RF, LogR | k-NN 95.98%
[26] | SVM | 62%
[27] | NB, k-NN, SVM, RF | SVM with 85%
[28] | SVM, k-NN, instance-based learner, LDA, K-means, NB | NB with 96.67%
[29] | LDA, k-NN, SVM | k-NN with 96%

MLP-multilayer perceptron, AB-AdaBoost, DT-decision tree, LogR-logistic regression, DISVM-deep integrated SVM, LSMN-long short memory network
Castanier et al. [33] examine the stress levels of Ph.D. students by identifying the students’ nonverbal behavior. Students are stressed by being asked questions about their thesis, and a self-stress assessment and the state-trait anxiety inventory (STAI) are also administered. The stress levels are measured using a webcam, ECG, and Kinect (a behavior-capturing camera). Zhang et al. [34] create “Happort”, a mobile app that collects data from multiple sources, including a wristband and a smartphone. The mobile app collects data related to sleep, exercise, and the user’s mood in text and image form and sends them to the server. The server combines the collected data and returns the user’s stress level on a predefined scale. School and university students were used as subjects to test the app. Vhaduri et al. [35] assess student stress by analyzing changes in phone call patterns based on data collected from smartphones and Fitbits. Vasavi [36] evaluated the thermal signature together with heart rate variability for detecting stress in students. Furthermore, many approaches [14–17, 21, 24, 25, 28, 29] have combined wearable devices with machine learning techniques. Table 2 shows an overview of the wearable devices used in the reviewed papers.
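Several of the wearable studies above rest on time-domain HRV features computed from RR intervals. The sketch below shows SDNN and RMSSD computed from a handful of made-up RR intervals; the values are illustrative only and do not come from any reviewed data set.

```python
# Time-domain HRV features from RR intervals (milliseconds). Values are fabricated.
import numpy as np

rr_ms = np.array([812, 790, 805, 776, 830, 795, 760, 802])   # RR intervals in ms

sdnn = rr_ms.std(ddof=1)                          # overall variability
rmssd = np.sqrt(np.mean(np.diff(rr_ms) ** 2))     # short-term variability
mean_hr = 60_000 / rr_ms.mean()                   # beats per minute

print(f"SDNN={sdnn:.1f} ms, RMSSD={rmssd:.1f} ms, HR={mean_hr:.1f} bpm")
# Lower HRV (smaller SDNN/RMSSD) is commonly read as a marker of stress.
```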
Table 2 Wearable devices in mental health

References | Device | Parameters | Analysis/method | Intention
[30] | Zephyr BioHarness 3 | ECG, HRV, blood pressure, respiration, GSR, skin temperature, and voice | Simulation | Stress assessment
[31] | Biological sensor | EEG, ECG, GSR | Framework design | Emotion detection
[32] | ECG | HRV | Time, frequency, Poincare plot | Stress classification
[33] | ECG | HRV | ANOVA analysis | Stress detection
[34] | Wrist band | Exercise data | Fusion multimodality analysis | Stress detection
[35] | Fitbit | Physical activity level, step count, heart rate, and calorie burn | One-way ANOVA test and 2 sample T-test | Stress assessment
4.3 Smartphone’s Contribution The role of smartphones in stress detection and the features extracted from them are discussed in this section. Bauer and Lukowicz [37] used call patterns, SMS patterns, location details, and Bluetooth active state to distinguish students’ behavior during exam periods from their stress-free behavior. Their research shows that stressed-out people’s behavior varies. Gjoreski et al. [14] employ ML algorithms to classify stress using smartphone data, including accelerometer, audio recorder, GPS, Wi-Fi, call log, and light sensor. Baras et al. [38] use a smartphone’s sensor features to indicate students’ moods and a calendar-based mobile app to track the schedule and deadlines of events the students must complete. Based on these data, students are given virtual tutor assistance to reduce their stress levels. In addition, an IoT-based smart room was installed to supplement the mobile app. Boukhechba et al. [39] presented “DemonicSalmon”, a study that assesses the stress, anxiety, and depression of 72 university students using an app called “sensus” installed on the participants’ smartphones. Text logs, GPS data, and student self-reports were utilized to identify depression in students. According to the authors, this is the first study on social anxiety and smartphones. Ma et al. [40] developed a smartphone-based salivary amylase detection method that was validated against a Trier social stress test performed on 12 students. The results show that the developed system can effectively detect stress in humans. Lattie et al. [41] proposed the “IntelliCare” mobile app to examine college students’ mental health using the patient health questionnaire 8 (PHQ-8), generalized anxiety disorder-7 (GAD-7), and feedback reviews. The software aids anxious pupils.
Table 3 Smartphones in mental health

References | Features used | Analysis/method/tool | Intention
[37] | Bluetooth devices seen, location, call patterns, SMS | Phone feature’s behavior analysis | Stress detection
[14] | Accelerometer, audio recorder, GPS, Wi-Fi, call log, light sensor | Machine learning | Stress classification
[38] | Mobile sensors and calendar | Calendar-based app | Stress reduction
[39] | Text log, GPS | Mobile app | Stress detection and management
[40] | Mobile sensors | Amylase analysis | Stress detection
[41] | Application platform | Mobile app | Stress prevention
[44] | Application platform | Chatbot | Stress reduction
However, because of COVID-19 and the shift to online learning, students’ desire to download and use the app was affected. As a result, the authors report that using a mobile app to support students’ mental health is ineffective. Other apps such as “mHealth” [42] employ a custom-designed questionnaire, and “smileteq” [43] regulates stress. Zhang et al. [34] created a mobile app that works in tandem with a wristband to detect student stress. Dederichs et al. [45] used online and mobile interventions to assess student stress; according to a questionnaire, students want a simple app. Marques et al. [46] surveyed students using the general health questionnaire-12 (GHQ-12) to assess COVID-19’s influence on mental health and the acceptability of smartphones as a stress-awareness technique. The outcome reports that the students suffered tension, despair, and anxiety during the COVID epidemic, and that their interest in using a mobile app as a health awareness tool is modest but growing. Liu et al. [44] created and tested Xiaonan, a chatbot-based therapy delivered through the mobile app “WeChat”, on 83 university students. Users access the chatbot through WeChat, and the chatbot therapy proved to be an effective tool for assisting students experiencing mental stress. Table 3 depicts the features of the smartphone used, analysis methods, and intention of each approach.
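Several of the smartphone studies above derive behavioral features from call logs. The sketch below shows, with fabricated data, how a raw log might be turned into simple daily features (call count, total duration, night-time calls) that could feed a stress classifier; the column names and thresholds are assumptions, not any study’s schema.

```python
# Turn a raw call log into simple per-day features. All data below is fabricated.
import pandas as pd

calls = pd.DataFrame({
    "timestamp": pd.to_datetime(["2022-03-01 09:10", "2022-03-01 23:40",
                                 "2022-03-02 14:05", "2022-03-02 02:15"]),
    "duration_s": [120, 300, 45, 600],
})
calls["date"] = calls["timestamp"].dt.date
# Flag calls placed late at night (before 06:00 or after 22:00)
calls["night"] = calls["timestamp"].dt.hour.isin(range(0, 6)) | (calls["timestamp"].dt.hour >= 22)

daily = calls.groupby("date").agg(
    call_count=("duration_s", "size"),
    total_duration_s=("duration_s", "sum"),
    night_calls=("night", "sum"),
)
print(daily)   # per-day behavioral features that could feed a stress classifier
```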
4.4 Other Contributors Aside from machine learning, wearables, and smartphones, this section includes online interventions and other unique approaches, shown in Table 4. Currie et al. [47] tested the first phase of “feeling better”, a mental health support program, and reported that its outcomes help lower students’ stress. In [48], video recordings and self-reports capture students’ body gestures, and Bayesian networks convert the gesture symbols into mental states. The study reports that 97.4% of students experienced mental health difficulties. George [49] used Facebook to deliver stress-awareness programs as stress
treatments. Ayat and Farahani [50] used artificial neural networks to examine the tendency for suicide among university students, with stress being one of the most common reasons. Li et al. [51] identify stress periods and triggers using microblog posts and Poisson probability. Bai [52] uses the Diffie Hellman key model to predict music students’ mental wellbeing and implements the neural network model using MATLAB. Eyestrain increases student stress, Jyotsna and Amudha [53] report. For this study, 20 students were stressed by math, a questionnaire, and films, and their stress levels were calculated using an eye tracker. Farrer et al. [54] have developed an online mental health curriculum called uni virtual clinic for college students. This study includes those who scored above 15 on the Kessler psychological distress scale 10 (K10). After the online intervention, stress is measured. The program was popular but did not alleviate anxiety and despair. Likewise, Internet interventions were suggested by Richards et al. [55]. Acceptance and commitment therapy was used with the students who said they were depressed or stressed, with the help of a guide developed by Panajiota et al. [56]. Zhang [57] demonstrated that self-disclosing mental disorders on Facebook lessens stress. The COVID-19 epidemic causes dread, confusion, and despair, according to Katerine et al. [29]. Research shows that teacher-student connections minimize stress. The papers [20, 51, 53, 54, 56, 58–62] identified student mental health causes. Bolinski [63] found that mental stress affects academic achievement. Durán et al. [64] distribute a stress questionnaire and use audiovisual content as it promotes awareness among digital-savvy students.
5 Discussion and Future Research Directions The systematic review included nearly 62 research papers on student mental health. Nine papers used both machine learning and wearables to study student mental health. Nine studies detected stress with smartphones. Two papers merged wearables, smartphones, and machine learning. The rest cover stress sources and interventions. All these studies focus on student stress detection and machine learning; the smartphone papers focus on online stress prevention and reduction. The review makes some additional observations on the following points. Types of Students. Mental health was only examined for college and university students, and it was reported that college students are stressed and depressed. No paper has analyzed the mental stress of school students or recommended management options. Technology Usage. Some articles employ a small data set, which affects machine learning classifier accuracy. In terms of wearables, off-the-shelf devices from the market are mostly preferred; only two papers customized wearable sensors. The choice of wearable device is determined by two factors: compactness and cost. Mobile apps generally aim for mental health support and equip students with mental health intervention strategies.
Table 4 Other approaches to mental health

References | Technique/method/tool | Source | Upside | Downside | Intention
[47] | Internet-based program | Online posting | User friendly | Subjects specific to a single school | Stress, depression, anxiety reduction
[48] | Bayesian networks | Video recording | Feasible | Empirical data based | Detection of mental states
[49] | Facebook | Stress-awareness programs | User-friendliness and peer support | Requires personnel | Stress reduction
[52] | Diffie Hellman key model and neural network | Psychometric scale | Identifies factors that affect students | Complexity | Psychological changes prediction
[53] | Statistical analysis | Eye tracker | Continuous monitoring | Single modal system | Stress recognition
Stress factors. The reviewed papers only examine academic stress, measuring stress during tests, seminars, or presentations, often induced with math tasks. However, in the real world, such stress is only temporary and has little impact on the students. Few articles include social media [58, 60, 61, 65], excessive phone usage [59, 65–68], Internet addiction [49, 56, 62], technology addiction [69], and COVID-19 [70] as mental disorder causes. As a result, factors such as racial tension, sexual abuse, inability to afford education, and some internalizing factors remain to be examined. The research further lists the following future research directions regarding the mental health of students. a. Identification of school students’ mental stress states and stressors. b. Building and testing the applications on different groups of students in different scenarios can improve the effectiveness of smartphone interventions. c. Advanced machine learning and deep learning can analyze students’ stress levels. d. Accuracy, reliability, and validity can be tested on wearables. e. Stress can be detected by identifying additional stressors. f. Real stress detection mechanisms can replace the stress induction method.
6 Conclusions Papers on student mental health were studied; according to the research findings, good mental health entails being free of stress, depression, and anxiety, and most of the papers focus on stress management. While wearable devices and machine learning algorithms focus on stress detection, smartphone apps offer a variety of stress reduction and prevention tools. According to the review, machine learning emerges as a good way to achieve good results, and wearable devices combined with machine learning may provide significant results. Furthermore, it was discovered that smartphone apps work well but are underutilized by students, so students must be educated about the seriousness of mental health issues and the use of those apps. Further, the study offers researchers future research directions for students’ mental health support.
References 1. WHO (2020) Doing what matters in times of stress: an illustrated guide 2. Dhillon A, Singh A (2018) Biology and today’s world machine learning in healthcare data analysis: a survey. J Biol Today’s World 8(2):1–10. (Online). Available: http://journals.lexisp ublisher.com/jbtw 3. Shanmugasundaram G, Yazhini S, Hemapratha E, Nithya S (2019) A comprehensive review on stress detection techniques. In: 2019 IEEE international conference on system, computation, automation and networking, ICSCAN 2019, pp 1–6. https://doi.org/10.1109/ICSCAN.2019. 8878795
4. Karthikeyan P, Murugappan M, Yaacob S (2011) A review on stress inducement stimuli for assessing human stress using physiological signals. In: Proceedings—2011 IEEE 7th international colloquium on signal processing and its applications CSPA 2011, pp 420–425. https:// doi.org/10.1109/CSPA.2011.5759914 5. Can YS, Arnrich B, Ersoy C (2019) Stress detection in daily life scenarios using smartphones and wearable sensors: a survey. J Biomed Inform 92:103139. https://doi.org/10.1016/j.jbi.2019. 103139 6. Garcia-Ceja E, Riegler M, Nordgreen T, Jakobsen P, Oedegaard KJ, Tørresen J (2018) Mental health monitoring with multimodal sensing and machine learning: a survey. Pervasive Mob Comput 51:1–26. https://doi.org/10.1016/j.pmcj.2018.09.003 7. Hickey BA et al (2021) Smart devices and wearable technologies to detect and monitor mental health conditions and stress: a systematic review. Sensors 21(10):1–17. https://doi.org/10.3390/ s21103461 8. Richter T, Fishbain B, Richter-Levin G, Okon-Singer H (2021) Machine learning-based behavioral diagnostic tools for depression: advances, challenges, and future directions. J Pers Med 11(10). https://doi.org/10.3390/jpm11100957 9. Yogeswaran V, El Morr C (2021) Mental health for medical students, what do we know today? Procedia Comput Sci 198(2021):307–310. https://doi.org/10.1016/j.procs.2021.12.245 10. Ma L, Huang C, Tao R, Cui Z, Schluter P (2020) Meta-analytic review of online guided self-help interventions for depressive symptoms among college students. Internet Interv 25(June):2021. https://doi.org/10.1016/j.invent.2021.100427 11. Amanvermez Y et al (2022) Effects of self-guided stress management interventions in college students: a systematic review and meta-analysis. Internet Interv 28. https://doi.org/10.1016/j. invent.2022.100503 12. Khosrowabadi R, Quek C, Ang KK, Tung SW, Heijnen M (2011) A brain-computer interface for classifying EEG correlates of chronic mental stress. In: Proceedings of international joint conference on neural networks, San Jose, California, USA, July 31–August 5, 2011, vol 138632, pp 757–762 13. Wang R et al (2014) Studentlife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In: UbiComp 2014—proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing, pp 3–14. https:// doi.org/10.1145/2632048.2632054 14. Gjoreski M, Gjoreski H (2015) Automatic detection of perceived stress in campus students using smartphones. In: 2015 International conference on intelligent environments, pp 132–135. https://doi.org/10.1109/IE.2015.27 15. Castaldo R et al (2016) Detection of mental stress due to oral academic examination via ultra-short-term HRV analysis. In: 2016 38th Annual international conference of the IEEE engineering in medicine and biology society (EMBC), pp 3805–3808 16. Xie J, Wen W, Liu G, Chen C, Zhang J, Liu H (2017) Identifying strong stress and weak stress through blood volume pulse. In: PIC 2016—proceedings of the 2016 IEEE international conference on progress in informatics and computing, pp 179–182. https://doi.org/10.1109/ PIC.2016.7949490 17. Egilmez B, Poyraz E, Zhou W, Memik G, Dinda P, Alshurafa N (2017) UStress: understanding college student subjective stress using wrist-based passive sensing. In: 2017 IEEE international conference on pervasive computing and communication workshops (PerCom Workshops), pp 673–678. https://doi.org/10.1109/PERCOMW.2017.7917644 18. 
Carneiro D, Novais P, Durães D, Pego JM, Sousa N (2019) Predicting completion time in highstakes exams. Futur Gener Comput Syst 92:549–559. https://doi.org/10.1016/j.future.2018. 01.061 19. Patel F, Thakore R, Nandwani I, Bharti SK (2019) Combating depression in students using an intelligent chatbot: a cognitive behavioral therapy. In: 2019 IEEE 16th India council international conference, pp 1–4. https://doi.org/10.1109/INDICON47234.2019.9030346 20. Mounika SN (2019) Detection of stress levels in students using social media feed. In: Proceedings of the international conference on intelligent computing and control systems (ICICCS 2019), no Iciccs, pp 1178–1183
21. Wu Y, Daoudi M, Amad A, Sparrow L, D’Hondt F (2020) Unsupervised learning method for exploring students’ mental stress in medical simulation training. In: ICMI 2020 companion publication of the 2020 international conference multimodal interaction, pp 165–170. https:// doi.org/10.1145/3395035.3425191 22. Ding Y, Chen X, Fu Q, Zhong S (2020) A depression recognition method for college students using deep integrated support vector algorithm. IEEE Access 8:75616–75629. https://doi.org/ 10.1109/ACCESS.2020.2987523 23. Jain S, Narayan SP, Dewang RK, Bhartiya U, Meena N, Kumar V (2019) A machine learning based depression analysis and suicidal ideation detection system using questionnaires and Twitter. In: 2019 IEEE students conference on engineering and systems (SCES), pp 1–6. https:// doi.org/10.1109/SCES46477.2019.8977211 24. Coutts LV, Plans D, Brown AW, Collomosse J (2020) Deep learning with wearable based heart rate variability for prediction of mental and general health. J Biomed Inform 112(Feb). https:// doi.org/10.1016/j.jbi.2020.103610 25. Rodríguez-Arce J et al (2020) Towards an anxiety and stress recognition system for academic environments based on physiological features. Comput Methods Programs Biomed 190 26. Kumar S, Varshney D, Dhawan G, Jalutharia H (2020) Analysing the effective psychological state of students using facial features. In: Proceedings of the international conference on intelligent computing and control systems (ICICCS 2020), pp 648–653 27. Ahuja R, Banga A (2019) Mental stress detection in university students using machine learning algorithms. In: International conference on pervasive computing advances and applications— PerCAA 2019, pp 349–353 28. Altaf H, Ibrahim SN, Olanrewaju RF (2021) Non invasive stress detection method based on discrete wavelet transform and machine learning algorithms. In: 2021 IEEE 11th IEEE symposium on computer applications & industrial electronics (ISCAIE), pp 106–111 29. Katerine J, Carrillo G, Manuel C (2021) Academic stress detection on university students during COVID-19 outbreak by using an electronic nose and the galvanic skin response. Biomed Signal Process Control 68(Mar) 30. Milosevic M, Jovanov E, Frith KH, Vincent J, Zaluzec E (2012) Preliminary analysis of physiological changes of nursing students during training. In: 34th Annual international conference of the IEEE EMBS, 2012, vol 35899, pp 3772–3775 31. Tivatansakul S, Michiko O (2013) Healthcare system design focusing on emotional aspects using augmented reality. In: 2013 IEEE symposium on computational intelligence in healthcare and e-health, pp 88–93 32. Ramteke R, Thool V (2017) Stress detection of students at academic level. In: International conference on energy, communication, data analytics and soft computing (ICECDS-2017), pp 2154–2157 33. Castanier DAGC, Chang B, Martin JC (2017) Toward automatic detection of acute stress: relevant nonverbal behaviors and impact of personality traits. In: 2017 Seventh international conference on affective computing and intelligent interaction (ACII), pp 354–361 34. Zhang H, Cao L, Feng L, Yang M (2019) Multi-modal interactive fusion method for detecting teenagers’ psychological stress. J Biomed Inform 106(July):2020 35. Vhaduri S, Dibbo SV, Kim Y (2021) Deriving college students’ phone call patterns to improve student life. IEEE Access 9:96453–96465. https://doi.org/10.1109/ACCESS.2021.3093493 36. 
Vasavi S, Neeharica P, Wadhwa B (2018) Regression modelling for stress detection in humans by assessing most prominent thermal signature. In: 2018 IEEE 9th annual information technology, electronics and mobile communication conference (IEMCON), no 1, pp 755–762 37. Bauer G, Lukowicz P (2012) Can smartphones detect stress-related changes in the behaviour of individuals ? In: IEEE international conference on pervasive computing and communications workshops, 2012, no March, pp 423–426 38. Baras K et al (2018) Supporting students’ mental health and academic success through mobile app and IoT. Int J E-Health Med Commun 9(1):50–64. https://doi.org/10.4018/IJEHMC.201 8010104
340
D. Gurusamy et al.
39. Boukhechba M et al (2018) DemonicSalmon: monitoring mental health and social interactions of college students using smartphones. Smart Heal 9–10:192–203 40. Ma L, Ju F, Tao C, Shen X (2019) Portable, low cost smartphone-based potentiostat system for the salivary α-amylase detection in stress paradigm *. In: 2019 41st Annual international conference of the IEEE engineering in medicine and biology society (EMBC), pp 1334–1337 41. Lattie EG et al (2022) Uptake and effectiveness of a self-guided mobile app platform for college student mental health. Internet Interv 27 42. Liang Z, Tatha O, Andersen LE (2020) Developing mHealth app for tracking academic stress and physiological reactions to stress. LifeTech 2020 IEEE 2nd Glob Conf Life Sci Technol 147–150. https://doi.org/10.1109/LifeTech48969.2020.1570618580 43. Rosario DFTD, Mariano AED, Samonte MJC (2019) SmileTeq: an assistive and recommendation mobile application for people with anxiety, depression or stress. ICTC 2019 10th Int Conf ICT Converg ICT Converg Lead Auton Futur 1304–1309. https://doi.org/10.1109/ICT C46691.2019.8940036 44. Liu H, Peng H, Song X, Xu C, Zhang M (2022) Using AI chatbots to provide self-help depression interventions for university students: a randomized trial of effectiveness ✩. Internet Interv 27 45. Dederichs M, Weber J, Pischke CR, Angerer P, Apolin J (2021) Exploring medical students’ views on digital mental health interventions: a qualitative study. Internet Interv 25 46. Marques G et al (2020) Impact of COVID-19 on the psychological health of university students in Spain and their attitudes toward Mobile mental health solutions. Int J Med Inform 147(November):2021 47. Currie SL, Mcgrath PJ, Day V (2010) Development and usability of an online CBT program for symptoms of moderate depression, anxiety, and stress in post-secondary students. Comput Hum Behav 26(6):1419–1426. https://doi.org/10.1016/j.chb.2010.04.020 48. Abbasi AR, Dailey MN, Afzulpurkar NV, Uno T (2010) Student mental state inference from unintentional body gestures using dynamic Bayesian networks. J Multimodal User Interfaces 3(1):21–31. https://doi.org/10.1007/s12193-009-0023-7 49. George DR, Dellasega C, Whitehead MM, Bordon A (2013) Facebook-based stress management resources for first-year medical students: a multi-method evaluation. Comput Hum Behav 29(3):559–562. https://doi.org/10.1016/j.chb.2012.12.008 50. Ayat S, Farahani HA (2013) A comparison of artificial neural networks learning algorithms in predicting tendency for suicide. Neural Comput Appl 23:1381–1386. https://doi.org/10.1007/ s00521-012-1086-z 51. Li Q, Xue Y, Zhao L, Jia J, Feng L, Member S (2016) Analyzing and identifying teens stressful periods and stressor events from a microblog. IEEE J Biomed Heal Inf 21(5):1434–1448. https://doi.org/10.1109/JBHI.2016.2586519 52. Bai Y (2019) Research on the effect of psychological stress intervention in music students based on Diffie–Hellman key exchange algorithm. Clust Comput 3:13723–13729 53. Jyotsna C, Amudha J (2018) Eye gaze as an indicator for stress level analysis in students. In: 2018 International conference on advances in computing, communications and informatics (ICACCI), pp 1588–1593 54. Farrer LM, Gulliver A, Katruss N, Fassnacht DB, Kyrios M, Batterham PJ (2019) A novel multicomponent online intervention to improve the mental health of university students: randomised controlled trial of the uni virtual clinic 55. 
Richards D et al (2016) Effectiveness of an internet-delivered intervention for generalized anxiety disorder in routine care: a randomised controlled trial in a student population. Internet Interv 6:80–88 56. Panajiota R, Muotka J, Lappalainen R (2020) Examining mediators of change in wellbeing, stress, and depression in a blended, internet-based, ACT intervention for university students. Internet Interv 22. https://doi.org/10.1016/j.chb.2016.12.057.This 57. Zhang R (2017) The stress-buffering effect of self-disclosure on Facebook: an examination of stressful life events, social support, and mental health among college students. Comput Hum Behav. https://doi.org/10.1016/j.chb.2017.05.043
Machine Learning, Wearable, and Smartphones for Student’s Mental …
341
58. Wartberg L, Thomasius R, Paschke K (2021) The relevance of emotion regulation, procrastination, and perceived stress for problematic social media use in a representative sample of children and adolescents. Comput Hum Behav 121(Mar) 59. Samaha M, Hawi NS (2016) Relationships among smartphone addiction, stress, academic performance, and satisfaction with life. Comput Hum Behav 57:321–325 60. Masood A et al (2020) Adverse consequences of excessive social networking site use on academic performance: explaining underlying mechanism from stress perspective. Comput Hum Behav 61. Brailovskaia J, Schillack H, Margraf J (2020) Tell me why are you using social media (SM)! Relationship between reasons for use of SM, SM flow, daily stress, depression, anxiety, and addictive SM use—an exploratory investigation of young adults in Germany. Comput Hum Behav 113(Feb) 62. Malak MZ, Khalifeh AH, Shuhaiber AH (2017) Prevalence of internet addiction and associated risk factors in Jordanian school students. Comput Hum Behav. https://doi.org/10.1016/j.chb. 2017.01.011 63. Bolinski F, Boumparis N, Kleiboer A, Cuijpers P, Ebert DD, Riper H (2020) The effect of e-mental health interventions on academic performance in university and college students: a meta-analysis of randomized controlled trials. Internet Interv 20(April) 64. Durán L, Almeida AM, Figueiredo-braga M, Margarida A (2020) Digital audiovisual contents for literacy in depression: a pilot study digital audiovisual contents for literacy in depression: a pilot study with university students with university a students. Procedia Comput Sci 2021:1–8 65. Jasso-Medrano L et al (2018) Measuring the relationship between social media use and addictive behavior and depression and suicide ideation among university students. Comput Hum Behav. https://doi.org/10.1016/j.chb.2018.05.003 66. Liu Q et al (2018) Perceived stress and mobile phone addiction in Chinese adolescents: a moderated mediation model. Comput Hum Behav. https://doi.org/10.1016/j.chb.2018.06.006 67. Kim E, Koh E (2018) Avoidant attachment and smartphone addiction in college students: the mediating effects of anxiety and self-esteem. Comput Hum Behav 84:264–271 68. Wang J, Wang H, Gaskin J, Wang L (2015) The role of stress and motivation in problematic smartphone use among college students. Comput Hum Behav 53:181–188 69. Eklo M, Thome S, Gustafsson E, Nilsson R (2007) Prevalence of perceived stress, symptoms of depression and sleep disturbances in relation to information and communication technology (ICT) use among young adults—an explorative prospective study. Comput Hum Behav 23:1300–1321. https://doi.org/10.1016/j.chb.2004.12.007 70. Liu X, Li M (2021) How COVID-19 affects mental health of Wuhan college students and its countermeasures. In: International conference on public health and data science
Improving K-means by an Agglomerative Method and Density Peaks

Libero Nigro and Franco Cicirelli
Abstract K-means is one of the most used clustering algorithms in many application domains including image segmentation, text mining, bioinformatics, machine learning and artificial intelligence. Its strength derives from its simplicity and efficiency. K-means clustering quality, though, usually is low due to its “modus operandi” and local semantics, that is, its main ability to fine-tune a solution which ultimately depends on the adopted centroids’ initialization method. This paper proposes a novel approach and supporting tool named ADKM which improves K-means behavior through a new centroid initialization algorithm which exploits the concepts of agglomerative clustering and density peaks. ADKM is currently implemented in Java on top of parallel streams, which can boost the execution efficiency on a multicore machine with shared memory. The paper demonstrates by practical experiments on a collection of benchmark datasets that ADKM outperforms, by time efficiency and reliable clustering, the standard K-means algorithm, although iterated a large number of times, and its behavior is comparable to that of more sophisticated clustering algorithms. Finally, conclusions are presented together with an indication of further work. Keywords Clustering problem · K-means · Agglomerative clustering · Density peaks · Java · Parallel streams · Multi-core machines · Benchmark datasets
L. Nigro (B) University of Calabria, DIMES, 87036 Rende, Italy e-mail: [email protected] F. Cicirelli CNR—National Research Council of Italy—Institute for High Performance Computing and Networking (ICAR), 87036 Rende, Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_26
1 Introduction

The goal of the K-means clustering algorithm [1, 2] is to partition N data points, supposed with numerical coordinates, into K groups (clusters) in such a way as to minimize the objective function Sum-of-Squared Errors (SSE). Each cluster is represented by its centroid, and each point is assigned to the cluster whose centroid is nearest to the point, according to the Euclidean distance. Following the assignment phase, K-means continues by updating the centroid of each cluster as the mean point of all the points assigned to the cluster. The two phases, assignment and update (see also Algorithm 1), are repeated until convergence is reached (new centroids almost coincide with previous ones), or a maximum number of iterations, T, are executed. The simplicity, ease of implementation and efficiency of K-means, that is, O(KNT), are at the basis of its recurrent application in such fields as pattern recognition, image processing, medical diagnosis, bioinformatics, machine learning and artificial intelligence. For example, K-means was exploited in [3] for crime detection, in [4] for university academic evaluation and in [5] for graph clustering. One further reason for choosing K-means, instead of a more sophisticated clustering algorithm, is the fact that its limitations are known.

Basic properties of K-means, namely its dependence on (a) the centroids initialization method, (b) the degree of overlap among clusters, (c) the number of clusters, (d) the unbalance of cluster sizes and (e) the number of dimensions, were deeply investigated in [6]. In [6], it is shown that cluster overlap is the most important factor to predict, to a large extent, the success of K-means. Well-separated clusters are more difficult to manage, despite intuition. At parity of overlap, an increasing number of clusters negatively influences the success rate, that is, the percentage of cases where K-means correctly solves the dataset. Unbalance of cluster sizes is another source of weakness for K-means. The existence of both low and high-density areas can reduce the probability for K-means to allocate a centroid in all the low-density areas, with the consequence that a correct solution is hard to find. Finally, only spherical and Gaussian cluster shapes are within the reach of K-means.

The fundamental behavior of standard K-means is related to its modus operandi and local semantics of centroids management. K-means depends a lot on the centroids' initialization [7], which can cause it to be stuck in a suboptimal solution. The K-means ability to fine-tune centroids and cluster borders is often exploited in more advanced algorithms like random swap (RS) [8, 9], which is guided by a global strategy of centroids definition and improvement. RS integrates K-means for local fine-tuning of a current solution. The global K-means [10] variation uses a top-down approach. At each step, every data point is considered as a potential location for a new cluster. It then applies some K-means iterations and chooses the candidate solution that minimizes the objective function. Agglomerative clustering [11, 12] follows a bottom-up approach where a solution emerges from a sequence of cluster merge operations. At each step, the pair of clusters to merge is the one which minimizes the increase of the objective function.
Numerous stochastic or deterministic initialization methods have been proposed in the literature [7, 13, 14]. A rough classification [7] identifies methods based on (a) random points, (b) further points semantics and (c) density heuristics. The standard initialization method is the random one which randomly chooses K points from the dataset. Of course, such points could be outliers or error points representing poor choices. Therefore, to improve the K-means performance, it is necessary to repeat it (repeated K-means or RKM) a fixed number of times so as to accept the candidate solution which minimizes the SSE (or another index, see later in this paper). As demonstrated in [6], though, RKM is still unable to correctly solve datasets when there are many clusters. A variant is the random partitions method [2] which randomly assigns points to clusters. Centroids are then inferred as the mean points of the partitions. More in general, centroids should be selected in dense areas [15, 16] and should be far away from each other [17] to avoid splitting a real cluster into multiple wrong clusters. The local semantics of K-means can be better understood (see also [7]) as an intrinsic inability to move centroids globally, e.g., from an area with some redundant centroids to another area with fewer centroids, if the two areas are distant from one another and/or are separated by stable clusters in-between which impede the movement. The two points for improving the K-means performance [7] can be summed up as: (1) using a “better” initialization method and (2) increasing the number of repeats. Repeated K-means is recognized by many researchers as a necessary condition for obtaining a “good” solution, despite the associated increased computation time. The original contribution of this paper consists of the proposal of an initialization method for K-means, ADKM, which combines concepts of agglomerative clustering [11, 12] and of k-nearest neighbors [18–20] for achieving density peaks [15, 21– 23]. Density peaks are then used to define the initial centroids following a technique borrowed by density K-means ++ (DK-means ++ ) [16], which favors centroid selection which are far away from each other. The new initialization method improves K-means significantly, as confirmed by applying ADKM to clustering basic benchmark datasets proposed in [6, 24]. Both fewer iterations of K-means are required, and careful clustering solutions can be obtained. The paper is structured as follows. Section 2 formalizes the K-means behavior. Section 3 discusses some measures useful for checking clustering quality. Section 4 describes the proposed improved K-means algorithm. Section 5 illustrates the benchmark datasets chosen to assess the performance of the proposed algorithm. Section 6 presents the gathered experimental results. Finally, Sect. 7 concludes the paper with an indication of on-going and further work.
2 K-means Formalization

There is a dataset X of N data points $\{x_1, x_2, \ldots, x_N\}$, each data point being a vector in D-dimensional numerical space: $x_i \in \mathbb{R}^D$. Data points must be partitioned into K clusters, K being fixed, $\{C_1, C_2, \ldots, C_K\}$, whose centroids are $\{\mu_1, \mu_2, \ldots, \mu_K\}$.
Each data point $x_i$ is assigned to the cluster $C_j$ whose centroid $\mu_j$ is nearest according to the Euclidean distance: $\mu_j = nc(x_i)$, with $j = \arg\min_{1 \le j \le K} \|x_i - \mu_j\|$, where $\|x_i - \mu_j\|$ denotes the distance between $x_i$ and $\mu_j$. The pseudo-code of K-means is shown in Algorithm 1.

Algorithm 1: Basic operations of K-means
1. Initialize the K centroids $\{\mu_1, \mu_2, \ldots, \mu_K\}$ by some method (e.g., random)
2. Assign each data point $x_i \in X$ to cluster $C_j$ such that $\mu_j = nc(x_i)$
3. Update centroids with the mean point of each cluster: $\mu_j = \frac{1}{|C_j|} \sum_{x_h \in C_j} x_h$
4. Repeat from step 2 until a convergence criterion is met, e.g., $\forall j, 1 \le j \le K, \|\mu'_j - \mu_j\| < \text{threshold}$, or a maximum number of iterations was executed
As recommended in [2] and [7], a clustering method should be distinguished from a clustering algorithm. The adopted objective function identifies the clustering method. A clustering algorithm aims to optimize the clustering method. In the case of K-means, the clustering method is the Sum-of-Squared Errors SSE:

$\text{SSE} = \sum_{j=1}^{K} \sum_{x_h \in C_j} \|x_h - \mu_j\|^2 \quad \text{where } \mu_j = nc(x_h)$

and the clustering algorithm is that shown in Algorithm 1. In some applications, it is preferred to use a normalized version of SSE (normalized mean SSE or nMSE); thus:

$\text{nMSE} = \frac{\text{SSE}}{N * D}$
Notwithstanding the goal of optimizing SSE or nMSE, other measures exist which can allow the analyst to check specifically the obtained cluster structure and cluster borders, against ground truth centroids or ground truth partitions (see next section). It is not uncommon to have cases where a correct cluster structure does not correspond to a minimal SSE/nMSE or vice versa. So, the ultimate answer is upon the analyst and the needs of the considered pattern recognition application.
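For clarity, the following minimal Java sketch shows how SSE and nMSE can be computed for a given assignment of points to centroids. It is an illustration based on the definitions above; the array-based representation and the class/method names are assumptions made for the example and are not code taken from the ADKM implementation.

   // Minimal sketch: computing SSE and nMSE for a clustering solution.
   // Assumptions (not from the paper): points and centroids are double[D] arrays,
   // and labels[i] holds the index of the cluster assigned to point i.
   public final class ClusteringCost {

       static double squaredDistance(double[] a, double[] b) {
           double s = 0.0;
           for (int d = 0; d < a.length; d++) {
               double diff = a[d] - b[d];
               s += diff * diff;
           }
           return s;
       }

       // SSE: sum over all points of the squared distance to the centroid of their cluster
       static double sse(double[][] points, double[][] centroids, int[] labels) {
           double sum = 0.0;
           for (int i = 0; i < points.length; i++) {
               sum += squaredDistance(points[i], centroids[labels[i]]);
           }
           return sum;
       }

       // nMSE = SSE / (N * D)
       static double nmse(double[][] points, double[][] centroids, int[] labels) {
           int n = points.length, d = points[0].length;
           return sse(points, centroids, labels) / (n * (double) d);
       }
   }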
3 Measures of Clustering Accuracy

One of the most important measures to judge the quality of a clustering solution was proposed in [25] (centroid index, CI) and generalized in [26] (generalized CI, GCI). The centroid index (CI) evaluates the correct distribution of centroids against real clusters (see Fig. 1). Roughly speaking, CI counts the number of real clusters which received redundant centroids and the number of real clusters that were missing centroids. CI is the maximum of the two numbers.

Fig. 1 Intuitive illustration of the centroid index CI [7]

In many practical cases, a dataset, e.g., a synthetic one, comes with the so-called ground truth centroids (GT), which were used to design the dataset along with particular point distributions (shapes, e.g., Gaussian) around GT. In these cases, the quality of an obtained clustering solution can be assessed by mapping the achieved centroids onto GT and vice versa. Each calculated centroid is mapped on the GT centroid according to minimal Euclidean distance. Then, the number of "orphans" in GT is evaluated as the number of GT centroids to which no computed centroid is mapped. Similarly, GT centroids are mapped on computed centroids and the number of orphans in computed centroids is determined. Then:

$\mathrm{CI} = \max(\#\mathrm{orphans}(\mathrm{centroids} \rightarrow \mathrm{GT}), \#\mathrm{orphans}(\mathrm{GT} \rightarrow \mathrm{centroids}))$

Sometimes, it can be useful to express the CI as a relative value, CI/K, to better figure out the percentage of wrong centroids.

In other cases, instead of GT, a dataset can be accompanied by ground truth partitions (GTP), which is an initial labeling of each point to a belonging cluster. In these cases, the CI was generalized in [26] to compare the final partitions determined
by the clustering algorithm to those of the GTP. Mapping can be based on the maximal sharing of points between a partition of GTP and a computed partition (and vice versa). In this work, instead, the Jaccard distance between partitions is used, which not only evaluates the number of shared points but normalizes this number against the total number of points in the two partitions:

$\mathrm{Jaccard\_distance} = 1 - \frac{|CP_i \cap GTP_j|}{|CP_i \cup GTP_j|}$

More in particular, a computed partition ($CP_i$) is mapped on the partition $GTP_j$ of GTP with which it has minimal Jaccard distance. Then, the number of orphans in GTP is established as in the CI case. Finally:

$\mathrm{GCI} = \max(\#\mathrm{orphans}(\mathrm{CP} \rightarrow \mathrm{GTP}), \#\mathrm{orphans}(\mathrm{GTP} \rightarrow \mathrm{CP}))$

It can be observed that in real datasets GT/GTP are normally unknown. However, even in these cases, a golden clustering solution can be defined using an advanced clustering algorithm (e.g., random swap [8, 9]) and the solution established as GT or GTP for comparison purposes.
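A possible way to compute the centroid index is sketched below in Java. This is only an illustrative implementation derived from the definition above, not code from the paper; it assumes the computed centroids and the ground truth centroids are given as double[][] arrays.

   // Illustrative sketch of the centroid index (CI) computation:
   // map each centroid of one set onto its nearest centroid of the other set
   // and count how many centroids of the target set receive no mapping (orphans).
   public final class CentroidIndex {

       static double squaredDistance(double[] a, double[] b) {
           double s = 0.0;
           for (int d = 0; d < a.length; d++) { double diff = a[d] - b[d]; s += diff * diff; }
           return s;
       }

       // number of orphans in 'to' when every element of 'from' is mapped on its nearest element of 'to'
       static int orphans(double[][] from, double[][] to) {
           boolean[] covered = new boolean[to.length];
           for (double[] c : from) {
               int nearest = 0;
               double best = Double.MAX_VALUE;
               for (int j = 0; j < to.length; j++) {
                   double dist = squaredDistance(c, to[j]);
                   if (dist < best) { best = dist; nearest = j; }
               }
               covered[nearest] = true;
           }
           int count = 0;
           for (boolean b : covered) if (!b) count++;
           return count;
       }

       // CI = max of the two orphan counts (centroids -> GT and GT -> centroids)
       static int ci(double[][] centroids, double[][] gt) {
           return Math.max(orphans(centroids, gt), orphans(gt, centroids));
       }
   }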
4 Proposed Improved K-means Algorithm

The new improved K-means algorithm, named agglomerative density peaks KM (ADKM), is summarized in Algorithm 2. First, the dataset is loaded into memory. Then, possibly, the GT or GTP (see Sect. 3) are loaded too. The novel initialization method is split into three methods: cutoff_prediction(), agglomerative_density() and centroids_selection(), which are detailed below. The k_means() method (see Algorithm 5) hides the operations of K-means which were abstracted in Algorithm 1.

Algorithm 2: Summary of proposed ADKM algorithm in Java
   load_dataset();
   if( INIT_GT ) load_gt();
   else if( INIT_PARTITIONS ) load_partitions();
   long start = System.currentTimeMillis();
   cutoff_prediction();
   agglomerative_density();
   centroids_selection();
   k_means();
   long end = System.currentTimeMillis();
   output operations
4.1 Centroid Initialization

The new centroid initialization method draws on both agglomerative clustering concepts [11, 12] and density peaks concepts [15, 21–23]. First, each point of the dataset is assumed as a potential centroid and it is associated with a partition (set of points) which initially contains the point itself. Then, the partition of each point is enriched by merging (bottom-up) into it the points of other partitions (which are correspondingly split) which fall within a hyperball radius (cutoff distance [21] dc) centered on the point. A partition (and its representative point) disappears if it becomes empty. The agglomerative process is continued until no further merging operation can be carried out.

Whereas in the original density peaks clustering algorithm [21] the dc parameter is assumed to be defined in advance, e.g., according to a rule of thumb (the dc should ensure a density mass around each point between 1% and 2% of the dataset dimension N), in this work, using an approach inspired by the k-nearest neighbors (k-NN) technique proposed in [18], the dc is predicted by a preliminary phase using a stochastic sample of the dataset points (for low or medium-sized datasets, the sample size SS can be made equal to N). More in particular, a k value (lower case, not to be confused with the number of clusters K) is anticipated and, for each sample point, the first k distinct nearest neighbour distances are determined. The average value of these k distances is then established as the local dc value of the sample point. Finally, the local dc values of sample points are sorted in ascending order and the median value is established as the dc for the whole dataset. Choosing the median is a way to reduce the influence of outliers. Of course, the prediction phase of dc costs O(SS * N + SS log SS), which easily can become O(N^2), that is, the known all-pairwise-distances problem. To smooth out the computational costs, many operations can be computed in parallel.

To give an idea of the problems which arise when using Java parallel streams [27], Algorithm 3 depicts an excerpt of the agglomerative_density() method. A partitions array with N elements (sets of points) is introduced, which is initialized by putting a distinct dataset point into each partition. Then the merging process compares the distance of the current point with the points of subsequent partitions. In order to avoid race conditions, a ConcurrentLinkedQueue (the toMerge variable) is used, which is lock-free and which can be simultaneously accessed by multiple threads. All the partition points that fall within the dc radius of the current point, instead of being immediately merged with the current partition, are temporarily stored in the toMerge list. When all the subsequent partition points have been checked, the points (actually their indexes) in the toMerge list are extracted and safely copied onto the current partition. After that, the subsequent dataset point is considered and its agglomerative process executed, and so forth.
Algorithm 3: An excerpt of the Java code for the agglomerative_density() method
   …
   ConcurrentLinkedQueue toMerge=new ConcurrentLinkedQueue();
   for( int i=0; i<N; i++ ) {
      //agglomerative step for the partition of the current point i
      Stream q_stream=Stream.of( partitions );
      if( PARALLEL ) q_stream=q_stream.parallel();
      q_stream
         .forEach( q->{
            //check if q partition can be merged with current partition i
            if( q.getID()>i ) {
               add to toMerge the indexes of the points of q falling within the dc radius of point i
            }
         } );
      move the content of toMerge onto the current partition i and clear toMerge
   }
   …
   Stream p_stream=Stream.of( points );
   if( PARALLEL ) p_stream=p_stream.parallel();
   p_stream
      .map( p->{
         define p density (rho) as the size of the partition associated to p
         return p;
      })
      .forEach( p->{} );
At the end of the agglomerative process, partitions' sizes are used to define the density (the rho field) of the corresponding points. Of course, points with the highest density are candidates for centroids. However, the actual selection of centroids follows the technique of DK-means++ [13, 16] to ensure centroids are far away from one another (see next section). Both the agglomerative process and the operations that define the density of points as the size of the partitions exploit Java parallel streams and the underlying fork/join mechanism, which splits the stream (associated with a native array) into segments and spawns separate threads to process segments in parallel. The combination of the results of the various threads, in the case of the agglomerative process, is represented by moving the content of the toMerge list into the current partition.
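The cutoff distance dc used as the merging radius above is estimated in the preliminary cutoff_prediction() phase described at the beginning of this section. A simplified, sequential Java sketch of that k-NN-based estimation is shown below; the array-based data structures, the method name predictDc and the sampling details are assumptions made for illustration and are not taken from the ADKM sources.

   import java.util.Arrays;
   import java.util.Random;

   // Simplified, sequential sketch of the dc (cutoff distance) prediction:
   // for each sampled point, average its k nearest neighbour distances,
   // then take the median of these local dc values over the sample.
   public final class CutoffPrediction {

       static double distance(double[] a, double[] b) {
           double s = 0.0;
           for (int d = 0; d < a.length; d++) { double diff = a[d] - b[d]; s += diff * diff; }
           return Math.sqrt(s);
       }

       static double predictDc(double[][] dataset, int sampleSize, int k, long seed) {
           Random rnd = new Random(seed);
           int n = dataset.length;
           double[] localDc = new double[sampleSize];
           for (int s = 0; s < sampleSize; s++) {
               double[] p = dataset[rnd.nextInt(n)];           // stochastic sample point
               double[] dists = new double[n];
               for (int i = 0; i < n; i++) dists[i] = distance(p, dataset[i]);
               Arrays.sort(dists);                              // dists[0] is the distance of p to itself (0)
               double sum = 0.0;
               // first k nearest neighbour distances (duplicates are not filtered in this sketch)
               for (int j = 1; j <= k; j++) sum += dists[j];
               localDc[s] = sum / k;                            // local dc = average of the k distances
           }
           Arrays.sort(localDc);
           return localDc[sampleSize / 2];                      // median reduces the influence of outliers
       }
   }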
4.2 Centroids Selection

The centroids_selection() method starts from knowing the density of points and follows the approach of DK-means++ [13, 16], which is summarized in Algorithm 4, where $D(x_i)$ denotes the minimal distance of point $x_i$ from the currently defined centroids $\{\mu_1, \mu_2, \ldots, \mu_L\}$, $L < K$.

Algorithm 4: Abstract version of the centroids_selection() method (see Alg. 2)
1. Map point densities to the range [0, 1] with min–max normalization: $\rho_i = \frac{\rho_i - \rho_{\min}}{\rho_{\max} - \rho_{\min}}$
2. Define the first centroid as the point $x_h$ having the maximal density, and put L = 1
3. Calculate the prospectiveness of each point: $\varphi_i = \rho_i * D(x_i)$. Select the next centroid as the point $x_j$, distinct from existing centroids, having the maximal prospectiveness: $x_j = \arg\max_j \varphi_j$, and put L = L + 1
4. If L < K, go back to step 3
The Java implementation of the method depends on parallel streams in all the subphases: min–max normalization, calculation of the prospectiveness and detecting the point with maximal prospectiveness at each iteration of point 3.
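A compact, sequential Java rendering of Algorithm 4 might look as follows. It is only an illustration of the selection logic (min–max normalization of the densities, then greedy selection by maximal prospectiveness); it is not the parallel-stream implementation used in ADKM, and the class and helper names are assumed.

   // Illustrative, sequential sketch of Algorithm 4: select K centroids using
   // normalized densities (rho) and the prospectiveness phi_i = rho_i * D(x_i).
   public final class CentroidsSelection {

       static double distance(double[] a, double[] b) {
           double s = 0.0;
           for (int d = 0; d < a.length; d++) { double diff = a[d] - b[d]; s += diff * diff; }
           return Math.sqrt(s);
       }

       static int[] select(double[][] points, double[] rho, int K) {
           int n = points.length;
           // 1. min-max normalization of the densities to [0, 1]
           double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
           for (double r : rho) { min = Math.min(min, r); max = Math.max(max, r); }
           double range = (max > min) ? (max - min) : 1.0;   // guard against constant densities
           double[] nrho = new double[n];
           for (int i = 0; i < n; i++) nrho[i] = (rho[i] - min) / range;

           int[] centroids = new int[K];
           boolean[] taken = new boolean[n];
           // 2. first centroid: point of maximal density
           int first = 0;
           for (int i = 1; i < n; i++) if (nrho[i] > nrho[first]) first = i;
           centroids[0] = first; taken[first] = true;

           // D(x_i): minimal distance of x_i from the currently selected centroids
           double[] D = new double[n];
           for (int i = 0; i < n; i++) D[i] = distance(points[i], points[first]);

           // 3.-4. repeatedly pick the point of maximal prospectiveness phi_i = nrho_i * D(x_i)
           for (int L = 1; L < K; L++) {
               int best = -1; double bestPhi = -1.0;
               for (int i = 0; i < n; i++) {
                   if (taken[i]) continue;
                   double phi = nrho[i] * D[i];
                   if (phi > bestPhi) { bestPhi = phi; best = i; }
               }
               centroids[L] = best; taken[best] = true;
               for (int i = 0; i < n; i++) D[i] = Math.min(D[i], distance(points[i], points[best]));
           }
           return centroids;
       }
   }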
4.3 K-means Algorithm

Differently from Alg. 1, both the assignment and update operations are implemented in the k_means() method (see Algorithm 5) using Java streams [27], which open the way to exploiting the computing potential of today's multi/many-core machines with shared memory. More details can be found in [14]. The threshold value (see Alg. 1) for checking convergence was set to $10^{-10}$.
5 Benchmark Datasets

The benchmark collection of basic datasets described in [6] and available in [24] was chosen for checking the properties (accuracy and time efficiency) of the proposed ADKM algorithm. Almost all the datasets are provided with ground truth (GT)
centroids and characterized by the number and shape (point distributions) of included clusters. Algorithm 5: An excerpt of the k_means() method (see Algorithm 2)
   for( int it=0; it<T; it++ ) {
      //Assignment
      Stream p_stream=Stream.of( points );
      if( PARALLEL ) p_stream=p_stream.parallel();
      p_stream
         .map( p -> {
            find centroid with minimal distance to p
            define the cluster ID (CID) of p as the index of that centroid
            return p;
         } )
         .forEach( p->{} );
      prepare new centroids
      //Update
      Stream c_stream=Stream.of( newCentroids );
      if( PARALLEL ) c_stream=c_stream.parallel();
      c_stream
         .map( c -> {
            add to c all the data points having the CID of c
            c.mean();
            return c;
         } )
         .forEach( c->{} );
      check for termination
      copy newCentroids on to centroids
      if( termination ) { print number of iterations; break; }
   }//for( it... )
The A sets (Fig. 2) contain spherical clusters. They are subsets of each other: A1 ⊂ A2 ⊂ A3. The S sets (Fig. 3) contain Gaussian clusters with a varying degree of overlap. The G2 sets (Fig. 4) contain 2048 points organized around two Gaussian clusters placed at fixed locations. The clusters' overlap is controlled by modifying the standard deviation from 10 to 100, hence the name G2-dim-std of a particular dataset. The G2-1024-100 dataset (with overlap) was chosen for the execution experiments. The DIM collection of datasets (Fig. 5) contains well-separated clusters in high-dimensional space. For the experiments, DIM32 was selected: 32 dimensions for each of the 1024 points. Points are randomly distributed among clusters by a Gaussian distribution. The Unbalance dataset (Fig. 6) consists of eight clusters split in two well-separated groups. The first three clusters are dense, and each cluster admits 2000 points. Each of the last five clusters contains instead 100 points. The two Birch datasets shown in Fig. 7 are made up of spherical clusters, respectively, regularly distributed on a 10 × 10 grid (Birch1), or following a sine curve (Birch2). As a further, challenging, example, the Aggregation dataset [21, 24] (see Fig. 8), which comes with GTP, was included in the tests. The parameters of the chosen datasets are collected in Table 1.

Fig. 2 Three A datasets
Fig. 3 Four S datasets
Fig. 4 G2 datasets
6 Results

ADKM was checked on the 13 benchmark datasets shown in Figs. 2, 3, 4, 5, 6, 7 and 8 and its performance compared to that achievable from the repeated K-means (RKM) [14] and parallel random swap (PRS) [8, 9]. RKM with random initialization of centroids was repeated 100 times for each dataset. Table 2 shows the emerged average value of CI/GCI (it is recalled that the Aggregation dataset in Fig. 8 comes with ground truth partitions and not ground truth centroids), the average relative value of CI/GCI (the CI/GCI value divided by K), the success rate (i.e., the frequency by which the event CI/GCI = 0 occurs in the 100 runs), the minimal observed nMSE, and the average number of iterations (IT) K-means executes until convergence.

It emerges from Table 2 that only in the G2 case was RKM able to correctly solve the dataset, thanks to the good amount of existing overlap. In the other cases, a success rate of up to 30% was observed for RKM to produce a correct clustering solution. In the majority of cases, RKM was unable to correctly solve the dataset. A1 was solved only in one run over 100. A2 was correctly solved only in two runs over 100. The results of Table 2 comply with similar results documented in [6].

Fig. 5 DIM32 dataset
Fig. 6 Unbalance dataset
Fig. 7 Two Birch datasets
Fig. 8 Aggregation dataset
Table 1 Parameters of benchmark datasets [24]
Dataset           N                   D      K
A1, A2, A3        3000, 5250, 7500    2      20, 35, 50
S1, S2, S3, S4    5000                2      15
G2-1024-100       2048                1024   2
DIM32             1024                32     16
UNBALANCE         6500                2      8
BIRCH1, BIRCH2    100,000             2      100
AGGREGATION       788                 2      7
Table 2 Performance of repeated K-means on the chosen benchmark datasets
Dataset        CI/GCI   rel-CI/GCI (%)   Success rate (%)   nMSE      IT
A1             2.41     12               1                  2.02E6    23.22
A2             4.64     13.3             0                  2.33E6    25.29
A3             6.57     13.1             0                  2.42E6    26.84
S1             1.89     12.6             2                  8.92E8    18.86
S2             1.3      9                17                 1.33E9    25.44
S3             1.15     8                15                 1.69E9    27.68
S4             0.83     6                30                 1.57E9    40.34
G2             0        0                100                9982.39   2.59
DIM32          3.62     22.6             0                  140.99    7
UNBALANCE      3.93     49               0                  3.41E7    33.16
BIRCH1         6.71     6.7              0                  5.02E8    119.01
BIRCH2         16.6     16.6             0                  5.87E6    48.75
AGGREGATION    5.68     81               0                  6.98      15.13
Table 3 Results of application of PRS and ADKM to the 13 benchmark datasets
               PRS                    ADKM (SS = N, k = 15)
Dataset        CI/GCI   nMSE          CI/GCI   nMSE      IT
A1             0        2.02E6        0        2.02E6    3
A2             0        1.93E6        0        1.93E6    5
A3             0        1.93E6        0        1.93E6    5
S1             0        8.92E8        0        8.93E8    2
S2             0        1.33E9        0        1.33E9    4
S3             0        1.69E9        0        1.69E9    6
S4             0        1.57E9        0        1.57E9    10
G2             0        9.98E3        0        9.98E3    1
DIM32          0        7.096         0        7.096     1
UNBALANCE      0        1.65E7        0        1.65E7    2
BIRCH1         0        4.64E8        0        4.64E8    10
BIRCH2         0        2.28E6        0        2.28E6    3
AGGREGATION    1        6.98          0        7.13      7
Table 3 reports the experimental results collected when the parallel random swap (PRS) and the ADKM algorithm proposed in this paper are applied to the 13 benchmark datasets. ADKM was executed using a sample size SS = N and a k parameter equal to 15. The SS = N condition implies (almost) a deterministic evolution of ADKM. As one can see from Table 3, PRS solved all the datasets except for the Aggregation case, where 1 centroid over 7 was wrongly computed. ADKM estimated the same results predicted by PRS. Moreover, the more challenging Aggregation dataset was correctly solved too. The final value of the nMSE cost was predicted exactly as for PRS. By comparing the behavior of ADKM and RKM (see Tables 2 and 3), it comes out that ADKM was capable of terminating its task with fewer K-means iterations, due to the good initialization of centroids based on the agglomerative method and density peaks.

The time efficiency of ADKM was checked by 10 runs of the Birch1 dataset, separately executed in parallel (parameter PARALLEL = true) and then in sequential mode (PARALLEL = false). Despite the resultant deterministic behavior of ADKM, multiple runs are necessary to cope with the unavoidable uncertainties due to the underlying operating system. Table 4 reports the observed sequential elapsed time (SET) and the parallel elapsed time (PET), both in ms, for the 10 runs. From the values in Table 4, an average PET of 30,437 ms and an average SET of 214,206 ms were estimated, with a speedup = avg SET / avg PET = 214,206/30,437 = 7.04, meaning that a sequential time of about 3.57 min is reduced, in parallel mode, to about 0.51 min.
Table 4 Execution times of ADKM in parallel and sequential mode for Birch1 (16 threads)
Run    PET (ms)   SET (ms)
1      31,645     216,144
2      30,275     210,890
3      30,770     212,058
4      29,940     218,068
5      30,519     210,883
6      29,583     215,377
7      30,732     214,120
8      30,049     214,821
9      30,037     217,366
10     30,819     212,332
The execution experiments were carried out on a Win10 Pro, Dell XPS 8940 machine, with an Intel i7-10700 CPU (8 physical + 8 virtual cores), 32 GB RAM, and Java 17.
7 Conclusions

This paper proposes ADKM, an initialization method for K-means which relies on an agglomerative technique and density peaks. ADKM is currently implemented in Java using parallel streams. It was inspired by the fast density peaks algorithm developed by Sieranoja and Fränti in 2019 [18], which is based on the k-nearest neighbors (k-NN) approach. Similarly to the Sieranoja and Fränti method, the preliminary definition of the parameter k is more intuitive to set than the cutoff kernel distance dc advocated by the original Rodriguez and Laio density peaks algorithm [21]. Moreover, in this work, the construction of the k-NN graph is not required. The improved K-means algorithm proves to be both time efficient and accurate in clustering quality, as demonstrated by applying it to a collection of benchmark datasets. Continuation of the research aims at (a) optimizing ADKM in Java and checking it on other challenging datasets, e.g., with irregularly located clusters of variable size and shape; (b) porting ADKM to Python; and (c) completing the development of a full density-peaks-based clustering algorithm, by exploiting the particular k-NN approach proposed in this work.
References

1. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. Berkeley, University of California Press, pp 281–297
2. Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recogn Lett 31(8):651–666
3. Vignesh K, Nagaraj P, Muneeswaran V, Selva Birunda S, Ishwarya Lakshmi S, Aishwarya R (2022) A framework for analyzing crime dataset in R using unsupervised optimized K-means clustering technique. In: Proceedings of congress on intelligent systems. Springer, Singapore, pp 593–607
4. Yu D, Zhou X, Pan Y, Niu Z, Sun H (2022) Application of statistical K-means algorithm for university academic evaluation. Entropy 24(7):1004
5. Sieranoja S, Fränti P (2022) Adapting k-means for graph clustering. Knowl Inf Syst 64(1):115–142
6. Fränti P, Sieranoja S (2018) K-means properties on six clustering benchmark datasets. Appl Intell 48(12):4743–4759
7. Fränti P, Sieranoja S (2019) How much can k-means be improved by using better initialization and repeats? Pattern Recogn 93:95–112
8. Fränti P (2018) Efficiency of random swap clustering. J Big Data 5(1):1–29
9. Nigro L, Cicirelli F, Fränti P (2022) Efficient and reliable clustering by parallel random swap algorithm. In: Proceedings of IEEE/ACM 26th international symposium on distributed simulation and real time applications (DSRT 2022), Alès, France, 26–28 September
10. Likas A, Vlassis N, Verbeek JJ (2000) The global k-means clustering algorithm. Pattern Recognit 36:451–461
11. Kurita T (1991) An efficient agglomerative clustering algorithm using a heap. Pattern Recognit 24:205–209
12. Fränti P, Virmajoki O (2006) Iterative shrinking method for clustering problems. Pattern Recognit 39(5):761–775
13. Vouros A, Langdell S, Croucher M, Vasilaki E (2021) An empirical comparison between stochastic and deterministic centroid initialization for K-means variations. Mach Learn 110:1975–2003
14. Nigro L (2022) Performance of parallel K-means algorithms in Java. Algorithms 15(4):117
15. Al Hasan M, Chaoji V, Salem S, Zaki MJ (2009) Robust partitional clustering by outlier and density insensitive seeding. Pattern Recogn Lett 30(11):994–1002
16. Nidheesh N, Nazeer KA, Ameer PM (2017) An enhanced deterministic K-means clustering algorithm for cancer subtype prediction from gene expression data. Comput Biol Med 91:213–221
17. Arthur D, Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 1027–1035
18. Sieranoja S, Fränti P (2019) Fast and general density peaks clustering. Pattern Recogn Lett 128:551–558
19. Yuan X, Yu H, Liang J, Xu B (2021) A novel density peaks clustering algorithm based on K nearest neighbors with adaptive merging strategy. Int J Mach Learn Cybern 12(10):2825–2841
20. Du H, Hao Y, Wang Z (2022) An improved density peaks clustering algorithm by automatic determination of cluster centres. Connect Sci 34(1):857–873
21. Rodriguez R, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496
22. Breunig MM, Kriegel HP, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, pp 93–104
23. Li Z, Tang Y (2018) Comparative density peaks clustering. Expert Syst Appl 95:236–247
24. Benchmark datasets. http://cs.uef.fi/sipu/datasets/. Last accessed June 2022
25. Fränti P, Rezaei M, Zhao Q (2014) Centroid index: cluster level similarity measure. Pattern Recogn 47(9):3034–3045
26. Fränti P, Rezaei M (2016) Generalizing centroid index to different clustering models. In: Joint IAPR international workshops on statistical techniques in pattern recognition (SPR) and structural and syntactic pattern recognition (SSPR). Springer, pp 285–296
27. Urma RG, Fusco M, Mycroft A (2019) Modern Java in action. Manning, Shelter Island
Assessing the Best-Fit Regression Models for Predicting the Marine Water Quality Determinants

Karuppanan Komathy
Abstract Machine learning algorithms have generally been utilized for forecasting the physiochemical properties of river water, coastal water, and groundwater meant for use in agriculture, industry, and drinking. Prediction of quality indicators of the open sea ecosystem, needed to maintain healthy marine ecosystems, has not received much attention, as open sea water is not directly consumed by humankind. Monitoring the quality of sea water is crucial in order to prevent maritime disasters, preserve the environment, and guarantee an ecological expansion of marine resources, which have an immediate impact on mankind. Hence, as part of our study on marine water quality management, this paper considers a dataset on open sea ecosystems and evaluates the best-fit regression model for predicting the quality determinants. Instead of investigating the fitness of a single regression model to predict the entire set of physiochemical properties, about 18 regression models were evaluated in this paper and an exclusive regression model was then recommended for each of the water quality determinants. Advanced techniques such as tuning and stacking were also contemplated in the regression analysis to enhance the performance of the best-fit model.

Keywords Ballast water · Marine water quality · Physiochemical properties · Regression analysis · Prediction model
K. Komathy (B) Academy of Maritime Education and Training, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_27

1 Introduction

The core determinants that regulate the quality of freshwater or marine water are temperature, pH, total dissolved solids (TDS), dissolved oxygen (DO), salinity, and total suspended solids (TSS). The water temperature regulates the behavior of several toxins in the water. The number of hydrogen ions in water, which can make it acidic, neutral, or alkaline, is represented by the pH. For the survival of aquatic organisms, dissolved oxygen is essential. Plankton, microorganisms, or sediments suspended in the water are indicated by turbidity. The dissolved salts in the water, such as calcium, sodium, potassium, and magnesium, are indicated by the salinity. Monitoring the water quality indicators has, therefore, become crucial in order to develop and implement effective statutory controls.

As a part of our ongoing research on managing the quality of ballast water in ships as per the regulations laid out by the International Maritime Organization (IMO) [13], this paper has focused on predicting the quality indicators of marine water. Ballast water principally helps to maintain the stability of the ship. A ship extracts water from the sea or ocean of one port where it unloads the goods and expels the ballast water into the sea or ocean of another port where it loads the goods. In other words, the ballast water drawn from one sea or ocean gets unloaded into another sea or ocean, and this continuous process is termed ballast water exchange (BWE). Living organisms in marine water are basically dependent on the physical, biological, and chemical characteristics of water, and hence, the organisms displaced from their domicile region to another may provoke a bio-invasion. IMO restricts the quality of the marine water exchanged with another water region to protect the marine ecosystems, environment, and human health. The paper, therefore, attempts to design and evaluate prediction models to determine the quality parameters of marine water prior to any BWE. The objective of the paper is to assess various regression models so that the best-fit model for each of the quality determinants of marine water would be capable of predicting precisely.
2 Review of Machine Learning Techniques Used in Water Quality Analysis

Significant research works exist on studying natural water bodies such as freshwater and marine water around the world. Reviewing these research projects was essential to observe the findings under relevant topics, such as the prediction of quality determinants of water, in order to fulfill the study's purpose. It is also most important to focus on the studies involved in monitoring the quality of sea water in order to prevent maritime disasters, preserve the environment, and guarantee an ecological expansion of marine resources. This section reviews the research works on monitoring the quality of water assisted by machine learning technology.

Deng et al. [5] applied machine learning (ML) algorithms, namely artificial neural network (ANN) and support vector machine (SVM), to predict the trend of algal growth in Tolo Harbour, Hong Kong, and observed that water quality variables such as biological oxygen demand (BOD), total inorganic nitrogen (TIN), dissolved oxygen (DO), phosphorus (PO), and pH were key variables contributing to the abundance of blooms during the past three decades. They also suggested that ML can help to identify the complicated behavior of algae. Goldstein et al. [6] suggested that uncertainty derived from ML could be improved by using probabilistic ML-based
predictions. They further advised to utilize the probabilistic nature of the predictors used in ML as stochastic parameterizations, so that the deterministic numerical model could be converted as probabilistic models. Azrour et al. [3] have designed a machine learning model that predicted water quality index and classified the water quality to control water pollution and alert when detected the poor quality. The authors concluded that a multiple linear regression would be suitable to predict the water quality index for the input physiochemical and biological parameters. Also, ANN was preferred to categorize the water quality. Imani et al. [12] have used a 17-year dataset collected on the water quality dataset from the State of São Paulo, Brazil, to build the ANN model. Fuzzy analytic hierarchy technique was used to rate the water basins according to their level of their resistance to retain the quality. A study by Nafsin and Li [17] in predicting the biochemical oxygen (BOD) of the Buriganga River, Bangladesh, confirmed that ML algorithms were able to categorize the most influential parameters for BOD prediction, namely chemical oxygen demand (COD), total dissolved solids (TDS), conductivity, total solids (TS), suspended solids (SS), and turbidity. The application of hybrid machine learning models has given a higher prediction accuracy. Singha et al. [23] had studied the water region of Arang, Raipur, India, using the deep learning (DL) model for predicting the groundwater quality and compared it with ML models, namely random forest, eXtreme gradient boosting, and ANN. Results indicated that the DL model was able to produce higher accuracy compared to the ML model. The regression relations developed for Bahr El-Baqar Drainage System [16] to estimate the water quality using the data gathered between 2004 and 2006. For calculating the water quality index of Mirim Lagoon Lake, Brazil, Valentini et al. [24] created a multilinear regression equation with minimal parameters, namely phosphorus, dissolved oxygen (DO), and thermos-tolerant coliforms. Data, from 154 samples obtained through 7 monitoring stations located around the lake over a period of three years, were used. A spatial regression for Ru River Watershed, China, was developed by Yang et al. [20] to investigate the impacts of pollutants on ambient water quality. This method estimated regression correlations between water quality measures and major influencing watershed factors. Regression results had shown that human activities and the physical properties of the watershed had an impact on the total nitrogen concentrations. Guo and Lee [8] proposed an artificial intelligencebased binary classification model called easy ensemble using a 3-decade dataset comprising three marine beaches in Hong Kong having hydrographic and pollution attributes. The class-imbalance method employed in the model was able to learn to predict even under the worst cases of E. coli. The performance of the method overtook multilinear regression and classification tree models. The research work of Grbˇci´c et al. [7] explored the data of Escherichia coli and enterococci over 15 public beaches of Rijeka, Croatia, to develop ML models for forecasting pathogen levels based on environmental characteristics and to analyze their interactions with environmental parameters. Their findings showed that the nearest neighbor classifier algorithm provided the best overall performance. Hatzikos et al. 
[10] developed an underwater measurement system where the prediction of water quality variables such as temperature, pH, conductivity, salinity, dissolved
oxygen, and turbidity was involved. The nearest neighbor classifier proved to be an optimal technique. Aldhyani et al. [1] created AI model for water quality index (WQI) using the long short-term memory (LSTM), a deep learning algorithm, and a nonlinear autoregressive neural network (NARNET) and developed classification models using SVM, KNN, and naive Bayes to get the water quality classes (WQC). Results of the predictions showed that the NARNET model was suitable to predict WQI and SVM algorithm was preferred for WQC prediction. Wang [25] proposed a machine learning approach called stacking to predict the quality of Lake Erie, New York, USA, and showed that the stacking provided higher accuracy compared to the base models. Chen et al. [4] recommended big data techniques to analyze the surface water of the rivers and lakes of China having more than five years’ data. They found that ML techniques such as decision tree, random forest, and deep cascade forest produced more accurate predictions. The authors also proved that the primary water quality components such as pH, dissolved oxygen, chemical oxygen demand-Mn, and NH3– N played a vital role in determining the water quality. Haghiabi et al. [9] studied the quality of the Tireh River of Iran and developed the prediction models using ML techniques namely ANN, SVM, and GMDH-type of neural network. They declared that GMDH and SVM models produced more consistent results compared to ANN while testing with water quality determinants such as calcium, chlorine, specific conductivity, bicarbonate, sulfate, pH, sodium, magnesium, and TDS. To forecast the water quality index (WQI), Hmoud et al. developed an adaptive neuro-Fuzzy inference system algorithm [11] and demonstrated that it produced higher regression coefficient in prediction whereas feed-forward neural network was preferred for its accuracy in classification. Khoi et al. [14] explored the competence of ML models including boosting-based algorithms, decision tree-based algorithms, and ANN-based algorithms while predicting the water quality index (WQI) of La Buong River, Vietnam. XGBoost algorithm was the topper among twelve such algorithms investigated. The work by Koranga et al. [15] reported that prediction of water quality class and the water quality index of Nainital Lake, India, involved the ML techniques. Random forest method was found to be the most suitable ML technique for predicting the WQI among the eight models examined whereas classification algorithms such as stochastic gradient descent, random forest, and support vector machine produced the same accuracy. In summary, the literature survey has brought out the various machine learning algorithms used in forecasting the quality of water from rivers, lakes, coastal, and groundwater for safer consumption by agriculture, industry, and people. Research works for an open sea water quality prediction to maintain the healthy marine ecosystems were not focused much. Hence, the objective of this paper was to consider the dataset of open sea water to examine the best-fit regression model for predicting the quality determinants.
3 Experimental Setup to Evaluate the Goodness of Fit of Regression Models Predicting the Water Quality Determinants

Regression is a statistical method used to determine the relationship between dependent and independent parameters in engineering processes, financial analysis, and other fields of work. The dependent variable is often called the target or response, and the independent variables are called features or predictors. Regression is a supervised learning technique where the algorithm is trained to learn the pattern from the given history of data and is applied to estimate the same with real data. Anaconda [2] is an open-source software with an integrated development environment (IDE) having Python for coding. It embeds many data science and analysis packages supporting machine learning and deep learning models. NumPy [19], pandas [20], and PyCaret [21] were some of the packages used under a machine learning framework with Python coding [22], meant for data manipulation and data analysis. Regression in PyCaret uses supervised machine learning for appraising the relationship between the dependent and independent variables.
3.1 Application Dataset Preparation

This study has used the dataset from the open public dataset maintained by the National Centers for Environmental Information (NCEI) [18]. The Atlantic Ocean data extracted from the larger NCEI dataset has been utilized for training, validation, and testing of the regression model developed. Table 1 shows the characteristics of the data taken for analysis by demonstrating the pairwise relationship between the input parameters chosen for the study, such as temperature (Temp), distance, depth, salinity (SAL), pH, total alkalinity (TA), dissolved oxygen (DO), dissolved inorganic carbon (DIC), and CO2. The table displays the presence of negative correlation and low correlation as well. A negative correlation value indicates that the dependent variable tends to decrease as the independent variable increases, whereas a low correlation illustrates that the dependent variable does not reflect any change in the independent variable. However, if the residuals of the prediction stay around the x-axis of the independent variable, then the regression would yield a better goodness of fit for prediction.
Table 1 Pairwise correlation of input water quality indicators of Atlantic Ocean
Parameter      Temp °C  Distance NM  Depth m  Salinity ppt  pH      TA µmol/kg  DIC µmol/kg  DO µmol/kg  CO2 µmol/kg
Temp °C        1.000
Distance NM    –        1.000
Depth m        –        –            1.000
Salinity ppt   –        –            0.107    1.000
pH             –        –            −0.268   −0.502        1.000
TA µmol/kg     –        –            −0.011   0.049         −0.060  1.000
DIC µmol/kg    –        –            0.405    0.046         −0.627  0.079       1.000
DO µmol/kg     –        –            −0.017   −0.061        0.557   −0.559      −0.034       1.000
CO2 µmol/kg    –        –            −0.064   0.070         −0.790  0.148       0.570        −0.632      1.000
4 Methodology 4.1 Process Flow for Building, Evaluating, and Finalizing the Best-Fit Regression Model for Prediction Initially, the application data, representing marine water quality data containing missing values, empty records, and outliers, were prepared for analysis and normalized where required. The pre-processing step dealt with the missing values and outliers before training, as the automated machine learning algorithms would otherwise fail in their presence, and the z-score method was used to normalize the data for machine learning. After pre-processing, the dataset was split into two parts, namely (i) 80% of the data for modeling and (ii) 20% of the data for testing. Next, the data for modeling were divided further into 70% for training and 30% for validation during the setup process. Setting up the environment for building the regression models initialized the experiment before training with the two mandatory parameters, the input data and the target parameter. Compare models, a function used in the training stage, evaluated the performance of all the regression models using cross-validation and thereby identified the top performers. The output of this stage listed the performances of 18 different regression models ranked by their regression metrics; however, this stage did not yield any trained model. The following regression metrics were enabled to report the performance of a model: MAE, MSE, RMSE, R2, RMSLE, and MAPE, which together evaluate the prediction error and the model performance during regression analysis. Equations (1)–(5) follow from the definitions of these metrics. Mean absolute error (MAE) is the average absolute difference between the actual and estimated data and is given by

$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \qquad (1)$$
Mean squared error (MSE) is the average squared difference between the actual and estimated data and is given by

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \qquad (2)$$
Root mean squared error (RMSE) is calculated by taking the square root of MSE. The R-squared or R2, the coefficient of determination, is given by the proportion of the variance in the dependent variable explained by the model. Equation (3) gives R2 as
$$R^2 = 1 - \frac{\sum_i\left(y_i - \hat{y}_i\right)^2}{\sum_i\left(y_i - \bar{y}\right)^2} \qquad (3)$$
Mean absolute percent error (MAPE) is the average absolute percent difference between the estimated and the actual data, which is given by

$$\text{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \qquad (4)$$
Root mean squared logarithmic error (RMSLE) measures the ratio of estimated and actual values and is given by

$$\text{RMSLE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\log\left(\hat{y}_i + 1\right) - \log\left(y_i + 1\right)\right)^2} \qquad (5)$$
where N is the number of observations, yi is the actual value, ŷi is the estimated value, and ȳ is the mean of the actual values. During the training phase, each regression algorithm was prepared using the create model function, which made the model able to learn when new data were fed in. During this stage, the three top-performing regression algorithms were considered for training, and their performance metrics were recorded accordingly. Table 2 illustrates the outcome of training the regression models obtained from the top performers for each of the water quality parameters, where R2 measures the goodness of fit of the chosen model. Out of the marine water quality parameters listed in Table 2, pH adopted the best learning model, linear regression (LR), with an R2 of 0.988 and exhibited very low errors compared to all others. Although salinity shows a lower R2, its errors fall in a low range, and therefore salinity has also settled on a comparatively good model.
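For reference, the metrics in Eqs. (1)–(5) can be computed directly from the observed and predicted values; the following is a minimal NumPy sketch (the automated framework reports these scores itself during model comparison):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                                            # Eq. (1)
    mse = np.mean(err ** 2)                                               # Eq. (2)
    rmse = np.sqrt(mse)
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)     # Eq. (3)
    mape = 100 * np.mean(np.abs(err / y_true))                            # Eq. (4)
    rmsle = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))  # Eq. (5)
    return dict(MAE=mae, MSE=mse, RMSE=rmse, R2=r2, MAPE=mape, RMSLE=rmsle)
```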
Table 2 Top regression models for the target parameters of Atlantic Ocean after training

Target parameter | Regression model | MAE | MSE | RMSE | R2 | RMSLE | MAPE
CO2 | Extra trees (ET) | 14.528 | 9576.57 | 44.758 | 0.938 | 0.049 | 0.022
DIC | Linear regression (LR) | 5.122 | 58.686 | 7.502 | 0.978 | 0.003 | 0.002
DO | Extra trees (ET) | 12.420 | 367.904 | 18.632 | 0.898 | 0.101 | 0.061
pH | Linear regression (LR) | 0.010 | 0.0002 | 0.015 | 0.988 | 0.002 | 0.001
SAL | Random forest (RF) | 0.301 | 0.3590 | 0.563 | 0.731 | 0.016 | 0.009
TA | Linear regression (LR) | 5.058 | 56.475 | 7.363 | 0.961 | 0.003 | 0.002
Before finalizing the model, the plot model step helped to visualize the residuals and prediction errors of the trained models and to review the metrics. The difference between the observed and predicted values of a parameter is called the residual. The residuals were plotted with the zero-residual line as the reference: positive residuals lie above the line, negative residuals below it, and zero residuals on the line. To illustrate that the regression models were unbiased, residuals were plotted for the six quality parameters, and the resulting scatter plots were consolidated in Fig. 1. As shown in Fig. 1, the residuals are scattered randomly around the zero-residual line of each quality indicator, confirming that the respective regression models are unbiased.
Fig. 1 Residuals (in y-axis) versus predicted values (in x-axis) for the training data (blue color) and validation data (green color) along with the distribution graphs
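The setup, compare models, create model, and plot model steps described in this section map onto the PyCaret regression API cited in [21]. The following is a minimal sketch, not the authors' exact configuration; the data frame name, split sizes, and session id are illustrative assumptions:

```python
from pycaret.regression import setup, compare_models, create_model, plot_model

# df_model: the 80% modeling split of the pre-processed Atlantic Ocean data
# 'CO2' is one example target; the study repeats this per quality indicator
exp = setup(data=df_model, target="CO2", train_size=0.7,
            normalize=True, normalize_method="zscore", session_id=123)

top3 = compare_models(n_select=3)   # ranks the regressors by cross-validated metrics
et = create_model("et")             # train one top performer, e.g., extra trees for CO2
plot_model(et, plot="residuals")    # residuals versus predictions, as in Fig. 1
```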
5 Results and Discussion The finalize model function retrained the selected models on the entire data taken for modeling. The finalized model was then passed to the predict model function to predict the targeted variables. Figure 2 shows the predicted output against the observed value for CO2 using the trained extra trees regression (ET) model; DIC with linear regression (LR); DO with extra trees regression (ET); TA with linear regression; pH with linear regression; and salinity with random forest regression (RF), as stated in Table 2. Saving and loading the models were also planned for deploying them in the real environment.
Fig. 2 Prediction by each of the regression models for water quality indicators (prediction values in y-axis and observed values in x-axis)
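The finalize, predict, save, and load steps above correspond to the following calls, again assuming the PyCaret regression API [21]; the model variable and file name are illustrative:

```python
from pycaret.regression import finalize_model, predict_model, save_model, load_model

final_et = finalize_model(et)                  # refit on the full 80% modeling data
preds = predict_model(final_et, data=df_test)  # score the 20% hold-out test data
save_model(final_et, "co2_extra_trees")        # persist the pipeline for deployment
# restored = load_model("co2_extra_trees")     # reload later in the target environment
```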
This section explores advanced techniques to enhance the performance of the weaker learners among the selected models. The tune model function automatically tunes the hyperparameters of a regression model to improve its performance; by default, the R2 value is optimized, although other metrics can be optimized at this stage if required. Stacking refers to combining multiple models into a meta-model that uses the predictions of the stacked ones. Stacked regression models were created using either blending or ensemble techniques. To compare the outcome of these advanced processes and identify the best-fit model for deployment, blending and ensembling were examined on the three top-performing regression models. Blending combines multiple models and averages the predictions of the individual models to yield the final value. Ensemble learning trains weak learners and combines them to obtain a better fit; methods such as bagging and boosting were involved in the ensembling. Once the best-fit model was validated, it entered the testing phase to verify that it also fits new data well. The accuracy of each regression model after tuning, blending, or ensembling is compared in Table 3 in terms of MAE, MSE, RMSE, R2, RMSLE, and MAPE. The table also highlights the best R2 value of the regression model for the respective water quality determinant. From Table 3, it is observed that the CO2 data fitted best with the ET model; DIC with the stacked model built from the top-3 regression models; DO with ET; pH with LR; salinity with RF; and TA with LR. It is also found that stacked models are not suited to the CO2 and DO data because they exhibit very high MAE, MSE, and RMSE. The stacked models for salinity, pH, and DIC are comparably adequate.
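The tuning, blending, stacking, and ensembling variants compared in Table 3 can be expressed with the same PyCaret API [21]. A minimal sketch follows, assuming top3 holds the three top-performing models from the earlier comparison; the exact composition of each stacked variant is an interpretation of the row names in Table 3:

```python
from pycaret.regression import tune_model, blend_models, stack_models, ensemble_model

tuned = [tune_model(m, optimize="R2") for m in top3]   # hyperparameter tuning, R2 optimized
blended = blend_models(estimator_list=top3)            # average the top-3 predictions
stack_top3 = stack_models(estimator_list=top3)         # "stack with top-3 regression models"
stack_tuned = stack_models(estimator_list=tuned)       # "stack with tuned regression"
bagged = [ensemble_model(m, method="Bagging") for m in top3]
stack_ensemble = stack_models(estimator_list=bagged)   # "stack with ensemble regression"
```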
6 Conclusion As part of the study on marine water quality management, open sea water data were investigated in this paper to evaluate the best-fit regression model for predicting the quality parameters. The results indicate that no single model is commonly suited to predicting the quality determinants CO2, DIC, DO, pH, salinity, and TA. After validation, extra trees (ET) and linear regression (LR) were most often adopted for the physicochemical properties of open sea water. This paper has also examined the use of advanced techniques such as tuning, blending, and stacking to enhance the performance of the selected model for each quality determinant and achieve higher accuracy. The outcome of the assessment showed that the extra trees regression model was the best fit for the CO2 data; the stacked model built from the top-3 regressors for DIC; extra trees for DO; linear regression for pH; random forest for salinity; and linear regression for TA. It was also observed that the stacked models failed to fit the CO2 and DO data due to high errors.
Table 3 Comparison of prediction accuracy from customary and advanced techniques

Target parameter | Regression model | MAE | MSE | RMSE | R2 | RMSLE | MAPE
CO2 | Extra trees regression | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
CO2 | Stack with top-3 regression models | 45.879 | 8761.4 | 93.602 | 0.827 | 0.235 | 0.120
CO2 | Stack with tuned regression | 44.636 | 6643.3 | 81.506 | 0.868 | 0.222 | 0.119
CO2 | Stack with ensemble regression | 142.909 | 78,680 | 280.5 | −0.553 | 0.575 | 0.291
DIC | Linear regression | 4.974 | 50.213 | 7.086 | 0.984 | 0.003 | 0.002
DIC | Stack with top-3 regression models | 3.813 | 31.112 | 5.577 | 0.990 | 0.002 | 0.001
DIC | Stack with tuned regression | 4.191 | 37.912 | 6.157 | 0.988 | 0.002 | 0.002
DIC | Stack with ensemble regression | 4.254 | 54.127 | 7.357 | 0.983 | 0.003 | 0.002
DO | Extra trees regression | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
DO | Stack with top-3 regression models | 13.284 | 391.576 | 19.788 | 0.898 | 0.147 | 0.06
DO | Stack with tuned regression | 13.976 | 423.874 | 20.588 | 0.890 | 0.204 | 0.064
DO | Stack with ensemble regression | 13.209 | 374.569 | 19.353 | 0.902 | 0.121 | 0.060
pH | Linear regression | 0.010 | 0.0002 | 0.014 | 0.991 | 0.001 | 0.001
pH | Stack with top-3 regression models | 0.011 | 0.0003 | 0.016 | 0.989 | 0.001 | 0.001
pH | Stack with tuned regression | 0.011 | 0.0003 | 0.016 | 0.989 | 0.001 | 0.001
pH | Stack with ensemble regression | 0.010 | 0.0002 | 0.015 | 0.990 | 0.001 | 0.001
SAL | Random forest regression | 0.164 | 0.065 | 0.256 | 0.967 | 0.007 | 0.004
SAL | Stack with top-3 regression models | 0.336 | 0.260 | 0.510 | 0.869 | 0.014 | 0.009
SAL | Stack with tuned regressions | 0.354 | 0.280 | 0.529 | 0.859 | 0.014 | 0.010
SAL | Stack with ensemble regressions | 0.348 | 0.268 | 0.518 | 0.865 | 0.014 | 0.010
TA | Linear regression | 5.066 | 52.634 | 7.255 | 0.977 | 0.003 | 0.002
TA | Stack with top-3 regression models | 5.279 | 60.386 | 7.770 | 0.974 | 0.003 | 0.002
TA | Stack with tuned regression | 5.279 | 60.387 | 7.770 | 0.974 | 0.003 | 0.002
TA | Stack with ensemble regression | 7.688 | 154.759 | 12.440 | 0.934 | 0.005 | 0.003
Acknowledgements Department of Science and Technology (DST), India, has sponsored this research study under the scheme of optimal water usage for the industrial sector (OWUIS).
References
1. Aldhyani TH, Al-Yaari M, Alkahtani H, Maashi M (2020) Water quality prediction using artificial intelligence algorithms. Appl Bionics Biomech 2020:6659314
2. Anaconda homepage. https://www.anaconda.com/products/distribution. Last Accessed 12 Dec 2021
3. Azrour M, Mabrouki J, Fattah G, Guezzaz A, Aziz F (2022) Machine learning algorithms for efficient water quality prediction. Model Earth Syst Environ 8(2):2793–2801. https://doi.org/10.1007/s40808-021-01266-6
4. Chen K, Chen H, Zhou C, Huang Y, Qi X, Shen R, Ren H (2020) Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data. Water Res 171:115454
5. Deng T, Chau KW, Duan HF (2021) Machine learning based marine water quality prediction for coastal hydro-environment management. J Environ Manage 284:112051. https://doi.org/10.1016/j.jenvman.2021.112051
6. Goldstein EB, Coco G, Plant NG (2019) A review of machine learning applications to coastal sediment transport and morphodynamics. Earth Sci Rev 194:97–108. https://doi.org/10.1016/j.earscirev.2019.04.022
7. Grbčić L, Družeta S, Mauša G, Lipić T, Lušić DV, Alvir M, Lučin I, Sikirica A, Davidović D, Travaš V, Kalafatovic D, Kranjčević L (2021) Coastal water quality prediction based on machine learning with feature interpretation and spatio-temporal analysis. arXiv:2107.03230. https://doi.org/10.48550/arXiv.2107.03230
8. Guo J, Lee JH (2021) Development of predictive models for "very poor" beach water quality gradings using class-imbalance learning. Environ Sci Technol 55(21):14990–15000. https://doi.org/10.1021/acs.est.1c03350
9. Haghiabi AH, Nasrolahi AH, Parsaie A (2018) Water quality prediction using machine learning methods. Water Qual Res J 53(1):3–13
10. Hatzikos EV, Tsoumakas G, Tzanis G, Bassiliades N, Vlahavas I (2008) An empirical study on sea water quality prediction. Knowl-Based Syst 21(6):471–478
11. Hmoud Al-Adhaileh M, Waselallah Alsaade F (2021) Modelling and prediction of water quality by using artificial intelligence. Sustainability 13(8):4259
12. Imani M, Hasan MM, Bittencourt LF, McClymont K, Kapelan Z (2021) A novel machine learning application: water quality resilience prediction model. Sci Total Environ 768:144459. https://doi.org/10.1016/j.scitotenv.2020.144459
13. IMO homepage (2017) Guidelines for ballast water exchange (G6). Resolution MEPC 288(71). https://wwwcdn.imo.org/localresources/en/KnowledgeCentre/IndexofIMOResolutions/MEPCDocuments/MEPC.288(71).pdf. Last Accessed 02 June 2020
14. Khoi DN, Quan NT, Linh DQ, Nhi PTT, Thuy NTD (2022) Using machine learning models for predicting the water quality index in the La Buong River, Vietnam. Water 14(10):1552. https://doi.org/10.3390/w14101552
15. Koranga M, Pant P, Kumar T, Pant D, Bhatt AK, Pant RP (2022) Efficient water quality prediction models based on machine learning algorithms for Nainital Lake, Uttarakhand. Mater Today: Proc 57(4):1706–1712
16. Korashey R (2009) Using regression analysis to estimate water quality constituents in Bahr El Baqar drain. J Appl Sci Res 5(8):1067–1076
17. Nafsin N, Li J (2022) Prediction of 5-day biochemical oxygen demand in the Buriganga River of Bangladesh using novel hybrid machine learning algorithms. Water Environ Res 94(5):e10718. https://doi.org/10.1002/wer.10718
18. NCEI homepage, OAS accession detail for 0171017. National Centers for Environmental Information. https://www.ncei.noaa.gov/archive/archive-management-system/OAS/bin/prd/jquery/accession/details/171017. https://www.nodc.noaa.gov/archive/arc0117/0171017/1.1/data/1-data/
19. Numpy homepage. https://numpy.org/. Last Accessed 09 Jan 2022
20. Pandas homepage. https://pandas.pydata.org/docs/getting_started/install.html. Last Accessed 09 Jan 2022
21. PyCaret homepage. https://pycaret.gitbook.io/docs/. Last Accessed 09 Jan 2022
22. Python homepage. https://www.python.org/. Last Accessed 09 Jan 2022
23. Singha S, Pasupuleti S, Singha SS, Singh R, Kumar S (2021) Prediction of groundwater quality using efficient machine learning technique. Chemosphere 276:130265
24. Valentini M, dos Santos GB, Muller Vieira B (2021) Multiple linear regression analysis (MLR) applied for modeling a new WQI equation for monitoring the water quality of Mirim Lagoon, in the state of Rio Grande do Sul-Brazil. SN Appl Sci 3(1):1–11
25. Wang L, Zhu Z, Sassoubre L, Yu G, Liao C, Hu Q, Wang Y (2021) Improving the robustness of beach water quality modeling using an ensemble machine learning approach. Sci Total Environ 765:142760
E-commerce Product’s Trust Prediction Based on Customer Reviews Hrutuja Kargirwar, Praveen Bhagavatula, Shrutika Konde, Paresh Chaudhari, Vipul Dhamde, Gopal Sakarkar, and Juan C. Correa
Abstract The Internet is strengthening the e-commerce industry, which is fast growing and helping enterprises of all sizes, from multinational organizations to tiny firms. Customers may buy things online with little or no personal interaction with sellers; when they purchase online, user reviews play a vital role. Consumers' comprehension and interpretation of product reviews impact buying decisions. This research paper presents a unique, reproducible data processing methodology for customer evaluations across 10 product categories on one of India's most popular e-commerce platforms, covering 11,559 customer reviews. We investigated the efficacy of a collection of machine learning algorithms that may be used to assess huge volumes of reviews on e-commerce platforms by using consumer ratings as a source to automatically classify product reviews as highly trustable or not-so-trustable. Results show that the algorithms can reach up to 85% accuracy in classifying product reviews correctly. The research discusses the practical ramifications of these findings in terms of consumer complaints and product returns, as evidenced by customer reviews.
H. Kargirwar · P. Bhagavatula (B) · S. Konde · P. Chaudhari · V. Dhamde G. H. Raisoni College of Engineering, Nagpur, India e-mail: [email protected]
H. Kargirwar e-mail: [email protected]
S. Konde e-mail: [email protected]
P. Chaudhari e-mail: [email protected]
V. Dhamde e-mail: [email protected]
G. Sakarkar D. Y. Patil Institute of Master of Computer Applications and Management, Akurdi, Pune 411044, India
J. C. Correa Colegio de Estudios Superiores de Administracion, Bogota, Colombia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_28
Keywords Reviews · E-commerce · India · Classification · Machine learning algorithms
1 Introduction Experts in internet commerce, consumer behavior [1], and data scientists from other industries [2] might benefit from product reviews [3]. Customer satisfaction measures the total impact of online shopping [4], and consumer reviews on e-commerce platforms [5] are frequently viewed as vital tools that represent the user's knowledge, sentiments, and readiness to buy a product [6]. This multidisciplinary topic pertains to the complexity of natural language and its significance as a reliable information source for customers making purchasing decisions [7]. It may be difficult for newcomers to understand the literature on the empirical analysis of consumer reviews because customer reviews are ubiquitous on a variety of online platforms, including but not limited to supermarkets [8], the hotel industry [9], healthcare institutional Web sites [10], food delivery apps [11], and education platforms [12]. An essential element of the empirical study of customer reviews is their role for multiple practical purposes. Baek et al. [13] observed that customers focus on different information sources of reviews based on their goals, causing reviews to function as criteria for finding information or comparing product alternatives. Song et al. [14] found that product reviews affect competitiveness among different categories of items in different ways in the retail sector. Customer reviews are important for the pricing strategy of return-freight insurance, according to Geng et al. [15]. According to Bhargava et al. [16], consumer reviews help potential buyers to choose items of high quality produced by manufacturers of high reputation and tradition. The majority of e-commerce platforms nowadays calculate trust score percentages based on user ratings, neglecting the subjective information in feedback comments, according to Xia and Jiang [17]. Given the abundance of studies concentrating on empirical assessments of customer evaluations, we build our case by focusing on Flipkart as a data source and on how it has been treated in past research. The materials and methods section then presents our methodological approach to Flipkart, including the data collection and data preprocessing processes. It is followed by the results section, where the precision, recall, and F1-score for the ten product categories are presented in tabular form.
2 Related Work After reviewing a number of research papers, the comprehensive findings from the literature are summarized in Table 1, which represents the working approaches of various other researchers.
Table 1 Description of comparative papers (serial number, year of publishing, approach, and reference)

1. 2018 [18]: To determine elements that influence people's purchase intentions, either positively or negatively, the author of this paper developed an improved Bayes technique based on TF-IDF feature weight and grade factor feature weight for automatic Chinese text categorization. The combined effect of the TF-IDF feature and the grade factor feature can obviate the assumption of feature independence.
2. 2015 [19]: In this paper, the authors suggested a novel framework using various data mining approaches for analyzing customers' attitude toward buying a product online. Data collection, preprocessing, feature extraction, and feature engineering are all part of the proposed framework. In comparison with other measures, the results demonstrate that TF-IDF is the best, but the work did not assess the performance of algorithms using big data sets.
3. 2018 [20]: In this paper, SVM was used to categorize text data in order to explore emotions in text, and it was proven that SVM is more accurate than other machine learning models. SVM uses the structure risk minimization concept, which avoids local minimums and effectively handles overlearning while ensuring strong generalization and classification accuracy.
4. 2019 [21]: The author of this paper did research and proposed a model to analyze the relationship between online reviews and performance revenue, namely to determine the impact of reviews on product collection by collecting detailed product information, clustering review data using TF-IDF, and classifying negative and positive reviews using SVM classification.
5. 2011 [22]: In this paper, the author states that preprocessing and feature selection are two critical processes to increase the quality of mining, through the implementation of the different feature selection methods and classifiers accessible in the Waikato Environment for Knowledge Analysis (WEKA). When compared to other text classifiers, the naive Bayes model allows each attribute to contribute equally and independently to the final judgment, resulting in more efficient computing.
6. 2015 [23]: The author, in one of the research works, has used random forest (RF), logistic regression (LR), support vector machine (SVM), and multinomial naive Bayes (MNB) to classify the data. The results demonstrate that naive Bayes provides the best accuracy.
3 Flipkart Data-Driven Analyses Headquartered in Bangalore, Karnataka, Flipkart is the leading e-commerce platform in India. Thanks to its prosperous business model, it is expanding into other markets, with another branch in Singapore as a private limited company. Our Flipkart-focused information search revealed that only 6.55% of the literature (12 out of the 183 Scopus-indexed published documents) actually tackles review classification. The contribution of this paper is a novel method for review classification, which relies on the analysis of both customers' written text and their numeric ratings. By introducing these two pieces of information into the investigation, the method enables the analyst to estimate how reliable a review is as framed by its rating. Within this framework, the basic assumption is that customers' text and numeric ratings are not always consistent [18].
4 Materials and Methods Our methodological approach focused on Flipkart, the leading e-commerce platform in India [19], and followed a data collection procedure similar to those applied to Amazon as the leading platform in North America [14] and Mercadolibre as the leading platform in South America [18]. Despite previous efforts to analyze the Indian e-commerce market [4], data-driven scrutiny of Indian customer reviews is scarce. Above and beyond the importance of analyzing customer reviews from a natural language processing perspective, the Indian e-commerce sector remains economically relevant globally, as it is expected to grow and reach US$99 billion by 2024, given consumers' preference for online buying fueled by increasing and inexpensive Internet data coverage across the country. As per Kumar and Ayodeji [20], this circumstance is of paramount importance, as being able to engage customers in online retail stores is essential for successful businesses in India.
4.1 Data Collection We developed an ad-hoc web scraping procedure with the aid of the Scrapy library (version 2.6.1) in Python (version 3.9.7). Scrapy is an open source, free-to-use web scraping framework [21] that can be used to extract public information either from a single page or from multiple pages linked to one another. Its spiders work on relevant Cascading Style Sheets (CSS) components that are useful for identifying specific alphanumeric characters that appear as text on a Web site. We used the inspect feature of the web browser to identify key CSS components of the Flipkart Web site, and based on these components, the Flipkart Web site was scraped with a spider/crawler built with the Scrapy framework. As mentioned in [22], with the help of a crawler we can only access the areas permitted in robots.txt: Scrapy first reads robots.txt and only then proceeds to scrape the data it is allowed to collect. The data were publicly posted in reviews by the customers.

Table 2 Sampled items and their average price

Product category | Sampled items | Average price
Baby products | 841 | 363.8
Books | 1550 | 194.0
Bottles | 1189 | 380.4
Groceries | 783 | 298.4
Gym | 1262 | 521.6
Home Furnishing | 1010 | 335.8
Laptop | 1292 | 46,086.4
Mobile | 1485 | 16,540.2
Sports | 1111 | 371.8
Tools | 1036 | 1497.2
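A minimal sketch of such a Scrapy spider is given below. The URL and CSS selectors are illustrative placeholders, not Flipkart's actual markup, and ROBOTSTXT_OBEY keeps the crawler within the areas permitted by robots.txt:

```python
import scrapy

class ReviewSpider(scrapy.Spider):
    name = "product_reviews"
    # Placeholder review-page URL; real product review URLs would be supplied here
    start_urls = ["https://www.flipkart.com/example-product/product-reviews/..."]
    custom_settings = {"ROBOTSTXT_OBEY": True}

    def parse(self, response):
        # Selectors below are hypothetical; the actual CSS classes are found
        # with the browser's inspect feature, as described above
        for review in response.css("div.review"):
            yield {
                "rating": review.css("div.rating::text").get(),
                "text": " ".join(review.css("div.review-text::text").getall()),
            }
```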
4.2 Data Preprocessing After running our developed spiders, we were able to sample 11,559 specific items across 10 product categories. For each product category, we obtained a varied number of specific items on which we then analyzed customer reviews. The particular product categories and their average price in Indian rupees are summarized in Table 2. Figure 1 depicts the customer review data preprocessing workflow. We relied on the following Python libraries for textual analysis: nltk (version 3.2.5) and pyspellchecker (version 0.6.2), from which we employed related functionalities such as word tokenization, stopword removal, and the WordNet Lemmatizer.
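A minimal sketch of such a preprocessing step with nltk and pyspellchecker follows; the exact ordering of the steps and the lower-casing are assumptions, since the paper lists the functionalities but not their sequence:

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from spellchecker import SpellChecker

nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")

STOP = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()
SPELL = SpellChecker()

def preprocess(review: str) -> list[str]:
    tokens = [t.lower() for t in word_tokenize(review) if t.isalpha()]  # tokenize, drop punctuation
    tokens = [t for t in tokens if t not in STOP]                       # remove stopwords
    tokens = [SPELL.correction(t) or t for t in tokens]                 # fix obvious misspellings
    return [LEMMATIZER.lemmatize(t) for t in tokens]                    # lemmatize to base forms
```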
5 Results Table 3 shows the classification accuracy of the two different algorithms obtained using the proposed method. The high accuracy of the proposed method is observed in the classification of trust semantics in the comments. We used a set of product categories including mobiles, laptops, sports, bottles, baby products, tools, gym, books, groceries, and home furnishing, decided on the basis of the product categories on the Flipkart Web site. The reviews were collected for items that were among the best sellers in their respective categories on the e-commerce Web site.
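A minimal sketch of this classification setup is given below, assuming scikit-learn with TF-IDF features; the vectorizer choice and classifier settings are assumptions, and texts and ratings stand for the preprocessed review strings and their star ratings, with ratings of 4–5 stars mapped to high trust and 1–3 stars to low trust as discussed in Sect. 6:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

labels = [1 if r >= 4 else 0 for r in ratings]          # 1 = high trust, 0 = low trust
X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.2, random_state=42)

vec = TfidfVectorizer()
X_tr_vec, X_te_vec = vec.fit_transform(X_tr), vec.transform(X_te)

for clf in (LinearSVC(), MultinomialNB()):
    clf.fit(X_tr_vec, y_tr)
    pred = clf.predict(X_te_vec)
    print(type(clf).__name__, accuracy_score(y_te, pred))
    print(classification_report(y_te, pred))            # per-class precision, recall, F1
```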
Fig. 1 Customer reviews’ data preprocessing workflow
Table 3 Classification accuracy

Product category | Support vector machine | Naive Bayes
Baby products | 79 | 80
Books | 76 | 76
Bottles | 76 | 82
Groceries | 80 | 80
Gym | 79 | 80
Home Furnishing | 79 | 85
Laptop | 84 | 85
Mobile | 84 | 81
Sports | 75 | 82
Tools | 81 | 80
The feedback was then classified into low trust comments and high trust comments. The process can be used to understand the consumer trust relation on the Indian e-commerce Web site and may be employed as a standard tool for comparative e-commerce reviews across countries. The metrics we used to evaluate the classification are the confusion matrix and the precision–recall–F1 measure. The confusion matrix provides the true positives, true negatives, false negatives, and false positives of the data fed to the model, and from these counts the precision and recall are obtained, as shown in Table 4.
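For reference, the precision, recall, and F1-score reported in Table 4 follow from the confusion-matrix counts in the standard way:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$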
Table 4 Precision, recall, and F1-score of the support vector machine and naive Bayes classifiers for each product category, reported separately for the two trust classes (0 and 1); the values range from approximately 0.71 to 0.91
6 Discussion Customer reviews are valuable data for e-commerce data analysts, as these reviews are a trustworthy information source that users of e-commerce platforms can leverage for purchase decisions [11]. In the current article, we have focused on providing a novel data analysis application that facilitates understanding the influence reviews have on the trust other customers place in particular products. Even though the relevance of this application is evident for Flipkart, as the leading e-commerce platform in India, we believe that the framework described here is applicable to any other e-commerce platform with multinational commercial operations, such as Mercadolibre in South America [18] or Amazon in North America [23], in addition to the contribution it makes to reproducible research and to the empirical literature on the use of customer reviews for a variety of practical reasons. In this context, although some scholars have suggested that consumers are more concerned about a high-quality, pleasant purchasing experience across the entire shopping process [24], this does not imply that managers should discard the importance of related processes such as customers' complaints and product returns that are visible through analyses of customer reviews [25]. Our central assumption not only was evident (as illustrated in Table 2), but also conveys other practical implications. With classification accuracy in the range of 75–85%, we regard this ceiling as suggesting that reviews do not polarize as much as the ratings do. Ratings of high trust (i.e., ratings of 4 and 5 stars) and ratings of low trust (i.e., ratings of 1, 2, or 3 stars) indicate the satisfaction level of customers with any particular seller. Although we are unaware of previous works that suggested a similar conclusion from Flipkart-based samples, we regard
this finding as one that deserves further exploration in other regions to assess its generalizability regardless of cultural differences.
7 Conclusion E-commerce has seen exponential growth in India with the rise of the Internet throughout the country, and the COVID-19 crisis has propelled e-commerce's growth toward new businesses, clients, and product categories. This growth is anticipated to result in a long-term change in the sorts of e-commerce transactions, from those involving luxury goods and services to those involving basic needs. Customer segmentation is one of the most crucial aspects of e-commerce platforms for two main reasons: firstly, new customers are influenced by the reviews of existing customers, and secondly, sellers can gain valuable information as to why customers prefer or avoid them, which also reveals which products are in demand. This study can be further elaborated by classifying reviews into multiple categories, and the results obtained can be further refined by using state-of-the-art natural language transformers like BERT.
Author's Contribution Juan C. Correa and Dr. Gopal Sakarkar conceived the study and coordinated the research team. Paresh Chaudhari, Shrutika Konde, Hrutuja Kargirwar, Praveen Bhagavatula, and Vipul Dhamde conducted the literature review. Paresh Chaudhari and Praveen Bhagavatula curated the data and analyzed the data. Juan C. Correa and Dr. Gopal Sakarkar wrote the original version of the manuscript. Juan C. Correa wrote the final version of the manuscript. All authors verified and approved the final version of the manuscript.
References
1. Dai H, Chan C, Mogilner C (2020) People rely less on consumer reviews for experiential than material purchases. J Consum Res 46(6):1052–1075
2. Correa JC (2020) Metrics of emergence, self-organization, and complexity for EWOM research. Front Phys 8:35
3. Dash A, Zhang D, Zhou L (2021) Personalized ranking of online reviews based on consumer preferences in product features. Int J Electron Commer 25(1):29–50
4. Kandulapati S, Bellamkonda RS (2014) E-service quality: a study of online shoppers in India. Am J Bus 29(2):178–188
5. Zhang S, Zhong H (2019) Mining users trust from e-commerce reviews based on sentiment similarity analysis. IEEE Access 7:13523–13535
6. Bag S, Tiwari M, Chan F (2019) Predicting the consumer's purchase intention of durable goods: an attribute-level analysis. J Bus Res 94(C):408–419
7. Hsieh J-K, Li Y-J (2020) Will you ever trust the review website again? The importance of source credibility. Int J Electron Commer 24(2):255–275
8. Kitapci O, Dortyol IT, Yaman Z, Gulmez M (2013) The paths from service quality dimensions to customer loyalty: an application on supermarket customers. Manag Res Rev 36(3):239–255
9. Zhang J, Lu X, Liu D (2021) Deriving customer preferences for hotels based on aspect-level sentiment analysis of online reviews. Electron Commer Res Appl 49
10. Gupta S, Valecha M (2016) Neglected impact of online customer reviews in healthcare sector. In: Proceedings of conference on brand management, pp 170–171
11. Teichert T, Rezaei S, Correa JC (2020) Customers' experiences of fast food delivery services: uncovering the semantic core benefits, actual and augmented product by text mining. Br Food J 122(11):3513–3528
12. Li L, Johnson J, Aarhus W, Shah D (2022) Key factors in MOOC pedagogy based on NLP sentiment analysis of learner reviews: what makes a hit. Comput Educ 176:104354
13. Baek H, Ahn J, Choi Y (2012) Helpfulness of online consumer reviews: readers' objectives and review cues. Int J Electron Commer 17(2):99–126
14. Song W, Li W, Geng S (2020) Effect of online product reviews on third parties' selling on retail platforms. Electron Commer Res Appl 39:100900
15. Geng S, Li W, Qu X, Chen L (2017) Design for the pricing strategy of return-freight insurance based on online product reviews. Electron Commer Res Appl 25:16–28
16. Bhargava K, Gujral T, Chawla M, Gujral T (2016) Comment based seller trust model for ecommerce. In: 2016 international conference on computational techniques in information and communication technologies (ICCTICT), pp 387–391
17. Xia P, Jiang W (2018) Understanding the evolution of fine-grained user opinions in product reviews. In: 2018 IEEE smart world, ubiquitous intelligence & computing, advanced & trusted computing, scalable computing & communications, cloud & big data computing, internet of people and smart city innovation (Smart-World/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp 1335–1340
18. Correa JC, Laverde-Rojas H, Martinez CA, Camargo OJ, Rojas-Matute G, Sandoval-Escobar M (2021) The consistency of trust-sales relationship in Latin American e-commerce. J Internet Commerce 1–21
19. Dutta N, Bhat AK (2014) Flipkart: journey of an Indian e-commerce start-up. Emerald Emerg Markets Case Stud 4(7):1–14
20. Kumar V, Ayodeji OG (2021) Determinants of the success of online retail in India. Int J Bus Inf Syst 37(2):246–262
21. Kouzis-Loukas D (2016) Learning scrapy. Packt Publishing Ltd., pp 1–11
22. Peshave M (2005) How search engines work and a web crawler application. Citeseerx.ist.psu.edu, pp 7–8
23. Rabiu I, Salim N, Da'u A, Nasser M (2022) Modeling sentimental bias and temporal dynamics for adaptive deep recommendation systems. Expert Syst Appl 191
24. Zhao W, Deng N (2020) Examining the channel choice of experience-oriented customers in omni-channel retailing. Int J Inf Syst Serv Sect (IJISSS) 12(1):16–27
25. de Borba J, Magalhaes M, Filgueiras R, Bouzon M (2020) Barriers in omni-channel retailing returns: a conceptual framework. Int J Retail Distrib Manag 49(1):121–143
Improving Amharic Handwritten Word Recognition Using Auxiliary Task Mesay Samuel Gondere, Lars Schmidt-Thieme, Durga Prasad Sharma, and Abiot Sinamo Boltena
Abstract Amharic is one of the official languages in the Federal Democratic Republic of Ethiopia. It uses an Ethiopic script derived from Ge’ez which is an ancient and currently a liturgical language. Amharic is also one of the most widely used literature-rich languages in Ethiopia. There are very limited innovative and customized research works in Amharic optical character recognition (OCR) in general and Amharic handwritten text recognition in particular. In this study, Amharic handwritten word recognition was investigated. State-of-the-art deep learning techniques including convolutional neural networks together with recurrent neural networks and connectionist temporal classification (CTC) loss were used to make the recognition in an end-to-end fashion. More importantly, an innovative way of complementing the loss function using the auxiliary task from the row-wise similarities of the Amharic alphabet was tested to show a significant recognition improvement over a baseline method. The findings of this study promote innovative problem-specific solutions as well as open insights into generalized solutions that emerge from problem-specific domains. Keywords Convolutional recurrent neural networks · Handwritten word recognition · Auxiliary task · Amharic handwritten recognition
M. S. Gondere (B) Faculty of Computing and Software Engineering, Arba Minch University, Arba Minch, Ethiopia e-mail: [email protected] L. Schmidt-Thieme Information Systems and Machine Learning Lab, 31141 Hildesheim, Germany e-mail: [email protected] D. P. Sharma AMUIT MOEFDRE under UNDP and MAISM under RTU, Kota, India A. S. Boltena Ministry of Innovation and Technology, Addis Ababa, Ethiopia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_29
1 Introduction Amharic is the official language of the federal government of Ethiopia and of other regional states in Ethiopia. It is one of the languages that use an Ethiopic script, which is derived from Ge'ez, an ancient and currently liturgical language. The Amharic script took all of the symbols in Ge'ez and added some new ones that represent sounds not found in Ge'ez. Amharic is one of the most widely used literature-rich languages of Ethiopia and is also the second most widely spoken Semitic language in the world after Arabic. There are many highly relevant printed as well as handwritten documents available in Ethiopia. These documents are primarily written in the Ge'ez and Amharic languages and cover a large variety of subjects including religion, history, governance, medicine, philosophy, astronomy, and the like [1–3]. Hence, owing to the need to explore and share the knowledge in these languages, educational programs have also been established outside Ethiopia, recently including Germany, the USA, and China. Nowadays, Amharic document processing and preservation also receive much attention from researchers in the fields of computing, linguistics, and social science [4]. Despite the rich written document resources in Ge'ez and Amharic, the technological and research advancements are limited; in particular, innovative and customized research works supporting these languages are very limited. Except from a language-model perspective, technically addressing Amharic written documents will also address Ge'ez documents, since the Amharic alphabet extends the Ge'ez alphabet. Even though optical character recognition (OCR) for unconstrained handwritten documents is itself still an open problem, Amharic OCR in general, and Amharic handwritten text recognition in particular, are not widely studied. Some research works are emerging with the introduction and potential of recent deep learning techniques addressing OCR problems. However, those works focus on directly adapting off-the-shelf existing methods to the case of Amharic scripts; that is, most of the studies only adapt state-of-the-art techniques that are mainly designed for Latin scripts. Even though such studies have their own benefit in showing how universally proposed solutions fit specific problem domains, they may, on the other hand, hinder innovations that could emerge from those specific problems. In this regard, while contextualized and innovative techniques addressing the Amharic script specifically are very important, they are overlooked in emerging research works [5–7]. Offline handwritten text recognition is the hardest of OCR problems. It is offline because the document is captured or scanned separately beforehand without anticipating any recognition technology; accordingly, one cannot extract any relevant features during the writing process, unlike in online OCR. Another issue is the complexity and variability of people's handwriting styles, which is not the case with printed documents. Recent OCR problems are addressed using deep learning techniques and could be either character-based, word-based, or sequence-based. Character-based methods focus on finding specific locations of individual characters and recognizing them, whereas word-based methods solve text recognition as a word
classification problem, where classes are common words in a specific language. The current state-of-the-art methods use sequence to sequence methods. These methods treat OCR as a sequence labeling problem [8].
1.1 Related Works It is worth mentioning some related works and notable attempts in OCR in general and Amharic OCR in particular. A notable and innovative contribution by Assabie and Bigun [9] for Amharic word recognition is one of the earlier works. In that paper, writer-independent HMM-based Amharic word recognition for offline handwritten text is presented; the underlying units of the recognition system are a set of primitive strokes whose combinations form handwritten Ethiopic characters. Similar to recent sequence-based deep learning methods, the recognition phase in that work does not require segmentation of characters [9, 10]. Other recent works have adapted deep learning methods for character-based recognition [2, 11] and sequence-based recognition [4, 5]. Sequence-based methods in particular address the OCR problem in an end-to-end fashion using a convolutional neural network as the feature extractor from the text image, a recurrent neural network as the sequence learner, and connectionist temporal classification as the loss function and transcriber. The combination of a convolutional neural network (CNN) and a recurrent neural network (RNN) is termed a convolutional recurrent neural network (CRNN). In a very recent work, Abdurahman et al. [5] designed a custom CNN model, investigated different state-of-the-art CNN models, and made available the first public Amharic handwritten word image dataset, called HARD-I. The authors also conducted extensive experiments to evaluate the performance of four CRNN-CTC-based recognition models by analyzing different combinations of CNN and RNN network architectures, and they report the state-of-the-art recognition accuracy for handwritten Amharic words. Similarly, Belay et al. [4] presented an end-to-end Amharic OCR for printed documents. Apart from adapting off-the-shelf deep learning techniques, there are some innovative works proposing solutions for the problems that arise with deep learning techniques, such as computational cost and the requirement for large labeled datasets. Puigcerver [12] proposed using only one-dimensional RNNs and data augmentation for better recognition and faster computation. Looking for improvements throughout the recognition pipeline is also important. Yousefi et al. [13] proposed to skip the binarization step in the OCR pipeline by directly training a 1D long short-term memory (LSTM) network on gray-level text-lines for binarization-free OCR in historical documents. Transfer learning is another important technique that enables learned artifacts from one problem to be transferred to solve another, which is particularly relevant when there is a shortage of datasets for the intended problem [14, 15]. Granet et al. [15] dealt with transfer learning from heterogeneous datasets that have a ground-truth and share common properties with a new dataset that has no ground-truth, to solve handwriting recognition on historical
documents. Jaramillo et al. [14] propose boosting handwriting text recognition in small databases with transfer learning by retraining a fixed part of a huge network. Wang and Hu [16] proposed a new architecture named gated RCNN (GRCNN) for solving OCR by adding a gate to the recurrent convolution layer (RCL), the critical component of RCNN; the gate controls the context modulation in the RCL and balances the feed-forward information and the recurrent information. Finally, relevant input to this study is explored in the work of Gondere et al. [2] on how auxiliary tasks are extracted from the Amharic alphabet and improve Amharic character recognition using multi-task learning. A related and typical example of how such problem-specific treatment helps toward innovative and generalized solutions is the multi-script handwritten digit recognition demonstrated in [17]. Recent papers continue to emphasize the superiority of deep convolutional neural networks along with their challenges. Fathima et al. [18] proposed a very deep convolutional neural network architecture to automate the classification of handwritten digits, based on an empirical comparison with shallow networks. In contrast, Jain [19] revealed how even machine learning algorithms suffer from overfitting, requiring regularization and early stopping. A recent work by Srinivasa and Negi [20] implemented the state-of-the-art method using a convolutional recurrent network (CRNN) for Telugu scene text recognition; the authors used weights pretrained on a large English scene text dataset to address the difficulty arising from small datasets. Dixit et al. [21] performed optical character recognition (OCR) using convolutional neural networks coupled with a voice-over utility that reads out the recognized text and converts it to Braille to help visually impaired people. This further confirms the continued trust in deep learning algorithms. In this study, Amharic handwritten word recognition was investigated in an end-to-end fashion using CRNN and CTC loss. The word level image datasets were built from character level image datasets of the Amharic alphabet by referring to compiled Amharic words extracted from news and concatenating the character images using the writer id. In this study, the usual CRNN model serves as a baseline model, and the study presents an innovative way of complementing the loss function using the auxiliary task from the row-wise similarities of the Amharic alphabet as the proposed model. As can be seen in Fig. 1, showing part of the Amharic alphabet, the row-wise similarities are exploited as an auxiliary classification task to achieve a significant recognition improvement over the baseline method. The contribution of this study is twofold: (i) showing an easy and meaningful way of organizing datasets for Amharic handwritten texts and (ii) demonstrating an innovative way of improving Amharic handwritten word recognition using an auxiliary task. Finally, the methods and materials used for the study are presented in the following section. Experimental results and the discussion are covered in section three, and conclusions are forwarded in the last section.
2 Methods and Material 2.1 Datasets Preparation One of the challenges that keeps Amharic OCR research efforts limited is the unavailability of datasets. Very recent, parallel work by Abdurahman et al. [5] has made a great contribution by making an Amharic handwritten text dataset available. In the current study, however, the dataset for Amharic handwritten characters was first organized from Assabie and Bigun [22] and Gondere et al. [2]. It is organized into 265 Amharic characters written by three separate groups of 100, 10, and 10 writers for the training, validation, and test sets, respectively. Each character image has its character label, row label, and writer id. The row label is used for training the auxiliary task, and the writer id is used to generate word images from the same writer by concatenating that writer's character images while referring to the compiled Amharic words extracted from the BBC Amharic news Website. Throughout the process of organizing the datasets, Python scripts were written and used. Figure 2 shows how the dataset organization was done. In such data shortage scenarios, it is important to create a reasonable way of organizing datasets to make machine learning experiments possible. Accordingly, using electronic texts together with the corresponding character images of the alphabet written by different writers to generate word images allows the inclusion of a large number of characters and a varied, natural sequence of characters, owing to the unrestricted coverage of real-world electronic texts. This method makes it manageable to involve a large number of writers while collecting only a fixed number of Amharic characters from each respondent. More importantly, the sequence learning method employed in this study and the nature of Amharic writing, which is not cursive, make this dataset organization feasible and reasonable. Hence, in this study, 256,100 word images are used for the training set, i.e., 2,561 words written by each of the 100 writers. Likewise, two sets of 3,200 word images, i.e., 320 words written by each of the 10 writers, are used for the validation and test sets. The minimum and maximum word lengths are two and
Fig. 1 Parts of Amharic alphabet. Source Omniglot
Fig. 2 Organization of Amharic handwritten word dataset
twelve, respectively, and 223 characters out of the 265 Amharic characters were incorporated. Connectionist temporal classification (CTC) is the key to avoiding segmentation at the character level, which greatly facilitates the labeling task [14]. Accordingly, recent works employ CRNN with CTC loss in an end-to-end fashion [4, 5]. The deep CNN provides a strong, high-level representation of the text image, while the RNN (LSTM) exploits context information in a sequence labeling setting [23]. In this study, two models are experimentally tested: the baseline method and the proposed method. The architecture of these models remains the same, except that an additional head is added at the end of the network for the auxiliary task in the case of the proposed method. As shown in Fig. 3, the model comprises three major components: the CNN for automatic feature extraction, the RNN for sequence learning, and CTC as the output transcriber. In the proposed model, owing to the promise of multi-task learning arising from the nature of the Amharic characters [2], the sum of the CTC losses for the character label and the row label is optimized. To emphasize the difference between the competing models, basic configurations are set for the CNN and RNN components as described in Fig. 3.

$$\text{CTC loss}_{\text{total}} = \text{CTC loss}_{\text{char}} + \text{CTC loss}_{\text{row}} \qquad (1)$$

$$\text{CTC loss} = -\sum_{(x,l)\in D} \log p(l\mid x) \qquad (2)$$

$$p(l\mid x) = \sum_{\pi \in F^{-1}(l)} p(\pi\mid x) \qquad (3)$$

$$p(\pi\mid x) = \prod_{t=1}^{T} y^{t}_{\pi_t} \qquad (4)$$
Fig. 3 Architecture of the baseline and proposed models
In this study, Eqs. (1)–(4) are used to define the CTC loss. Equation (1) presents the total loss optimized by the proposed method; it comprises the sum of the two CTC losses, namely the CTC loss for character-based transcription (the baseline method) and the CTC loss for row-based transcription. The objective of the CTC loss, as shown in Eq. (2), is to minimize the negative log probability, where p(l|x) is the conditional probability of the label sequence l given the input x for a training set D consisting of input and label sequences. The conditional probability p(l|x) of a given mapped label l is calculated as the sum of the probabilities of all alignment paths π mapped to that label using Eq. (3). The conditional probability p(π|x) of a given alignment path, as shown in Eq. (4), is calculated as the product of the probabilities of all labels, including blank characters, occurring at each time step along that path, where π_t is the label occurring at time t, y^t_{π_t} is the probability of that label, and T is the input sequence length. It should be noted that paths are mapped by a function F into correct label sequences by removing repeated symbols and blanks, which provides the set of paths aligned to each label sequence. Further details about the baseline method can be found in [5], and broader insight into the proposed model is presented in [2, 17].
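A minimal PyTorch sketch of the proposed two-head arrangement and the combined loss of Eq. (1) is given below; the convolutional and recurrent layer sizes are illustrative only, since the paper fixes only basic configurations, and the numbers of character and row classes are passed in as parameters:

```python
import torch
import torch.nn as nn

class DualHeadCRNN(nn.Module):
    # Sketch only: layer sizes are illustrative, not the paper's exact configuration.
    def __init__(self, n_chars, n_rows, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(128 * 8, hidden, bidirectional=True, batch_first=True)
        self.char_head = nn.Linear(2 * hidden, n_chars + 1)  # +1 for the CTC blank
        self.row_head = nn.Linear(2 * hidden, n_rows + 1)

    def forward(self, x):                      # x: (B, 1, 32, W) word images of height 32
        f = self.cnn(x)                        # (B, 128, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)   # (B, W/4, 1024): one step per width position
        h, _ = self.rnn(f)
        return self.char_head(h), self.row_head(h)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def total_loss(char_logits, row_logits, char_targets, row_targets,
               input_lengths, char_lengths, row_lengths):
    # Eq. (1): sum of the character-level and row-level CTC losses
    lp_char = char_logits.log_softmax(2).permute(1, 0, 2)  # (T, B, C) as CTCLoss expects
    lp_row = row_logits.log_softmax(2).permute(1, 0, 2)
    return (ctc(lp_char, char_targets, input_lengths, char_lengths)
            + ctc(lp_row, row_targets, input_lengths, row_lengths))
```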
3 Experimental Results and Discussion All the experiments in this study are implemented using the PyTorch machine learning library on the Google Colab and the computing cluster of Information Systems and Machine Learning Lab (ISMLL) from the University of Hildesheim. Each character image is 32 × 32 pixel, and hence, the word images through concatenation will have a height of 32 and width of 32 times the number of characters in that word. Accordingly, to address the variety of word lengths and maintain the natural appearance of the word images, models are trained using stochastic gradient descent.
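The training procedure can then be sketched as follows, reusing DualHeadCRNN and total_loss from the sketch above; the learning rate, the data loader, and the number of row classes (NUM_ROWS) are assumptions, since they are not specified numerically here:

```python
import torch

model = DualHeadCRNN(n_chars=223, n_rows=NUM_ROWS)          # 223 characters incorporated
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)     # stochastic gradient descent

for epoch in range(30):                                      # 30 epochs as in the experiments
    for images, char_t, row_t, in_len, char_len, row_len in train_loader:
        char_logits, row_logits = model(images)
        loss = total_loss(char_logits, row_logits, char_t, row_t,
                          in_len, char_len, row_len)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```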
Fig. 4 Loss curves of the models on validation set
Due to high computation costs, the experiments were set to run for only a few epochs while examining the learning behavior. Hence, during training, both the baseline and the proposed model ran for 30 epochs, which took a training time of 4 h. In each of several trials, a significant superiority of the proposed model over the baseline was observed; that is, a considerably better result was achieved by the proposed model even in the earlier training epochs. We therefore let the baseline model run for up to 30 more epochs to further examine the differences. Finally, the trained models from both setups were tested using a separate test set. The word error rate (WER) and character error rate (CER) are used as evaluation metrics. Further, the loss and accuracy curves of the models are presented to show the differences in learning behavior. As can be seen in Fig. 4, the proposed model converged much earlier, from about the 20th epoch. The loss of the baseline method reached 0.8035 at the 30th epoch on the validation set, whereas the loss of the proposed model reached 0.0664 at the 30th epoch on the validation set. Similarly, the training behaviors of the models shown in Figs. 5 and 6 demonstrate the superiority of the proposed model. The losses of the proposed and baseline models on the training set at the 30th and 60th epochs are 0.0638 and 0.1774, respectively. This result demonstrates how quickly the proposed model optimizes the loss during training. Likewise, the accuracy of the baseline model on the training set at the 60th epoch (83.39%) is still lower than that of the proposed model (96.61%) at the 30th epoch. As shown in Table 1, the WER and CER of the baseline model on the test set are 16.67 and 5.52, respectively. The proposed model, on the other hand, outperformed it with a WER of 4.21 and a CER of 1.04 when trained for only up to 30 epochs. In this study, the significant superiority of the proposed model is further empirical evidence of the relevance of multi-task learning through the related tasks
Fig. 5 Learning behavior (loss curve) of the models on training set
that emerged from the nature of the Amharic alphabet in Amharic handwritten character recognition, which was explored in [2]. More importantly, in this study, which applied sequence-based learning, the integration of multi-task learning that allowed the information contained within characters to be exploited to complement the overall recognition performance is an interesting finding. Even though the datasets and architectures used by other similar studies vary, the significant improvement of the proposed model over the baseline model in this study implies the importance of the proposed model. That is, while the recognition result of the baseline model is below the results reported by other studies, the proposed model has surpassed all of them, as shown in Table 1. The result reported by Abdurahman et al. [5] is a WER of 5.24 and a CER of 1.15; the authors used various CRNN configurations with CTC loss, like the baseline model in this study, compiled 12,064 Amharic handwritten word images from 60 writers, and applied manual augmentation to obtain up to 33,672 handwritten Amharic word images. Belay et al. [4] reported a CER of 1.59 for printed Amharic optical character recognition using a related architecture formulated with CRNN and CTC loss. Of the total text-line images (337,337) in their dataset, 40,929 are printed text-line images written with the Power Ge'ez font, while 197,484 and 98,924 images are synthetic text-line images generated with different levels of degradation using the Power Ge'ez and Visual Ge'ez fonts, respectively.
Fig. 6 Learning behavior (accuracy curve) of the models on training set

Table 1 Comparison of results on test set

Model | Dataset | Method | WER (%) | CER (%)
Belay et al. [4] | 337,337 printed Amharic text-lines | CRNN + CTC | – | 1.59
Abdurahman et al. [5] | 33,672 handwritten Amharic words | CRNN + CTC | 5.24 | 1.15
Our baseline | 256,100 constructed handwritten words | CRNN + CTC | 16.67 | 5.52
Our proposed | 256,100 constructed handwritten words | CRNN + 2CTC | 4.21 | 1.04
4 Conclusion Several studies are emerging that address text image recognition as a sequence learning problem. This has become possible through the integration of convolutional neural networks, recurrent neural networks, and connectionist temporal classification. Accordingly, various works have addressed the OCR problem in an end-to-end fashion using a CRNN and CTC loss and have demonstrated both the suitability of this approach and its improved results. Given the advantages of different deep learning innovations and the opportunities found in specific languages, it is important to propose novel approaches that can then be generalized to other problems. Hence, in this study, because the organization of the Amharic alphabet admits different parallel classification tasks and multi-task learning offers clear advantages, a new way of improving the traditional CRNN with CTC loss approach is proposed. The traditional
CRNN with CTC loss model is implemented in an end-to-end fashion as a baseline to address the problem of Amharic handwritten word recognition. The proposed model, on the other hand, complements the loss function with an auxiliary task derived from the row-wise similarities of the Amharic alphabet. The results of the study demonstrate significant improvements by the proposed model both in performance and in learning behavior. Finally, for a quick demonstration of the proposed model, only the row-wise similarity was explored as a related task; this is very promising because characters in the same row of the Amharic alphabet have very similar shapes. A similar setup with column-wise similarity, or an integration of both, can be explored as important future research work.
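The "CRNN + 2CTC" configuration in Table 1 boils down to adding an auxiliary CTC term to the usual word-level CTC loss. The fragment below is only a hedged TensorFlow/Keras sketch of such a weighted sum; the function name, tensor shapes, and the 0.5 weight are illustrative assumptions rather than the authors' exact configuration.

```python
# Hedged sketch of a combined CTC objective: main character-label CTC plus an
# auxiliary CTC over row labels. Shapes follow Keras's ctc_batch_cost conventions;
# the 0.5 weight is an assumption made only for illustration.
import tensorflow as tf

def combined_ctc_loss(char_true, char_pred, row_true, row_pred,
                      input_len, char_label_len, row_label_len, aux_weight=0.5):
    # ctc_batch_cost expects label tensors, softmax outputs, and per-sample lengths.
    main_loss = tf.keras.backend.ctc_batch_cost(char_true, char_pred,
                                                input_len, char_label_len)
    aux_loss = tf.keras.backend.ctc_batch_cost(row_true, row_pred,
                                               input_len, row_label_len)
    return main_loss + aux_weight * aux_loss
```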
References
1. Belay BH, Habtegebirial T, Liwicki M, Belay G, Stricker D (2019) Amharic text image recognition: database, algorithm, and analysis. In: 2019 International conference on document analysis and recognition (ICDAR), IEEE, Sydney, NSW, Australia, pp 1268–1273. https://doi.org/10.1109/ICDAR.2019.00205
2. Gondere MS, Schmidt-Thieme L, Boltena AS, Jomaa HS (2019) Handwritten Amharic character recognition using a convolutional neural network. Arch Data Sci Ser A (Online First) 6(1):1–14. https://doi.org/10.5445/KSP/1000098011/09
3. Abebe RT (2013) Biological thoughts as they are enlightened in the Ethiopian commentary tradition: annotation, translations and commentary. MSc thesis, Addis Ababa University, Addis Ababa
4. Belay B, Habtegebrial T, Meshesha M, Liwicki M, Belay G, Stricker D (2020) Amharic OCR: an end-to-end learning. Appl Sci 10(3):1117. https://doi.org/10.3390/app10031117
5. Abdurahman F, Sisay E, Fante KA (2021) AHWR-Net: offline handwritten Amharic word recognition using convolutional recurrent neural network. SN Appl Sci 3(8):1–11. https://doi.org/10.1007/s42452-021-04742-x
6. Belay BH, Habtegebrial T, Liwicki M, Belay G, Stricker D (2021) A blended attention-CTC network architecture for Amharic text-image recognition. In: ICPRAM, pp 435–441. https://doi.org/10.5220/0010284204350441
7. Yohannes Obsie E, Qu H, Huang Q (2021) Amharic character recognition based on features extracted by CNN and auto-encoder models. In: 2021 The 13th international conference on computer modeling and simulation, Association for Computing Machinery, Melbourne, VIC, Australia, pp 58–66. https://doi.org/10.1145/3474963.3474972
8. Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014) Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227
9. Assabie Y, Bigun J (2009) HMM-based handwritten Amharic word recognition with feature concatenation. In: 2009 10th International conference on document analysis and recognition, IEEE, Barcelona, Spain, pp 961–965. https://doi.org/10.1109/ICDAR.2009.50
10. Assabie Y, Bigun J (2011) Offline handwritten Amharic word recognition. Pattern Recog Lett 32(8):1089–1099. https://doi.org/10.1016/j.patrec.2011.02.007
11. Belay BH, Habtegebrial TA, Stricker D (2018) Amharic character image recognition. In: 2018 IEEE 18th International conference on communication technology (ICCT), IEEE, Chongqing, China, pp 1179–1182. https://doi.org/10.1109/ICCT.2018.8599888
12. Puigcerver J (2017) Are multidimensional recurrent layers really necessary for handwritten text recognition? In: 2017 14th IAPR International conference on document analysis and recognition (ICDAR), vol 1, IEEE, Kyoto, Japan, pp 67–72. https://doi.org/10.1109/ICDAR.2017.20
13. Yousefi MR, Soheili MR, Breuel TM, Kabir E, Stricker D (2015) Binarization-free OCR for historical documents using LSTM networks. In: 2015 13th International conference on document analysis and recognition (ICDAR), IEEE, Tunis, Tunisia, pp 1121–1125. https://doi.org/10.1109/ICDAR.2015.7333935
14. Jaramillo JCA, Murillo-Fuentes JJ, Olmos PM (2018) Boosting handwriting text recognition in small databases with transfer learning. In: 2018 16th International conference on frontiers in handwriting recognition (ICFHR), IEEE, Niagara Falls, NY, USA, pp 429–434. https://doi.org/10.1109/ICFHR-2018.2018.00081
15. Granet A, Morin E, Mouchère H, Quiniou S, Viard-Gaudin C (2018) Transfer learning for handwriting recognition on historical documents. In: 7th International conference on pattern recognition applications and methods (ICPRAM), Madeira, Portugal
16. Wang J, Hu X (2017) Gated recurrent convolution neural network for OCR. Adv Neural Inf Process Syst 30, Curran Associates, Inc.
17. Gondere MS, Schmidt-Thieme L, Sharma DP, Scholz R (2022) Multi-script handwritten digit recognition using multi-task learning. J Intell Fuzzy Syst 43(1):355–364. https://doi.org/10.3233/JIFS-212233
18. Fathima MD, Hariharan R, Ammal M (2022) Handwritten digit recognition using very deep convolutional neural network. In: Congress on intelligent systems, Springer, Singapore 111:599–612. https://doi.org/10.1007/978-981-16-9113-3_44
19. Jain N (2021) Optimization of regularization and early stopping to reduce overfitting in recognition of handwritten characters. In: Intelligent learning for computer vision, Springer, Singapore 61:305–323. https://doi.org/10.1007/978-981-33-4582-9_24
20. Srinivasa Rao N, Negi A (2022) Improved Telugu scene text recognition with thin plate spline transform. In: Congress on intelligent systems, vol 111. Springer, Singapore, pp 891–900. https://doi.org/10.1007/978-981-16-9113-3_65
21. Dixit S, Velaskar A, Munavalli N, Waingankar A (2021) Text recognition using convolutional neural network for visually impaired people. Intell Learn Comput Vis 61:487–500. https://doi.org/10.1007/978-981-33-4582-9_38
22. Assabie Y, Bigun J (2009) A comprehensive dataset for Ethiopic handwriting recognition. In: Proceedings SSBA '09: symposium on image analysis, Halmstad University, pp 41–43
23. He P, Huang W, Qiao Y, Loy CC, Tang X (2016) Reading scene text in deep convolutional sequences. In: Thirtieth AAAI conference on artificial intelligence, North America
Computational Drug Discovery Using Minimal Inhibitory Concentration Analysis with Bacterial DNA Snippets K. P. Sabari Priya, J. Hemadharshini, S. Sona, R. Suganya, and Seyed M. Buhari
Abstract We live in a well-ordered, consistent ecosystem largely populated with both beneficial and detrimental microbes whose direct ancestors were present at the origins of life on this planet around 3.5 billion years ago. One of the most pressing medical challenges of our time is the antimicrobial resistance (AMR) crisis, in which the antibiotic medications used to inhibit bacterial infection become ineffective and allow the bacteria to thrive. Moreover, drug discovery and computation form an intricate and dynamic process that on average takes about 8–12 years and costs over £1 billion. This adds extra cost to drug pricing, so patients may be forced to pay for it or doctors may be reluctant to prescribe it. This paper aims to develop a machine learning model to identify the bacterium species and predict the MIC of the drugs/antibiotics used to restrain the bacterial interaction with the human body. The system integrates the classification of bacterial species from genetic histogram snippets using principal component analysis with the computation of the Minimum Inhibitory Concentration (MIC) of antibiotics for the identified bacterium species using neural networks. Diagnosing these pathogenic bacteria and computing the drug concentration are intended to eliminate certain laborious pre-laboratory procedures and yield reliable test results that are otherwise time-consuming, complex, and labor-intensive to obtain. Keywords Minimum inhibitory concentration · Principal component analysis · Genetic histogram · Neural network
K. P. S. Priya · J. Hemadharshini · S. Sona · R. Suganya (B) · S. M. Buhari School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, TamilNadu, India e-mail: [email protected] S. M. Buhari e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_30
1 Introduction Bacteria are regarded as the foremost forms of life to appear on Earth and are multifarious, superabundant, and omnipresent. The microbial world, comprising bacteria, fungi, and viruses, is the base of global ecology. Microbes colonize every environmental niche, including the surfaces, cavities, and cellular milieu of every organism on Earth. A few strains of bacteria can cause fatal diseases in humans and other living organisms, such as diphtheria, typhoid, cholera, plague, dysentery, tuberculosis, pneumonia, and innumerable infections. Exposing the human body to hazardous bacteria poses serious threats to the immune system. Bacterial diseases have a considerable effect on public health, and the resistance that bacteria develop against the antibiotics or drugs meant to impede bacterial infection is a rapidly growing problem with devastating impacts on global health, food safety, and human development. Antibiotic resistance arises when certain bacteria change in response to the usage of certain antibiotics, as mentioned by Taiyari et al. [1]. Common diseases like flu, pneumonia, gonorrhea, tuberculosis, and cholera are growing harder, and occasionally impossible, to cure as antibiotics become less efficacious. The United Nations has stated that the world population is projected to reach 8.5 billion in 2030, 9.7 billion in 2050, and 11.2 billion by 2100, which also contributes to an increase in the incidence of untreatable infectious diseases affecting humans on a global scale. More than 1.2 million people died in the year 2019 as a global consequence of AMR. This resistance mechanism of microbes, the antimicrobial resistance (AMR) crisis, underlies various debilitating and lethal ailments. The relation between AMR and Minimum Inhibitory Concentration (MIC) has been studied in the works of Xiao and You [2] and Kromer-Edwards et al. [3]. AMR is caused by three predominant factors.
• The rising frequency of AMR phenotypes among microbes is an evolutionary response to the overall usage of antimicrobials.
• The vast and globally linked human population gives microbes in any environment access to all of humankind.
• The expansive and frequently nonessential usage of antibiotics by humans intensifies the selective pressure that is driving the evolutionary response in the microbial ecology.
Ineffective means for preventing drug-resistant diseases result in futile clinical workflows in which researchers are unable to deliver effective antibiotics with higher speed and accuracy. The process of drug computation and development is complex and ever evolving; it can take about 8–12 years and cost, on average, over £1 billion. According to the "Consumer Price Index", medical goods and services have witnessed the most considerable increase over the previous 5 years. For the successful production of life-saving medications, the conventional system must
be enhanced in a way that optimizes clinical trial research and the extra cost imposed on drugs. Clinical decision-making relies on statistical insights; structured machine learning algorithms can work with a wide range of genetic and AMR phenotype data from distinct target populations and be more explicit and less error-prone than the manual outcomes of biochemists, genetic engineers, or pharmacologists.
2 Related Works Nguyen et al. [4] present an extreme gradient boosting (XGBoost)-based machine learning model to predict the antimicrobial Minimum Inhibitory Concentration and its associated genomic attributes of nontyphoidal Salmonella for 15 antibiotics. A dataset of 5278 non-typhoidal Salmonella genomes, gathered across 15 years in the USA, was used, and an overall accuracy of about 95% was acquired. Without any prior information regarding the underlying gene features or resistance phenotypes, a highly accurate MIC is predicted by choosing training datasets containing diverse genomes. A similar model for the clinical diagnosis of viruses was devised by Long et al. [5]. They developed a graph convolutional network model with conditional random field fine-tuning to predict microbe-drug associations. The customized GCN model was employed on experimentally verified datasets such as MDAD, a biofilm dataset, and a drug-virus dataset to predict the associations. A restart-based scheme was applied to the similarity networks of both drugs and microbes to effectively select more valuable features, and an attention mechanism in the CRF layer assigns greater weights to more similar nodes and neighborhoods to conserve the similarity information. This GCNMDA method experimentally surpassed the performance of existing state-of-the-art techniques on all three datasets. Alternatively, a neural network architecture trained with the SGD-ADAM methodology was proposed by Barros [6] to build machine learning models that predict the Minimum Inhibitory Concentration (MIC) for the corresponding genomes and the susceptibility profiles for nine antibiotics. The model was trained using AMR metadata from PATRIC and a reference-free k-mer analysis of the whole genome sequence (WGS), considering the k-mer counts and a twofold dilution series. The classification of the susceptibility categories of the antibiotic interaction takes MIC breakpoints into account. The efficiency of the model was verified with metrics such as F1-score, recall, and precision, and the results are presented with tenfold cross validation; the highest accuracies are recorded for ciprofloxacin (96%) and chloramphenicol (94%), with a Very Major Error (VME) of less than 20%. Table 1 summarizes the approaches found in different journals.
Table 1 Literature survey of related works and corresponding accuracies

Proposed year | Author | Traditional machine learning algorithm used | Accuracy (%)
2019 | Marcus Nguyen, S. Wesley Long, Patrick F. McDermott, Randall J. Olsen, Robert Olson, Rick L. Stevens, Gregory H. Tyson, Shaohua Zhao, James J. Davis | 1. Antimicrobial susceptibility testing 2. Deep learning 3. Machine learning | 95–96
2020 | Yahui Long, Min Wu, Chee Keong Kwoh, Jiawei Luo, Xiaoli Li | 1. Graph convolutional network (GCN) 2. Conditional random field (CRF) | 90
2021 | Cristian C. Barros | 1. Neural network (NN) architecture 2. SGD-ADAM algorithm 3. Machine learning (ML) | 95
2008 | Leonardo G. Tavares, Heitor S. Lopes and Carlos R. Erig Lima (Tavares et al. [7]) | 1. Data classification 2. Machine learning | 96.22
3 Objectives The main objectives the proposed methodology aims to achieve are:
• To identify the bacterium species and subspecies/variants from genetic DNA segments.
• To compute the Minimum Inhibitory Concentration (MIC) of the antibiotics/drugs used to retard and suppress the activity of the identified bacteria.
4 Proposed Methodology In this work, we propose a novel framework to classify bacterium species based on DNA snippets and to compute the MIC value of suitable antibiotics to impede microbial growth. The framework consists of three primary modules (see Fig. 1). Initially, an exploratory data analysis is performed to visualize the datasets and determine patterns and trends, leveraging the available microbe-drug data. The genetic DNA snippets are the input, from which the distribution of bacterium species, the correlation between histograms, and the most correlated gene pairs are determined. The PCA module performs dimensionality reduction on the compressed measurements of the DNA sequences, which minimizes information loss and run-time. The processed data from the bacterial dataset is normalized
Fig. 1 Workflow diagram
and transformed into normal distributions. A PCA model is built and gene importance of top principal components is visualized graphically. Based on the principal component identified, the bacterial species are projected onto the corresponding axes. Using the top hundred components identified, the classification of species is performed by employing three different classification models. The MIC computation includes feature scaling and feature selection, target encoding, neural network model building, model evaluation, and mapping susceptibility.
4.1 Data Processing The bacterial DNA snippet dataset from ChEMBL and the antimicrobial resistance (AMR) phenotype dataset from the PATRIC database are used for training and testing the system. The DNA snippet training set contains around 200,000
compressed measurements of the spectrum of histograms with snippet length 10 over the bases adenine, thymine, guanine, and cytosine (A-T-G-C). Ten different bacterium species, such as Salmonella enterica, Klebsiella pneumoniae, Streptococcus pneumoniae, and Campylobacter jejuni, serve as the class labels, and their corresponding 286 histogram possibilities, from A0T0G0C10 to A10T0G0C0, are taken as independent variables. Duplicate tuples and rows with many missing attributes are dropped. The antimicrobial resistance (AMR) phenotype metadata from the PATRIC database consists of information on 7000 Salmonella genomes together with their corresponding Minimum Inhibitory Concentration (MIC) values on a twofold dilution series and their AMR phenotype ("resistant", "intermediate", and "susceptible") for 12 of the most used antibiotics worldwide. Building on this data, a k-mer analysis is then performed on the nucleotide assemblies by determining the k-mer counts.
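To make the 286 figure concrete: a 10-base snippet falls into exactly one A-T-G-C composition bin, and the number of such bins is the number of ways to split 10 among four bases, C(13, 3) = 286. The small Python illustration below (a hypothetical helper, not part of the paper's pipeline) maps a snippet to its bin label and verifies that count.

```python
# Illustrative sketch: map a 10-base DNA snippet to its A-T-G-C composition bin
# and confirm there are 286 such bins (compositions of 10 into 4 parts).
from collections import Counter
from math import comb

def composition_label(snippet):
    counts = Counter(snippet.upper())
    return "A{}T{}G{}C{}".format(counts.get("A", 0), counts.get("T", 0),
                                 counts.get("G", 0), counts.get("C", 0))

print(composition_label("ATGCCATTGA"))   # -> A3T3G2C2
print(comb(10 + 4 - 1, 4 - 1))           # -> 286 possible histogram bins
```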
4.2 Principal Component Analysis Principal components are identified by searching linear combinations of original variables in the optimal orthogonal direction such that the maximum variance is captured in the data projected along the axis. The maximum variance along the orthogonal axis implies less similar attributes; less similar attributes signify more information retained. According to Guicheteau et al. [8], the PCA model is sensitive to scale and the range of attributes. Skewed data distribution and outliers can negatively affect the performance of the model. Hence, the data is transformed into normal distribution using the BoxCox transformation which follows power law where the relative change in one variable varies as the power of another. The data frame is then standardized by fitting it with StandardScaler which transforms the data frame with mean of 0 and standard deviation 1. The cumulative explained variance and individual explained variance with the increase in the number of principal components over the x-axis are shown in (see Fig. 2). A cumulative variance of around 70% is achieved in the graph as early as 10 principal components and each successive component captures less than 1% of the variance in the data.
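A minimal scikit-learn sketch of this normalization-plus-PCA step is shown below. It assumes the histogram features sit in a NumPy array `X` (random stand-in data is used here), and it shifts the values by 1 because the Box-Cox transform requires strictly positive inputs; such details are assumptions rather than the paper's exact settings.

```python
# Minimal sketch of the Box-Cox + standardization + PCA pipeline described above.
# X stands in for the (n_samples, 286) histogram feature matrix; the +1 shift is
# an assumption to keep Box-Cox inputs strictly positive.
import numpy as np
from sklearn.preprocessing import PowerTransformer, StandardScaler
from sklearn.decomposition import PCA

X = np.random.randint(0, 11, size=(1000, 286)).astype(float)  # stand-in data

X_boxcox = PowerTransformer(method="box-cox").fit_transform(X + 1.0)
X_scaled = StandardScaler().fit_transform(X_boxcox)

pca = PCA(n_components=100)
X_pca = pca.fit_transform(X_scaled)

cumulative = np.cumsum(pca.explained_variance_ratio_)
print("Variance captured by first 10 components: %.2f" % cumulative[9])
print("Variance captured by first 100 components: %.2f" % cumulative[99])
```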
4.3 Exploratory Data Analysis The bacterial genome dataset and the AMR phenotype dataset have the perspectives to explore the data in depth and visualize their characteristics using statistical graphics. Distribution of 10 bacterium species Escherichia fergusonii, Enterococcus hirae, Escherichia coli, K. pneumoniae, S. enterica, Bacteroides Fragilis, C. jejuni, Staphylococcus aureus, S. pneumoniae, and Streptococcus pyogenes is visualized. It is evenly distributed thus making it a balanced dataset that can achieve a higher
Fig. 2 Principal components’ explained variance
accuracy model. The top 10 principal components are determined and their importance is statistically weighed. The scatter plot in Fig. 3 displays the projection of the bacterium species onto the first two principal components, i.e., principal component 1 and principal component 2 on the X and Y axes, respectively. Component 1 exhibits an explained variance of around 31% and component 2 an explained variance of 20%, which together yield a cumulative explained variance of 51% of the original data from the first two components alone.
4.4 Bacterial Species Classification The classification of bacterial species is performed with top 100 principal components obtained from principal component analysis. The top 100 components have a cumulative variance around 87% over the original dataset. Three tree classifier algorithms, random forest, extra trees classifier, and decision tree classifier are chosen and trained with the newly obtained feature-extracted reduced data. The process of deciding the parameters and fine-tunings is done by referring to Sunuwar et al. [9] and Li et al. [10]. Decision Tree Classifier Decision tree classifier algorithm builds a decision tree where each node describes a test that is performed on an attribute and the branch that drops down from the node represents one of the possible values of the attribute. The leaf node denotes the class label that is to be predicted. The training data is classified based on the pathway from the root node to the leaf according to the decision made at each node.
Fig. 3 Bacterium species projected onto principal component 1 and principal component 2
Extra Trees Classifier Extra trees or extremely random trees classifier is a decision tree-based ensemble learning algorithm that builds several trees and the nodes are split using a random subset of attributes. The split points of the trees are chosen at random among random subsets of attributes rather than choosing the best split. These characteristics of the extra trees classifier reduce the variance during training because the algorithm does not learn from a single pathway of decisions unlike the decision tree. Random Forest Classifier Random forest classifier constructs multiple decision trees on various subset of attributes in a given data. The result of the classifier is the majority in the trees in case of classification and the average value of the trees in regression. In a random forest, the samples are chosen with replacement; hence, the bootstrap is set to true. The split points of each node are based on the best split among a random subset of attributes which introduces randomness in the algorithm, thereby reducing the risk of overfitting.
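The fragment below is a hedged sketch (not the authors' exact code) of how the three classifiers could be trained and compared on the PCA-reduced features; `X_pca` and `y` are synthetic stand-ins for the top-100 principal components and the species labels.

```python
# Sketch comparing the three tree-based classifiers on PCA-reduced features.
# X_pca and y are synthetic placeholders for the top-100 components and labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_pca, y = make_classification(n_samples=5000, n_features=100, n_informative=40,
                               n_classes=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_pca, y, test_size=0.2, random_state=42)

models = {
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Extra trees": ExtraTreesClassifier(n_estimators=200, random_state=42),
    "Random forest": RandomForestClassifier(n_estimators=200, bootstrap=True,
                                            random_state=42),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name, accuracy_score(y_te, pred), f1_score(y_te, pred, average="macro"))
```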
4.5 K-mer Analysis K-mer analysis is performed on the nucleotide assemblies of the AMR phenotype dataset by determining the k-mer counts. The number of k-mers per sequence is given by the formula C = L − k + 1, where L is the total biological sequence length and k is the substring length within the biological sequence. K-mer analysis transmutes the complex genetic data into a comparatively lighter representation. The processing time and the accuracy obtained from the k-mer analysis depend on the length k chosen for the process. After k-mer analysis, the vector has dimension 65,537 × 7000 with a size of around 5 GB, which demands high memory and processing requirements. To resolve this, SelectKBest is used for feature selection to retain the best 200 features of the k-mer dataframe. This reduces the data to the k-mers most relevant for determining the resistance phenotype.
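A hedged illustration of this step follows: it checks the count C = L − k + 1 on toy sequences and then keeps the 200 most informative k-mer features with SelectKBest. The random genomes, labels, and the choice k = 8 are assumptions made only for the sketch.

```python
# Sketch of k-mer counting (C = L - k + 1 substrings per sequence) followed by
# SelectKBest feature selection; the toy sequences and k=8 are illustrative only.
import numpy as np
from collections import Counter
from sklearn.feature_selection import SelectKBest, f_classif

def kmer_counts(sequence, k=8):
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    assert len(kmers) == len(sequence) - k + 1   # C = L - k + 1
    return Counter(kmers)

rng = np.random.default_rng(0)
genomes = ["".join(rng.choice(list("ATGC"), size=500)) for _ in range(50)]
labels = rng.integers(0, 2, size=50)              # stand-in AMR phenotypes

counters = [kmer_counts(g) for g in genomes]
vocab = sorted(set().union(*counters))
X = np.array([[c.get(kmer, 0) for kmer in vocab] for c in counters])

X_best = SelectKBest(f_classif, k=min(200, X.shape[1])).fit_transform(X, labels)
print(X.shape, "->", X_best.shape)
```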
4.6 Neural Network The neural network architecture is implemented in Python using the Keras deep learning framework. Keras is an open-source neural network library that supports distributed training and evaluation of deep learning models. The input to the neural network is the computed k-mer counts, and the output is the categorical AMR phenotype variable. The k-mer counts span a large range and hence require normalization for efficient training; the scikit-learn StandardScaler is fitted to the k-mer input data for standardization. Because the output variable is categorical, it must be encoded numerically before being fed into the neural network, and the scikit-learn OneHotEncoder is used to encode the target variable. The neural network architecture was chosen based on the works of Marouf et al. [11] and Tamiev et al. [12]. The data is then split into train and test datasets using sklearn in a 9:1 ratio. The neural network is a sequential model with an input layer and a dense layer of 64 neurons with the nonlinear activation function ReLU. The AMR phenotype data is imbalanced, and certain classes are not represented well enough in the dataset for the model to train effectively. To resolve this, class weights are assigned inversely proportional to the frequency of samples in each class, which ensures more effective learning of the under-represented classes. The model is fitted to the training data, which is shuffled beforehand, with 30% of it held out for validation.
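Below is a minimal, hedged Keras sketch of the pipeline just described (standardize the k-mer counts, one-hot encode the phenotype, weight the classes, and train a small dense network). Everything beyond the stated 64-unit ReLU layer, the 9:1 split, and the 30% validation split — the optimizer, the output layer, and the synthetic data — is an assumption for illustration.

```python
# Hedged sketch of the AMR phenotype network: StandardScaler on k-mer counts,
# one-hot encoded targets, inverse-frequency class weights, 9:1 train/test split,
# and a small dense network with a 64-unit ReLU layer. Data here is synthetic.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(1000, 200)).astype(float)      # stand-in k-mer counts
y = rng.choice(["susceptible", "intermediate", "resistant"],
               size=1000, p=[0.6, 0.1, 0.3])                  # imbalanced phenotypes

X = StandardScaler().fit_transform(X)
Y = OneHotEncoder().fit_transform(y.reshape(-1, 1)).toarray()

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.1, random_state=42)

# Class weights inversely proportional to class frequency.
counts = Y_tr.sum(axis=0)
class_weight = {i: len(Y_tr) / (len(counts) * c) for i, c in enumerate(counts)}

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(Y.shape[1], activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_tr, Y_tr, validation_split=0.3, shuffle=True, epochs=10,
          class_weight=class_weight, verbose=0)
print("test accuracy:", model.evaluate(X_te, Y_te, verbose=0)[1])
```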
4.7 Susceptibility Mapping The susceptibility map summarizes the resistance of the antibiotic MIC that was calculated for each sequenced strain in three categories, i.e., “susceptible”, “intermediate”, and “resistant” taking MIC and antibiotic label in the X and Y axis, respectively. The number of genomes per antibiotic and MIC is counted and the numbers are depicted in the susceptibility map. This representation of the distribution of strains and their MIC values along with the susceptibility map describes the number of strains that are “susceptible”, “intermediate”, and “resistant”. The overall accuracy is obtained from the MIC values and the SIR terms of the MIC prediction model to the 12 antibiotics.
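A small pandas sketch of such a susceptibility map is given below; the antibiotic names, MIC values, and SIR proportions are placeholders, and counting with crosstab is only one possible way to build the map.

```python
# Illustrative sketch of a susceptibility map: count genomes per (antibiotic, MIC)
# cell and per SIR class. The antibiotics and values below are placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "antibiotic": rng.choice(["ciprofloxacin", "ampicillin", "tetracycline"], 500),
    "mic": rng.choice([0.25, 0.5, 1, 2, 4, 8, 16], 500),
    "sir": rng.choice(["susceptible", "intermediate", "resistant"], 500,
                      p=[0.6, 0.1, 0.3]),
})

genome_counts = pd.crosstab(df["antibiotic"], df["mic"])   # genomes per antibiotic/MIC
sir_summary = pd.crosstab(df["antibiotic"], df["sir"])     # genomes per SIR class
print(genome_counts)
print(sir_summary)
```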
5 Experimental Results After performing various experiments on the test data, the experimental results are recorded and analyzed for different classification models in Table 2. Table 3 compares the accuracy, F1-score, and area under the curve for the classification models.

Table 2 Accuracy of random forest classifier model

Species | F1-score (%) | Precision (%) | Recall (%) | Support
K._pneumoniae | 97.1 | 96.6 | 97.6 | 2484
C._jejuni | 96.8 | 96.7 | 96.9 | 2494
S._aureus | 95.2 | 93.6 | 96.8 | 2483
B._fragilis | 94.9 | 93.7 | 96.2 | 2505
E._hirae | 94.5 | 95.7 | 93.4 | 2475
S._enterica | 94.1 | 95.1 | 93.2 | 2478
S._pneumoniae | 93.8 | 94.3 | 93.3 | 2483
E._fergusoni | 93.7 | 94.2 | 93.1 | 2457
S._pyogenes | 93.5 | 93.8 | 93.2 | 2481
E._coli | 93.2 | 93.1 | 93.2 | 2459

Table 3 Comparison of performance metrics for classification algorithms

Algorithm | Accuracy | F1-score | AUC
Random forest classifier | 99.8 | 99.3 | 99.91
Extra trees classifier | 99.6 | 98.89 | 99.84
Decision tree classifier | 94.5 | 89.0 | 98.18
Fig. 4 Average accuracy of the MIC prediction model (8-mer Analysis)
The susceptibility map (see Fig. 4) analyzes the relationship between the MIC values of the respective antibiotics and the SIR ("susceptible", "intermediate", and "resistant") classes. The overall accuracy obtained from a fivefold cross validation is determined by four main parameters: the total number of samples present in a Minimum Inhibitory Concentration class, the balance among the Minimum Inhibitory Concentration classes, the cardinality of the Minimum Inhibitory Concentration classes, and the length of the k-mers. The model predicts at an accuracy of 91% and above for certain antibiotics.
6 Conclusion and Future Works The microbial world adapts to all antimicrobial treatments and restoratives, which results in the AMR crisis. This mandates a fundamental global shift in the usage of antimicrobials so that both existing and prospective antimicrobials are preserved and remain effective for longer. One such method is proposed here, incorporating data-rich computer algorithms that approach a humanlike intellect and are able to perceive and process immense amounts of data with a high degree of precision. The model is trained on substantial features of distinct bacterium species and AMR phenotype data, which results in a consistent classification of bacteria with 96.06% accuracy and prediction of the MIC of appropriate antibiotics to deter bacterial growth with 91% accuracy. Training the model on ample data with heterogeneous features of different types of microbes such as viruses, bacteria, fungi, and protists would result in an even more consistent and capable system.
References 1. Taiyari H, Faiz NM, Abu J, Zakaria Z (2021) Antimicrobial minimum inhibitory concentration of Mycoplasma gallisepticum: a systematic review. J Appl Poult Res 30(2) 2. Xiao X, You Z (2015) Predicting minimum inhibitory concentration of antimicrobial peptides by the pseudo-amino acid composition and Gaussian kernel regression. In: 2015 8th International conference on biomedical engineering and informatics 3. Kromer-Edwards C, Castanheira M, Oliveira S (2020) Year, location, and species information in predicting MIC values with beta-lactamase genes. In: 2020 IEEE international conference on bioinformatics and biomedicine (BIBM), pp 1383–1390 4. Nguyen M, Long SW, McDermott PF, Olsen RJ, Olson R, Stevens RL, Tyson GH, Zhao S, Davis JJ (2019) Using machine learning to predict antimicrobial MICs and associated genomic features for nontyphoidal Salmonella. J Clin Microbiol 57(2) 5. Long Y, Wu M, Kwoh CK, Luo J, Li X (2020) Predicting human microbe–drug associations via graph convolutional network with conditional random field. Bioinformatics 36(19) 6. Barros CC (2021) Neural network-based predictions of antimicrobial resistance in Salmonella spp. using k-mers counting from whole-genome sequences. bioRxiv 7. Tavares LG, Lopes HS, Erig Lima CR (2008) A comparative study of machine learning methods for detecting promoters in bacterial DNA sequences. In: ICIC 2008: advanced intelligent computing theories and applications. With aspects of artificial intelligence, vol 5227, pp 959–996 8. Guicheteau J, Christesen SD (2016) Principal component analysis of bacteria using surfaceenhanced Raman spectroscopy. In: Proceedings SPIE 6218, chemical and biological sensing VII, vol 6218 9. Sunuwar J, A1 Azad RK (2021) A machine learning framework to predict antibiotic resistance traits and yet unknown genes underlying resistance to specific antibiotics in bacterial strains. Briefings Bioinform 22(6) 10. Li L-G, Yin X, Zhang T (2018) Tracking antibiotic resistance gene pollution using machinelearning classification. Microbiome 6 11. Marouf AM, Abu-Naser SS (2018) Predicting antibiotic susceptibility using artificial neural network. Int J Acad Pedagogical Res (IJAPR) 2(10). ISSN: 2000-004X 12. Tamiev D, Furman PE, Reuel NF (2020) Automated classification of bacterial cell subpopulations with convolutional neural networks. PLOS ONE 15(10)
Optimized CNN Model with Deep Convolutional GAN for Brain Tumor Detection Mure Vamsi Kalyan Reddy, Prithvi K. Murjani, Sujatha Rajkumar, Thomas Chen, and V. S. Ajay Chandrasekar
Abstract Brain tumors are viewed as a deadly form of cancer. Early and precise detection of brain tumors is critical to their cure. Magnetic resonance imaging and computed tomography are two widely used methods of examining brain tissue that is abnormal in size, location, or shape, which can aid in the early detection of tumors in the brain. More data is required in the field of artificial intelligence, particularly in medical imaging; thus, in order to find a faster way to diagnose a specific disease using AI, we first require a large, accurately labeled image dataset to feed into a neural network. Convolutional Neural Networks achieve superior computational diagnostics by using well-annotated training data. Most medical image datasets, on the other hand, are fragmented and small. In this case, Generative Adversarial Networks can generate realistic and varied additional brain tumor data for training, filling in the gaps in the actual image distribution by following their respective objective functions. A Deep Convolutional Generative Adversarial Network was used in this paper to improve the accuracy of the original Convolutional Neural Network model. The Deep Convolutional Generative Adversarial Network is used to generate new data with reference to the existing training data distribution. Using this method to generate images, we increased the dataset size by 700 images, increasing the model's accuracy from 97.26 to 98.85%.
M. V. K. Reddy · P. K. Murjani School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India S. Rajkumar (B) School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India e-mail: [email protected] T. Chen School of Science and Technology, Department of Engineering, University of London, Northampton Square London EC1V 0HB, UK V. S. A. Chandrasekar Department of Surgical Oncology, Saveetha Medical College, Thandalam, Tamil Nadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_31
Keywords Brain tumor · Magnetic resonance imaging · Computed tomography · Convolutional neural network · Generative adversarial network · Deep convolutional generative adversarial network · Objective function · Data augmentation
1 Introduction A brain tumor is an abnormal growth or development of a mass of cells around or in the brain. It can be either malignant or benign in nature. A few symptoms of a brain tumor involve an onset or change in headaches, increases in frequency and intensity of headaches, strange vomiting or nausea, vision disturbances, fatigue, speech difficulties, and difficulty following simple commands. Brain tumors are primarily divided into primary and secondary. Primary brain tumors, which are less common in adults, start when the normal cells see mutations set about in their DNA. Secondary brain tumors arise when cancer in a different part of the body spreads to the brain. Primary brain tumors do not have a particular known cause; however, risk factors include exposure to ionizing radiation and a family history of brain tumors. A common method to diagnose a brain tumor is through magnetic resonance imaging (MRI). An MRI is an imaging test that aids in visualizing soft tissue and ensures the patient is not exposed to radiation in any manner. Despite its advantages, MRIs take roughly 30–45 min to complete and are also an expensive affair. This makes MRI data hard to collect and implement for various effective methods of diagnosis. With developments in artificial intelligence and improved frameworks for data generation and augmentation, it is now possible to make use of the scarcely available data by manipulating the data itself or even generating more. Generative Adversarial Network (GAN) is a method used in generating data based on existing samples. It involves developing a generative model using deep learning techniques. Thus, the implementation of GANs aid in increasing the volume of data, which otherwise was previously limited in quantity. Brain tumors involve requiring an MRI for diagnosis by professionals. This study aims at assessing the impact of artificially generating new data using GANs, particularly Deep Convolutional Generative Adversarial Networks (DCGANs) to supplement the dataset on an existing CNN (Convolutional Neural Network) model by measuring metrics with a basic CNN model and the CNN model post data supplementation. The dataset used consists of MRI scans with and without brain tumor.
2 Related Work Lot of research has been conducted to improve GANs efficiency and develop improvements with CNNs implementation. Few studies also show the combination of metaheuristics and machine learning, which provides promising results in this novel field for further exploration. Tseng et al. [1] conducted a study to ensure proper regularization of GAN models given that limited data poses problems while training. A more powerful method based on a link between regularized loss and fdivergence improved the generalization and stabilization of the learning dynamics in the model with limited data and also complemented recent augmentation techniques available. One of the challenges we notice with GANs is the image quality we obtain post-training sometimes. Mahapatra et al. [2] generates high-resolution images from low-resolution images by defining each pixel’s significance. Results show that these higher quality images have quality very similar to the original images and perform better than methods where pixel weight is not taken into account. Health care being a vast field has wide use cases of various deep learning techniques that help solve several problems. Pandey and Janghel [3] conducted a study that posed various use cases, the use of techniques like Restricted Boltzmann Machine, Generative Adversarial Networks, Convolutional Neural Networks, and a lot more in the healthcare domain. The advantages and disadvantages of each technique were provided as insights. Generative Adversarial Network (GAN) implementations have picked up rapidly in medical imaging. The ability of a GAN to aid in image synthesis, provide data augmentation, image translation, and the like has made it a popular option in the field of medical research. Xin et al. [4] presented an article which threw the light on the growth of GANs in medical imaging and the use cases along with its variations, while plenty of research has gone into detecting and classifying brain tumors. One such research paper aims at understanding brain tumor segmentation using various methods and then delves into the application of deep learning algorithms, which have recently provided us with state-of-the-art results. Isin et al. [5] make assessments of existing techniques, and future approaches are discussed as well in order to provide further scope for research. Furthermore, the study suggests that generated data aids in model performance but decreases detection time. Osokin et al. [6] in this study looked to synthesize cells by fluorescence microscopy. Since cells in comparison with natural images tend to have more geometric global structure and are simpler, they can facilitate image generation. However, the spatial pattern of different fluorescent proteins shows various biological functions that synthesized images need to capture. GANs are used for this process by interpolating over the latent space and mimicking the changes known in protein localization that are seen over time with the cell cycle which can facilitate them to predict temporal evolution from images that are static. Preventing overfitting is major concern while building a CNN model. Santos and Papa [7] performed a study which involves three areas, data augmentation, internal changes which talk about modifying the feature maps that are generated by the kernels
or neural network, and label which involves transforming the labels of the given input. Latent space distribution is important for the generation of sharp samples when training data is limited. Here, Fontanini et al. [8] applied meta-learning algorithm to the discriminator in the generator-discriminator GAN and onto a mapping network in order to optimize the arbitrary noise to guide the generator network into producing images belonging to specific classes. This can also aid in producing samples of previously obscured classes by optimizing the latent space and not changing any other parameter in the generator network. Generating realistic synthetic data is important particularly in the case of healthcare applications. Arvanitis et al. [9] address the barriers that come into play while accessing some datasets due to privacy concerns can be addressed by ensuring that the synthetic data generated is applicable in terms of its similarity to the existing data, while also ensuring it is not too similar to it. Reliance of just CNNs has cause for exploration as Srinivas and Rao [10] presented a hybrid model consisting of a Convolutional Neural Network (CNN) and a K-Nearest Neighbors (KNNs) that is used in detecting brain tumors in Magnetic Resonance Images. The CNN plays the role of feature extractor, and the KNN predicts the classes. This experiment provided better metrics than when compared to other models like CNN only, CNN-SVM, CNN-DISCR, and various other hybrids with CNNs. Segmentation of a portion of the image is a big problem in medical analysis of an image. Ratan et al. [11] use watershed in MATLAB environment, and the segmentation occurs once the desired parameters get decided. Evaluations depict this method as effective. Vanilla GAN and Deep Convolutional GAN (DCGAN) are two types of GAN architectures that are used to create images. Alrashedy et al. [12] in this study introduce the BrainGAN framework, which uses GAN architectures to create and categorize brain MRI images. Image-to-image translation involves learning the relation between input image and output by involving a training set of image pairs that are aligned. Zhu et al. [13] propose a method to do the same, but without the image pairs mainly involving an adversarial loss. Their method proves superior to other existing methods. Monitoring the progression of a disease based on annotated data is a challenging task. Schlegl et al. [14] approach this by using AnoGAN, that is an unsupervised method to analyze anomalies in a given image by training the model to understand normal anatomical variability and a score mechanism. Results show promise on testing with retina-based images. Despite the remarkable successes achieved so far, applying GANs to real-world problems still poses significant challenges. Mescheder et al. [15] combine variational autoencoders (VAEs) and generative adversarial networks (GANs) properly by using a supplementary discriminative network that reproduces the maximum likelihood problem as a 2-player game. This method shows to be easy to use and also betters existing similar approaches. Divya et al. [16] in this study provide an overview of the DCGAN architecture and its application as a synthetic data generator and act as a binary classifier that detects real or fake images using the brain tumor magnetic resonance imaging (MRI) dataset.
3 Proposed Methodology Creating a supplementary dataset of MRIs entails data generation. Deep Convolutional Generative Adversarial Networks (DCGANs) are used in this case to teach the model to generate data based on the same distribution of data that was provided to it. The DCGAN model is used to generate data based on the needs of the individual. Once the data for both cases—MRIs with and without brain tumors—is generated, it is shuffled with the original dataset to create a larger and more comprehensive set of MRIs. With limited data at the start of the base CNN model, generating more data will now add volume to the existing dataset being used and reduce the reliance on the limited data. Rather than relying on volume constraints and weak correlations in data obtained with a CNN model without using DCGANs, focusing on providing the CNN model with more artificially generated data using DCGANs may improve accuracy and help track various other important metrics used in the prediction of brain tumors from MRIs. Convolutional Neural Networks (CNNs) are built before and after appending the generated data, and the results are compared to understand the effects of using DCGANs as a method to artificially generate more data (Fig. 1).
Fig. 1 Experiment methodology
3.1 Data and Preprocessing With the sole aim of classifying between MRIs without (Fig. 2) and with brain tumors (Fig. 3), and not delving too deep into the specifics of the type of brain tumor information in the MRI, the base CNN model and the GAN are fed with 700 images each. This becomes a case of binary classification, with the model having 700 images of each type. Images were resized to 256 × 256 pixels to preserve clarity, which is essential for GAN training and CNN modeling. Pixel values are normalized to lie between 0 and 1 to reduce the computation needed on the network. Table 1 shows the number of images used for training and validation with respect to the base CNN and the CNN with GAN.
Fig. 2 MRI without brain tumor
Fig. 3 MRI with brain tumor
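As an aside, the resizing and normalization step can be expressed in a few lines; the sketch below uses TensorFlow utilities (a tooling choice not specified in the paper) and assumes the MRIs live in class subfolders under a hypothetical `data/` directory.

```python
# Hedged sketch of the preprocessing step: load grayscale MRIs from a hypothetical
# "data/" folder, resize to 256x256, and rescale pixel values to [0, 1].
import tensorflow as tf

dataset = tf.keras.utils.image_dataset_from_directory(
    "data/",                      # assumed layout: data/tumor/..., data/no_tumor/...
    labels="inferred",
    label_mode="binary",
    color_mode="grayscale",
    image_size=(256, 256),
    batch_size=32,
)
normalize = tf.keras.layers.Rescaling(1.0 / 255)
dataset = dataset.map(lambda images, labels: (normalize(images), labels))
```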
Table 1 Training and validation datasets

S. No. | Model | Number of images for training | Number of images for validation
1 | Base CNN | 1570 | 392
2 | CNN with GAN | 2082 | 520
Fig. 4 CNN architecture
3.2 CNN Model for Brain Tumor Classification

3.2.1 CNN Architecture
A Convolutional Neural Network (CNN) is a deep learning technique (Fig. 4) designed to cater to image-oriented modeling. By design, CNNs are made to work with pixel data and hence come into play for image segmentation, classification, recognition, and the like. The problem set involves building two CNNs—one for recognizing brain tumor scans with the originally present dataset itself and the other for recognizing brain tumor images with the original dataset along with the generated MRIs from the DCGAN. In both cases, the model uses 3 sub-layers of Convolution and Max pooling along with Flatten, Dropout, and Dense layers.
3.2.2 CNN Model Specifications
Max pooling, also known as maximum pooling, is a pooling technique that takes the largest (maximum) value of each feature map patch. The pooling layer then creates the new pooled feature maps, preserving the number of feature maps by working on each one individually. Likewise, flattening transforms the
data into a one-dimensional array that is used in the subsequent layer; it flattens the output of the convolution layers into what we know as a single long feature vector. We also use a technique called Dropout, which helps avoid overfitting in the models built with the CNN architecture. Every time the learning phase is updated, dropout becomes active by setting the output edges of randomly chosen hidden units to 0. The last Dense layer uses the activation function "Sigmoid"; since the model distinguishes between MRIs with and without a brain tumor, it is a binary classification problem, and the "Sigmoid" activation function is therefore used in the last layer. The images are resized to 256 × 256 pixels before entering the network and use only the gray channel. The activation function used in the other layers is ReLU:
$$f(x) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases}, \qquad \text{range } [0, +\infty)$$
We use two such depth layers here, because they are convolved individually to give individual results, reducing processing overhead. Again, the main activation functions used are ReLU and Sigmoid. The convolutional feature-extraction layers require a simple, piecewise-linear activation function, and ReLU serves this purpose: when using ReLU with a CNN, it can be applied directly to the filter maps themselves, followed by a pooling layer. The Sigmoid function provides the probability distribution for the most probable results, and the maximum value of this distribution determines the output of the neural network:
$$f(x) = \frac{1}{1 + e^{-x}}, \qquad \text{range } (0, 1)$$
When compiling the generated model (Fig. 5), categorical cross-entropy is used as the loss, with Adam as the optimizer. The dataset is split into training, testing, and validation splits; the model is trained on the training and validation parts and tested on the test dataset. The EarlyStopping callback is also used on the validation dataset to monitor the validation loss. Early stopping is a technique that allows us to specify a maximum number of training epochs and stops training when performance on the holdout validation dataset no longer improves. The output of the model is one of 2 categorical types—presence of a brain tumor or no presence of a brain tumor.
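A hedged Keras sketch of the described classifier follows. Filter counts, kernel sizes, and the dropout rate are assumptions (they are not listed in this section), and a single sigmoid unit with binary cross-entropy is used here as one consistent reading of the binary set-up; the paper's mention of categorical cross-entropy with two output categories would correspond instead to a two-unit softmax head.

```python
# Hedged sketch of the described CNN: three Conv2D/MaxPooling2D blocks, Flatten,
# Dropout, and a sigmoid output, trained with Adam and early stopping.
# Filter counts, kernel sizes, and the dropout rate are assumptions.
import tensorflow as tf

def build_cnn(input_shape=(256, 256, 1)):
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

model = build_cnn()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
# Training would then look like (train_ds / val_ds are the datasets built earlier):
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[early_stop])
```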
3.3 Deep Convolutional Generative Adversarial Network for Artificial Image Generation Deep Convolutional Generative Adversarial Networks (DCGANs) are a generative modeling method used to generate images based on the dataset fed to them. By design, GANs involve a discriminator and a generator. The generator generates images, while
Fig. 5 Representation of the structure of the proposed CNN model used for brain tumor classification
Fig. 6 Depiction of the architecture of a GAN
the discriminator returns feedback with a decision regarding which is a generated image and which is real (Fig. 6).
3.3.1 DCGAN Architecture
The discriminator in a DCGAN (Fig. 7) makes use of convolutional layers with strided convolutions in order to down-sample the images passed to it at every layer; essentially, the discriminator plays the role of a good classifier. The generator in a DCGAN (Fig. 8) makes use of convolutional layers with fractionally strided (transposed) convolutions in order to up-sample the input passed to it at every level; essentially, it builds an image back up to the same dimensions as the original images, with the input noise working its way through every layer. The discriminator tries to achieve complete accuracy in deciding whether an image is real or generated, whereas the generator tries to fool the discriminator (maximize the discriminator loss) by generating more realistic images based on the feedback it receives.
Fig. 7 Discriminator down sampling depiction
Fig. 8 Generator down sampling depiction
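The following Keras fragment sketches the two networks just described, following the usual DCGAN recipe (strided convolutions in the discriminator, transposed convolutions in the generator, batch norm, LeakyReLU, Tanh output). It targets 64 × 64 grayscale images purely to keep the sketch short; the paper works with 256 × 256 MRIs, and the exact layer widths here are assumptions, not the authors' configuration.

```python
# Hedged DCGAN sketch: strided-convolution discriminator and transposed-convolution
# generator with batch norm, LeakyReLU, and a Tanh output. The 64x64x1 images and
# layer widths are assumptions chosen for brevity.
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(latent_dim,)),
        layers.Dense(8 * 8 * 256, use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Reshape((8, 8, 256)),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="tanh"),
    ])

def build_discriminator(image_shape=(64, 64, 1)):
    return tf.keras.Sequential([
        tf.keras.Input(shape=image_shape),
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Conv2D(256, 4, strides=2, padding="same"),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])

generator = build_generator()
discriminator = build_discriminator()
print(generator.output_shape, discriminator.output_shape)
```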
3.3.2 GAN Objective Function Optimization
The objective function for the discriminator (Fig. 9) is built to ensure that the discriminator is rewarded for the correct decisions it makes and penalized for the incorrect ones. Here, D(x) is the discriminator's estimate of the probability that the real data instance x is real, E_x is the expected value over all real data instances, G(z) is the generator's output when given noise z, D(G(z)) is the discriminator's estimate of the probability that a fake instance is real, and E_z is the expected value over all random inputs to the generator (in effect, the expected value over all generated fake instances G(z)). The formula derives from the cross-entropy between the real and generated distributions. The objective function for the generator (Fig. 10) is built to ensure the generator is optimized to fool the discriminator. Apart from the up-sampling and down-sampling nature of the generator and discriminator, the architectural guidelines of DCGANs were followed: batch norm was used in the discriminator and the generator, fully connected layers were removed for deeper architectures, ReLU activation was used in all generator layers except the output, where Tanh is used, and Leaky ReLU is used in every layer of the discriminator. The study runs a DCGAN model on the original MRI data for 200 epochs and begins to show drastic improvements from around the 60th epoch onwards. With a learning rate of 1e−04 for both the generator and the discriminator, the model is used to generate an additional 700 images for each of the two states of the MRI data.
Fig. 9 Objective function to be updated for the discriminator
Fig. 10 Objective function to be updated for the generator
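For reference, the standard GAN value function from which these two objectives are derived can be written as follows; this is the canonical formulation from the GAN literature rather than a reproduction of Figs. 9 and 10.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator ascends this objective while the generator descends it; in practice the generator is often trained to maximize log D(G(z)) instead, which gives stronger gradients early in training.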
Once generated, the data is loaded back into the CNN model along with the original data and shuffled for good measure. A new CNN model is built, and appropriate analysis and comparisons are made.
4 Results and Discussions As the DCGAN generates images, a sharp improvement in clarity is seen as training progresses sequentially through the 10th (Fig. 11), 70th (Fig. 12), and 181st (Fig. 13) epochs. The training process with the DCGAN was relatively straightforward for the MRIs without a brain tumor, since they do not contain as much varying information as the MRIs with brain tumors, which is evident in the progress over the 100th (Fig. 14) and 175th (Fig. 15) epochs. The base CNN model, without the MRI data generated by the DCGAN, showed a training accuracy of 97.26% and a validation accuracy of 96.88% (Fig. 16), while a training loss of 0.0727 and a validation loss of 0.1818 were reported (Fig. 17). The refined CNN model with the MRI data generated by the DCGAN showed a training accuracy of 98.85% and a validation accuracy of 96.73% (Fig. 18), while a training loss of
Fig. 11 Brain tumor with MRI at 10th epoch
Fig. 12 Brain tumor with MRI at 70th epoch
Fig. 13 Brain tumor with MRI at 181st epoch
Fig. 14 MRI without brain tumor at 100th epoch
Fig. 15 MRI without brain tumor at 175th epoch
0.027 and a validation loss of 0.2 were reported (Fig. 19). Table 2 finally illustrates the accuracy of the model for the base CNN and the CNN with GAN model types. As Figs. 16 and 17 show, the base CNN model already performs well; however, using GANs as a source of supplementary data for the same model aids in its refinement, as an increase in accuracy is observed. Using GANs to generate data from existing data allows the model to handle more well-rounded edge cases by providing it with more data to learn from.
Fig. 16 Brain tumor classification accuracy graph without GAN data using Python3
Fig. 17 Brain tumor classification loss graph without GAN data using Python3
5 Conclusion The main goal of the research is accomplished with an increase in accuracy by building a model with the use of additional data to optimise its ability to classify between MRIs with and without brain tumors using images generated artificially. The conducted research has shown that GANs as a generative source of data can aid in improving a model by providing it with more samples to learn from. This becomes essential as healthcare datasets are scarce and expensive in nature. If tasks such as classification can be improved and used in aid with the physician’s analysis,
Fig. 18 Brain tumor classification accuracy graph with GAN data using Python3
Fig. 19 Brain tumor classification loss graph with GAN data using Python3
Table 2 Accuracy comparison

S. No. | Model type | Model accuracy (%)
1 | Base CNN | 97.26
2 | CNN with GAN | 98.85
this would save a lot of resources and mishaps in areas where accountability plays a vital role—especially health care. This implementation can be extended to more than just the healthcare domain. It is, however, not yet possible to predict the density accuracy of the evaluated model, and we can say that this image is dense enough to
proceed with. GANs are sophisticated mechanisms for data generation, but unstable training and unsupervised learning methods make them more difficult to train and generate results. Future steps would include implementing more advanced versions of GANs to improve image resolution which would reduce reliance on existing data for legitimacy, adopting algorithms that work better at generalizing features with limited data—particularly image data which becomes hard to obtain and use improved architectures of CNNs like R-CNN, Fast R-CNN, and so on to improve image detection capabilities. This would provide a foundation to build upon to develop the approach for GAN-CNN usage.
Named Entity Recognition: A Review for Key Information Extraction P. Nandini and Bhat Geetalaxmi Jairam
Abstract Named structure identification is the task of recognizing mentions of rigid designators in textual context and attaching them to prearranged denotation types, which include company, person, location, and many others. Named structure identification usually provides the foundation for many natural language applications such as question answering, text summarization, and machine translation. This paper makes an extensive survey of various named entity recognizers, with comparative observations on previous work from the point of view of the languages currently supported and the documentary genres adopted. In addition, some approaches are suggested for named structure recognition and its classification (NERC) based on machine learning concepts. Keywords Information extraction · Named entity recognizers · Various classifications
1 Introduction The crucial term named entity (NE) came into picture at the 6th message judgment convention [1]. Named structure identification is a method of identifying named materials like place, individual time, affiliation, medical system, and so on in textual content. Named entity recognizer (NER) early framework is based on high expectation requirements, orthographic highlights, ontologies, and vocabularies. Named structure identification is one of the essential duties of the information extraction (IE) system which is used to remove descriptive structure. Information extraction collectively with natural language plays an important role in dialects modeling and P. Nandini (B) Department of Computer Science and Engineering, The National Institute of Engineering, Mysore, India e-mail: [email protected] B. G. Jairam Department of Information Science and Engineering, The National Institute of Engineering, Mysore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_32
Fig. 1 General pipeline construction for NER
dependent information extraction using phonetic, syntactic morphological, and interpretation analysis of dialects. Morphological-rich dialects like English and Russian make information extraction approaches less complicated [2]. Information extraction is quite tough for morphologically negative dialects as these languages want more attempts for their policies to take out nouns and because of non-availability of complete dictionaries [3]. Answering query, translation system, automated textual content, text mining, records, summarization, knowledgebase, opinion mining are some of the fundamental packages of named structure identification [4]. So, the better accuracy and efficiency of named entity systems may bring changed challenges to that structure. In such a scenario, this review explores that challenge and fixes the latest developments. The main goal of language processing is to expand fashion to procedure semantic obligations such as hearing, studying, speaking, and writing. Named structure recognition is an important task of natural language processing (NLP), and it needs recognition of the right name from an unshaped record and categorizes it into a group of prearranged classes. The main objective of NERC is to remove the name sequentially which is helpful to deal with several issues like device translations records, removal, retrieval of record, answering questions, automated summarization of text, and many others. A general architecture of NER is illustrated in Fig. 1. This includes mainly entity linking, a primary NLP task in named structured identification for information extraction. Entity linking assigns a unique identity to each entity in a given document identified by named entity recognition. It links each entity with its corresponding description in a knowledge base. Candidate selection is made among local context-based and collective NER.
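As a concrete illustration of the recognition step in the pipeline of Fig. 1, the short sketch below runs a pretrained statistical recognizer over a sentence and prints the detected entities with their prearranged classes. The spaCy library and the en_core_web_sm model are assumptions chosen for illustration; any pretrained NER pipeline would serve the same purpose.

```python
# A minimal illustration of the recognition step in the Fig. 1 pipeline using spaCy.
# The model name "en_core_web_sm" is an assumption; install it with:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Sundar Pichai visited the Google office in Bengaluru on Monday.")

for ent in doc.ents:
    # ent.label_ is the prearranged class (PERSON, ORG, GPE, DATE, ...)
    print(ent.text, "->", ent.label_)
```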
2 Literature Review In [5], authors have suggested methodology based on jargon move, and this approach is implemented on a group of jargons where 13% character name, 42% place name, and 17% associations are covered. In these studies, authors provide a standard
framework, and this approach is confronted with identification of recent people, organization, and place name. In [6], a structure is designed to differentiate the phrases using lowercase and uppercase letters, which assumes that the named entities start with an uppercase alphabet. In [7], the authors mentioned grave neural communities which primarily depend on prototypes for identifying named structures. To carry out named structure identification, many techniques have been suggested over the years. Previous structures commenced through enforcing dictionaries, counting on a list of named entities [8]. Later, rule-based strategies had been delivered to supplement dictionary-based method remedy issues consisting of right name popularity [9]. In [10], the author discussed the multi-modal NER device for noisy-generated information like snap chat, tweets, embeddings phrase, embeddings person, and video features mixed the modality interest. In [11], the authors determined the verbal characteristics that had been frequently ignored in NER neural structures. They suggested a substitute verbal illustration that is educated may be introduced to NER neural machines. The verbal illustration is evaluated for every phrase with 120 dimensional vectors, in which every detail encodes the likeness of the text with structure. In [12], the authors suggested a new dialects representation version known as bidirectional encoder representations from transformers (BERT) two-way encoder representation for transformers.
3 NER Techniques Named entity identification has a plenty of rich literature and a number of entity recognizers developed over past decades. Mainly named entity identifiers are divided into three categories: rule based, statistical, and hybrid.
3.1 Named Entity Identifier Rule Based In [13], authors added an approach known as Phonetic equaling for identifying the entities for Indian dialect. Such method doesn’t follow statistics function as a substitute it plays equaling among strings of various dialects which are English and Hindi using the basic idea of comparable detectable assets. In [14], the authors refined an entity identification structure for Punjabi dialects text abstraction. Authors evolved diverse classification lists which include listing prefix, listing suffix, center list name, ultimate list name, and list right call using generation of frequency listing from collection of Punjabi. Such lists can be used as verbal assets for imposing circumstances, primarily building algorithms having policies like rule suffix, rule prefix, rule call center, closing rule name, and genuine rule name. In [15], the author proposed a named structure identification algorithm for reports of Malay which depend on rule
Fig. 2 Code composer processing flow
part-of-speech (POS) tagging technique and contextual characteristics regulations. In [16], the author has advanced named entity identification machine translation systems for Urdu dialects. Here, numerous regulations and classifications appear to fix up to thirteen named structure tags. The precision of such machines stated properly as to compare device studying totally based on the named entity system for Urdu. Some processing systems work on code composer which is a machine translations system that converts pseudo-code assertions into language of programming. Code composer processing flow consists of many components as depicted in Fig. 2.
3.2 Named Structure Identifier Based on ML (Machine Learning) In [17], the authors have identified four predominant named structure labels which includes area, person, company and divers the usage of guide vector system in Bengali dialects. Authors tried to convert name structure identified Bengali information collections [18] into detailed description format after which implemented function groups which incorporate factor word characteristics, phrase appendix characteristics, phrase appendix function, function for POS, function for digits, named structure label of preceding dynamic feature (word) initial phrase, different dictionary list. In [19], authors have completed their observations on Arabic dialects and detect the use of numerous morphological, contextual, and lexical functions impact the correctness of named structure identification challenge for one-of-a-kind system getting to know algorithms including conditional random fields (CRFs), ML, and support vector machines (SVMs). In [20], the author proposed SVM algorithm which is totally based on the kernel composite function which is the aggregate of affiliation feature class and ranking phrase kernel clustering function [21] for biomedical and Hindi named structure reputation project. The creators observed three named structures such as individual, area, and company of Hindi record and five named entity structures of cell type, cell line from statistics of biomedical.
In [22], the authors delivered a named structure identification system with 6680 English terms (125 files) chosen from a treebank collection in the Natural Language Toolkit (NLTK) and also used the hidden Markov model (HMM) for its identification. In general, eight named structures, namely week, country, organization, character, magazine, month, computer (non-public), and vicinity, are recognized. The correctness is calculated through the F-measure, which is 74%, and the precision for names of people is higher than 70%. In [23], authors recognized the structure of Nepali textual context with the use of a support vector machine (SVM). Diverse features were chosen to target five named structures: person, place, company, mixed, and other. The machine is analyzed with the use of the inherent overall performance measures (accuracy, recall, and F-score). In [24], the author discussed the margin-infused relaxed algorithm (MIRA) to extract named structures of Bengali dialects. The author used different languages with structured and unbiased features, which are at the lowest with respect to the success of the algorithm. Overall, MIRA is observed to be the best prototype because of its greater expansion approach.
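The feature groups mentioned above (prefix/suffix lists, digit features, POS and context words) can be made concrete with a small sketch that builds per-token feature dictionaries of the kind typically fed to CRF or SVM sequence learners. The exact feature names and the example sentence are illustrative assumptions, not the feature sets of the cited systems.

```python
# Illustrative word-level features of the kind fed to CRF/SVM-based NER systems
# described above (affix, digit, case and context features). Feature names are
# assumptions for illustration only.
def word_features(tokens, i):
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "prefix3": word[:3],              # listing-prefix style feature
        "suffix3": word[-3:],             # listing-suffix style feature
        "is_title": word.istitle(),
        "has_digit": any(c.isdigit() for c in word),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens = "Rabindranath Tagore was born in Kolkata".split()
features = [word_features(tokens, i) for i in range(len(tokens))]
# 'features' (one dict per token) can be paired with BIO labels and passed to a
# sequence learner such as a CRF for training.
```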
3.3 Named Structure Hybrid Identifier In [25], authors brought a named structure hybrid reputation machine for five South Asian dialects, i.e., Urdu, Hindi, Bengali, Telugu, and Oriya. Authors showed that this hybrid technique reflects better consequences when compared with the statistically handiest approach. In [26], the author brought an automatic named structure identification system for Bengali dialects. The authors also observed that the three steps proposed a technique for named structure recognition. It uses the named structure dictionary, order primarily giving the consequences identical like the manual assessment. In [27], authors have proposed a classifier entity based approach to get unique arrangements that are allowed to cast vote for each of the output classes in keeping with their responsibility of each class. For categorization, a multitasking optimization approach called archived multitasking simulated annealing (AMOSA) was used which increased the performance of the system. The observations are carried out for three dialects such as Telugu, Bengali, and Hindi. In Language-based machine translation technique either grammar oriented or lexeme oriented translations are obtained by depending on the principle of lexical equality where units of words or phrases translations are done. In grammar approach, translations are for structural attributes of the test and are limited to intra-sentential structures. The standard knowledge-based machine translation is illustrated in Fig. 3.
Fig. 3 Knowledge-based machine translation system
3.4 Comparison of Different NER Techniques Table 1 presents a comparative study of different approaches to named structure identification. The results are reported in terms of F-scale, accuracy, recall, precision, and evoke, as concluded by various authors for different languages.
3.5 The Advantages and Disadvantages of Query Expansion (QE) NLP makes it possible to ask a query about any subject and get a direct, fast response within seconds, and it also offers exact answers to any query. It avoids unwanted and unnecessary information. It also helps computer systems communicate with humans in any language and is very time efficient. It improves the efficiency and accuracy of documentation and recognizes the information in a huge database. In terms of disadvantages, it doesn't show context and is unpredictable in nature. The requirement of keystrokes is large, and it is unable to adapt to a new domain with limited functions. In addition, the advantages and disadvantages of QE techniques are discussed in Table 2. Query expansion is a common preprocessing step in the field of information retrieval, with the goal of increasing the hit rate of retrieving relevant documents from a database for a given query. Query expansion improves efficiency by supplementing the query with relevant keywords, as sketched in the example below.
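The sketch below shows a minimal linguistic query expansion: each query term is supplemented with a few WordNet synonyms, the simplest instance of "supplying terms that are alike to the query form" listed in Table 2. The use of NLTK/WordNet and the cut-off of three synonyms per term are assumptions made for illustration.

```python
# A minimal linguistic query-expansion sketch using WordNet synonyms via NLTK.
# Assumes the WordNet corpus has been downloaded (nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def expand_query(query, max_per_term=3):
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        synonyms = []
        for syn in wn.synsets(term):
            for lemma in syn.lemmas():
                name = lemma.name().replace("_", " ")
                if name != term and name not in synonyms:
                    synonyms.append(name)
        expanded.extend(synonyms[:max_per_term])   # supplement the query with related keywords
    return " ".join(expanded)

print(expand_query("brain tumor detection"))
```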
Table 1 Comparison of different NER techniques Category
Title
Named structure identification-Rule based
Name structure Knowledge base identification rely approach on control-based technique
Named structure identifier based on ML
Methodology
Metrics
Results
Knowledge acquisition, F-score, precision
Accuracy = 85%, F-rate = 88.46%, Evoke = 93.44%
Named structure control-based identification for medical and offense documents
Rule-based approach Knowledge neural network as acquisition well as rule-based and lexical technique
Accuracy = 84%, F-rate = 88.47%, Evoke = 94.43%
Rule-based methodology for Kannada named recognition
Rule based
Recall, precision, F-rate, etc.
Precision = 86%, Recall = 90%, F-measure = 87.95%
Recall, precision, F-rate, etc.
Precision: 63.0%, Recall: 57.3%, F1: 60.0%, F2: 58.3%
Mixed dictionary Dictionary-based technique for approach identification over electronics medical data for NER Named structure identification approach for micro-text in service for social networking like Twitter system with utilization
Online NER approach F-score
Named structure Support vector identification machine using machine teaching approach
F-score
F-score: 90.3%
Bengali NER Accuracy = 91.66%, F-rate = 91.65%, Evoke = 91.66% Hindi NER Accuracy = 90.22%, F-rate = 89.81%, Evoke = 89.41% (continued)
434
P. Nandini and B. G. Jairam
Table 1 (continued) Category
Named structure hybrid identifier
Title
Metrics
Results
Kannada named MNB classifier entity recognition and classification (NERC) based on a multinomial Naïve Bayes (MNB) filter
Methodology
F-score
F-score = 81%
Kannada named recognition using the gazette list with conditional random fields
Conditional Random Fields
Recall, precision, F-rate etc.
Precision = 96.23%, Recall = 87.84%, F-measure = 91.84%
Named structure identification using hybrid approach for South Asian dialects
Hybrid model approach
F-score
Hybrid prototype model using HMM proves to give better accuracy than CRF verbal with F-rate of 48.84%, 44.75%, 46.84%, 46.58%, 39.77% using Urdu, Hindi, Oriya, Bengali, Telugu
F-score
F-score = 87.43%
Named structure Extraction within text identification mining approach using hybrid prototype for unshaped medical word taken from ICSSE
4 Kannada Languages Challenges and Issues • The most formal language of Karnataka is Kannada, which is a dialect with wealthy agglutination and diagnosis like other Indian dialects while nouns of Kannada language are greater in mass with prefixes which made it quite tough to discover the source nouns.
Table 2 QE techniques for semantics with its advantages and disadvantages

QE techniques | Advantages | Disadvantages
Linguistic techniques | Natural languages are used for effective processing of any query; supply terms that are alike to the query form | Lack of semantic relationships; lack of domain query
Ontology-based techniques | Contextual knowledge provided similar to user's query; search query is effective; result is user-centric; precision rate is high | Comprehensive ontology construction is difficult; preferred domain is specific ontology; need identical matching between query and ontology
Mixed mode techniques | Semantic and linguistic techniques' strengths will be exploited; disambiguating word search query and sense is easy | Increase in expansion terms; increased and complex number of operations
• Usually, a couple of words joined in a sentence which is of the form blended phrases is hard to separate orthographically, and it is difficult to separate negation phrases from the nouns. • Furthermore, individual words in Kannada are written in a lot of phonetic bureaucracy which is hard to find. • Very less POS tagger NERs. • Kannada dialects are far script of Brahmi with excessive phonetic traits.
5 Conclusions In this paper, different NER categories are considered, and results in terms of F1score, recall, and precision are compared. Based on the observation made especially in the field of named structure identification brings out an idea in removal of named structure from unshaped records. This paper conducts a vast survey in the associated vicinity of natural dialects garbling. The approaches were examined and revealed that conditional random fields with F-measure of about 91.84% was found to be best suited to NERC for information extraction in Kannada language in order to achieve appropriate accuracy and optimization. In addition, this paper includes the successful features of query expansion techniques in which QE is one of the prominent preprocessing steps used in the field of information extraction.
Future Scope Named structure identification is a class of natural dialect garbling which is crucial, and a lot of tough problems need to be examined. In these circumstances, this paper finds out that less work is executed in named structure identification of Kannada dialects. One of the major challenges is overcoming the lack of a proper exhaustive keyword list for lexicon spoken in the different districts of the southern part of India. Another was the absence of an easily available Kannada POS tagger and a translation tool with high accuracy. An application can be created by employing classification methods on the keyword database to allow the intelligent, data-driven classification of user opinions. Further, this application can be extended to other local widely spoken native Indian languages like Tamil, Telugu, and Malayalam since the semantics and basic structure of phrases are similar among these languages.
References 1. Guo J, Xu G, Cheng X, Li H (2009) Named entity recognition in query. In: SIGIR, pp 267–274 2. Abdallah ZS, Carman M, Haffari G (2017) Multi-domain evaluation framework for named entity recognition tools. Comput Speech Lang 43:34–55 3. Sazali SS, Rahman NA, Bakar ZA (2016) Information extraction: evaluating named entity recognition from classical Malay documents. In: 2016 Third international conference on information retrieval and knowledge management (CAMP), pp 48–53 4. Goyal A, Gupta V, Kumar M (2018) Recent named entity recognition and classification techniques: a systematic review. Comput Sci Rev 29:21–43 5. Shaalan K, Raza H (2008) Arabic named entity recognition from diverse text types. In: Advances in natural language processing, Springer Berlin Heidelberg, pp 440–451 6. Gupta V, Lehal GS (2011) Named entity recognition for Punjabi language text summarization. Int J Comput Appl 33(3):28–32 7. Singh U, Goyal V (2012) and Lehal. G. S, Named entity recognition system for Urdu, In COLING, pp 2507–2518 8. Alfred R, Leong LC, On CK, Anthony PP (2014) Malay named entity recognition based on rule-based approach. Int. J. Mach. Learn. Comput. 4(3):300 9. Rahem KR, Omar N (2015) Rule-based named entity recognition for drug-related crime news documents. J Theor Appl Inf Technol 77(2) 10. Ekbal A, Bandyopadhyay S (2008) Bengali named entity recognition using support vector machine. In: Proceedings of the IJCNLP-08 workshop on NER for South and South East Asian languages, pp 51–58 11. Ekbal A, Bandyopadhyay S (2008) A web based Bengali news corpus for named entity recognition. J Lang Res Eval 42(2):173–182 12. Goyal A (2008) Named entity recognition for South Asian languages. In: Proceedings of IJCNLP-08 workshop on NER for South and Sound East Asian languages, pp 89–96 13. Brown PPF, Pietra VJD, DeSouzaPP V, Lai JC, Mercer RL (1992) Class-based ngram models of natural language. J. Comput. Ling. 18(4):467–479 14. Saha SK, Mitra PP, Sarkar S (2012) A comparative study on feature reduction approaches in Hindi and Bengali named entity recognition. Knowl Based Syst 27:322–332 15. Ekbal A, Saha S, Singh D (2012) Active machine leaning technique for named entity recognition. In: Proceedings of international conference on advances in computing, communications and informatics (ICACCI), ACM, pp 180–186
16. Liu X, Zhou M (2013) Two-stage NER for tweets with clustering. Inf Process Manage 49(1):264–273 17. Keretna S, Lim CPP, Creighton D, Shaban KB (2015) Enhancing medical named entity recognition with an extended segment representation technique. Comput Methods Programs Biomed 119(2):88–100 18. Konkol M, Konopík M (2015) Segment representations in named entity recognition. In: International conference on text, speech, and dialogue, Springer International Publishing, pp 61–70 19. Bhasuran B, Murugesan G, Abdulkadhar S, Natarajan J (2016) Stacked ensemble combined with fuzzy matching for biomedical named entity recognition of diseases. J Biomed Inf 64:1–9 20. Adak C, Chaudhuri BB, Blumenstein M (2016) Named entity recognition from unstructured handwritten document images, In: Document analysis systems (DAS), 12th IAPR workshop on, IEEE, pp 375–380 21. Srikanth PP, Murthy KN (2008) Named entity recognition for Telugu. In: Proceedings of IJCNLP-08 workshop on NER for South and Sound East Asian languages, pp 41–50 22. Guanming Z, Chuang Z, Bo X, Zhiqing L (2009) Crfs-based Chinese named entity recognition with improved tag set. In: 2009 WRI World congress on computer science and information engineering 23. Ekbal A, Saha S (2011) A multiobjective simulated annealing approach for classifier ensemble: named entity recognition in Indian languages as case studies. Exp Syst Appl 38(12):14760– 14772 24. Etkinson J, Bull V (2012) A multi-strategy approach to biological named entity recognition, Exp Syst Appl 39(17):12968–12974 25. Chopra D, Jahan N, Morwal S (2012) Hindi named entity recognition by aggregating rule based heuristics and hidden markov model. Int J Inf Sci Tech 2(6):43–52 26. Saha S, Ekbal A (2013) Combining multiple classifiers using vote based classifier ensemble technique for named entity recognition. Data Knowl Eng 85:15–39 27. Munkhjargal Z, Bella G, Chagnaa A, Giunchiglia F (2015) Named entity recognition for mongolian language. In: International conference on text, speech, and dialogue, Springer International Publishing, pp 243–251
A Mathematical Model to Explore the Details in an Image with Local Binary Pattern Distribution (LBP) Denny Dominic, Krishnan Balachandran, and C. Xavier
Abstract Mathematical understanding is required to prove the completeness of any research and scientific problem. This mathematical model will help to understand, explain, and verify the results obtained in the experiment. The model will, in a way, portray the mathematical approach of the entire research process. This paper discusses the mathematical background of the proposed prediction of lung cancer with all its parameters. The processes involved in analyzing the 2D images (the basic quantitative method, its form, the related equations, and the fundamental algorithmic understanding with slightly modified versions of the prediction) are presented in the sections below, together with how the local binary pattern distribution can be modified so that we get reduced run time and better accuracy in the final result. Keywords Proven · Local details pattern · Direction of growth · Quantitative method
D. Dominic (B) · K. Balachandran · C. Xavier Department of Computer Science and Engineering, Christ (Deemed to be University), Bengaluru, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_33
1 Introduction 1.1 Local Details Pattern In recent years, there is an increasing interest among researchers to use proven mathematical techniques to explore the details in an image. This is partially because of the growth of emerging technologies such as computer vision but mainly due to the immense applications of extracting fine details from the images such as medical diagnostics, cyber-security and many more. From a given image, the pattern of the details distributed around has been studied by several researchers [1–3]. Recognizing the pattern distribution of details around a pixel has been studied local binary pattern distribution (LBP) [4–6]. In LBP, we scan all the neighboring pixels and find out where each of the eight neighbors has a larger value than the pixel itself [7–9]. This is represented as a binary number and the vector of size 8. By processing this vector, we study the pattern of the image [10–13]. In our research, we extended this study as follows: 1. Instead of binary values, we capture the actual increase of the value from the value at the home pixel. 2. We extended the neighborhood up to K distance and aggregate the behavior of the neighbors when compared to the home pixel.
2 Methodology The above extension enriches the study of the pattern. The local details pattern (LDP) approach consists of four phases. Phase 1: Local Detail Analysis In this phase, we consider the k neighbors of the home pixel in all eight directions and aggregate the values. This produces the local details vector L(x) = [l_0, l_1, l_2, \ldots, l_7]^T, where each l_i is a number. Phase 2: Identify the Direction of Growth In this phase, we convert the local details vector L(x) into a probability density function which proposes the probability that the growth is in a particular direction (for eight directions):

P(i) = \frac{e^{r_i}}{\sum_{i=0}^{7} e^{r_i}}    (1)
We use the softmax function to evaluate the pdf. The result of this phase is a pdf over eight directions, P(x) = (P_0, P_1, P_2, P_3, P_4, P_5, P_6, P_7), where P_i = probability that the growth is in direction i. Phase 3: Validation of the pdf In this phase, we validate if the pdf gives meaningful guidance for the decision making in identifying the growth direction. Note that a pdf need not always give meaningful guidance. If all the probabilities are the same, there is no guidance at all. We use the mathematical tool entropy to accomplish the task of validation. The entropy of a pdf p = (P_0, P_1, \ldots, P_{d-1}) is defined by

H(p) = -\sum_{i=0}^{d-1} P_i \log_2 P_i    (2)
When the entropy is small, the pdf gives meaningful guidance. Phase 4: Confirmation of the Directions If the pdf is worth considering for guidance, we move in the guided direction, choose the neighboring pixel x', and find the LDP vector and the corresponding pdf. Now we have two pdfs. The pdf p(x) is the pdf of the home pixel x. The pdf p(x') is the pdf of the neighbor pixel x'. Note that phase 3 has recommended the growth direction x \to x'. We use cross-entropy to verify if p(x) and p(x') are confirming the pattern. The cross-entropy of two pdfs p and p' is defined as

H(p, p') = -\sum_{i=0}^{d-1} P_i \log_2 P'_i    (3)
when the cross-entropy is small, they confirm the direction. In case the cross-entropy value is high, the growth direction is not consistent. The overall approach is shown in the flowchart.
3 Working Model Phase 1 Local Details Pattern Analysis The local binary pattern (LBP) approach compares the details in a pixel to its neighbors and arrives at a Boolean vector indicating whether each neighboring value is smaller or bigger. This approach has several limitations and underutilizes the details available: when the binary values 0 or 1 are assigned, there is no consideration of how big or how small the difference is. Consider a pixel having a value of 5 and consider the neighboring values. In the details presented in Fig. 1a and b, we get the LBP vector as [1, 1, 1, 1, 1, 1, 1, 1]^T in spite of the difference in the details. This shows that the LBP vector does not reflect the fact on the ground. In our approach, we propose a few vectors which represent the "fact on the ground" in a more accurate manner. Rectified Linear Unit (ReLU) Patterns Let the values around the pixel be x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7 as shown in Fig. 2. The first-order vector L_1 is defined as L_1 = [l_1(x_i)]_{i=0 \text{ to } 7}^{T}
where

l_1(x_i) = \begin{cases} 0 & \text{if } x > x_i \\ x_i - x & \text{if } x \le x_i \end{cases}    (4)

Fig. 1 Two neighborhoods of a pixel with value 5: in (a) the neighbors include values of 250 and 6, in (b) all neighbors equal 6; both give the same LBP vector [1, 1, 1, 1, 1, 1, 1, 1]^T
Fig. 2 Values around the pixel: the neighbors x_0, x_1, ..., x_7 are arranged clockwise around the centre pixel x as
x7 x0 x1
x6 x  x2
x5 x4 x3
In this approach, whenever the pixel value x is smaller, the actual difference from each x_i is returned. If x is a bigger value than its neighbor x_i, a zero is returned, as in the local binary pattern (LBP) model. The general ReLU function is shown in Fig. 3. In the case of the ReLU function used in the local details pattern (LDP), we have the advantage of getting details on the amount of growth in each direction. Second-order LDP Vector Consider the pixel x and its neighboring pixels as shown in Fig. 4. Definition The second-order LDP vector of a pixel x of an image is defined as the vector L_2 = [l_2(x_j)]_{j=0 \text{ to } 7}^{T} where
l_2(x_j) = \begin{cases} 0 & \text{if } x > \dfrac{x_{0j} + x_{1j}}{2} \\ \dfrac{x_{0j} + x_{1j}}{2} - x & \text{otherwise} \end{cases}    (5)
This is illustrated by the growth of the pixel value in each of the eight directions N, NE, E, SE, S, SW, W, and NW. In each direction, we find the two pixels, find out how much their average exceeds x, and apply the ReLU function.

Fig. 3 General ReLU function

Fig. 4 Second-order LDP vector of a pixel x of an image: each direction j has a nearer neighbor x_{0j} and a farther neighbor x_{1j} around the centre pixel x
Fig. 5 Growth of the pixel value in each of the eight directions
This is illustrated in Fig. 5:

l_2(x_0) = (30 + 20)/2 − 10 = 15
l_2(x_1) = (40 + 50)/2 − 10 = 35
l_2(x_2) = (35 + 45)/2 − 10 = 30
l_2(x_3) = 0 because (10 + 5)/2 = 7.5 < 10
l_2(x_4) = 0 because (10 + 5)/2 = 7.5 < 10
l_2(x_5) = 0 because (10 + 8)/2 = 9 < 10
l_2(x_6) = 0 because (10 + 8)/2 = 9 < 10
l_2(x_7) = 0 because (20 + 20)/2 − 10 = 10
The second-order LDP vector of a pixel x of an image is defined and illustrated. This can be extended to an order k as follows. Let I denote the image of size m * n. Each pixel I of this image is a number from 0 to 255. Let k be an integer. In the image matrix of size m * n, we shall consider a border of k pixels in all the four sides and consider the internal pixels in the region
of size (m−2k) * (n−2k). For example, if the image is of 800 * 700 size and we consider k = 5, the internal area is of size 790 * 690. The kth-order LDP vector is l_k(x) = [l_k(x_j)]_{j=0 \text{ to } 7}^{T}, where each l_k(x_j) is defined as follows: for a given j, calculate \bar{X}_j = \frac{1}{k}\sum_{i=0}^{k-1} x_{ij} and set

l_k(x_j) = \begin{cases} 0 & \text{if } x > \bar{X}_j \\ \bar{X}_j - x & \text{if } x \le \bar{X}_j \end{cases}    (6)
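A direct implementation of Eq. (6) for one interior pixel is sketched below: the k pixels in each of the eight directions are averaged and the ReLU-style rule is applied. The direction ordering and the random test image are assumptions made for illustration.

```python
# A NumPy sketch of the kth-order LDP vector of Eq. (6) for one interior pixel.
import numpy as np

DIRS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]  # N, NE, E, SE, S, SW, W, NW

def ldp_vector(img, r, c, k):
    """kth-order LDP vector l_k(x) for the pixel at (r, c); assumes a k-pixel margin."""
    x = float(img[r, c])
    vec = np.zeros(8)
    for j, (dr, dc) in enumerate(DIRS):
        mean_j = np.mean([float(img[r + t * dr, c + t * dc]) for t in range(1, k + 1)])
        vec[j] = 0.0 if x > mean_j else mean_j - x      # the rule of Eq. (6)
    return vec

img = np.random.randint(0, 256, size=(64, 64))
print(ldp_vector(img, 32, 32, k=3))
```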
By choosing k appropriately, we can avoid the errors due to scanning of the image or manipulation during storing or format conversion. Image Extension Approach In the above definition of the kth order LDP vector, the details vector cannot be determined at the border k pixels. There is a workaround proposed to approximately find the vector at the border pixels. For every cell in the upper border (North), we assign the average of the k topmost pixels. For every cell in the bottom border, we assign the average of the k bottom-most pixels. Let us assume that the original picture is represented by (x_{ij})_{i=0,1,\ldots,(m-1);\ j=0,1,\ldots,(n-1)}. The upper border is first constructed as follows: for i = −1, −2, −3, …, −k and j = 0, 1, …, (n − 1), we define

X_{ij} = \frac{1}{k}\sum_{t=0}^{k-1} x_{tj}

The lower border is constructed as follows: for i = m, m + 1, m + 2, …, (m + k − 1) and j = 0, 1, …, (n − 1), we define

X_{ij} = \frac{1}{k}\sum_{t=0}^{k-1} x_{(m-1-t)j}
After extending the upper and bottom borders, we have a matrix of size (m + 2k) * n. Now we can extend the left border and right border. They are done along the extended length of (m + 2k), for i = −k to m + k − 1. The left border is constructed as follows: for j = −1, −2, −3, …, −k, we define

X_{ij} = \frac{1}{k}\sum_{t=0}^{k-1} x_{it}, \quad \text{for } i = -k \text{ to } (m+k-1)

The right border is constructed as follows: for j = n, n + 1, n + 2, …, (n + k − 1), we define

X_{ij} = \frac{1}{k}\sum_{t=0}^{k-1} x_{i(n-1-t)}, \quad \text{for } i = -k \text{ to } (m+k-1)
Now, after extension of the border, we have a matrix of size (m + 2k) * (n + 2k). The actual index i ranges from −k to m + k − 1 and j ranges from −k to n + k − 1. In this image, we can compute the kth order LDP vector for all the pixels x_{ij}, i = 0 to m − 1 and j = 0 to n − 1. Leaky ReLU Function We have proposed the ReLU function in the computation of the kth order LDP function L_k. By design, the function assigns zero values to the direction where there is no growth (i.e., x_{ij} ≤ x). There is a cascading effect of "vanishing the result" in using zeros. The vanishing gradient issue of deep convolutional neural networks is an instance of the same. In an attempt to avoid such damage, a leaky ReLU function has been introduced in research. The leaky ReLU function is defined as follows (Fig. 6):

f(x) = \begin{cases} 0.1x & \text{if } x < 0 \\ x & \text{if } x \ge 0 \end{cases}
While we use the leaky ReLU function in our computation, we can rewrite our formula as follows: (4) becomes

l_1(x_i) = \begin{cases} 0.1\,(x_i - x) & \text{if } x_i - x < 0 \\ x_i - x & \text{if } x_i - x \ge 0 \end{cases}

The total value of the harmonic progression of p and q in the cross-entropy is 13.45397488 and 14.04429156.
4 Conclusion This paper has discussed how local binary patterns can be efficiently manipulated. Here, instead of binary values, the actual values relative to the home pixel are taken into account for the research. We extended the neighborhood up to a distance K and aggregated the behavior of the neighbors when compared to the home pixel. When the cross-entropy value is small, the pdfs confirm the direction; in case the cross-entropy value is high, the growth direction is not consistent, which indicates a poor correlation with the guidance. The overall approach is shown in the flowchart. The relationship between the entropy and the correlation is inverse.
References 1. Ahsan MM (2018) Real time face recognition in unconstrained environment. Lamar University, Beaumont, TX, USA 2. Fu Y (2015) Face recognition in uncontrolled environments. Ph.D. Thesis, University College London (UCL), London, UK 3. Gupta KD, Ahsan M, Andrei S (2018) Extending the storage capacity and noise reduction of a faster QR-code. Brain Broad Res Artif Intell Neurosci 9:59–71 4. Gupta KD, Ahsan M, Andrei S, Alam KMR (2017) A robust approach of facial orientation recognition from facial features. Brain Broad Res Artif Intell Neurosci 8:5–12 5. Mukhopadhyay S, Sharma S (2020) Real time facial expression and emotion recognition using eigen faces, LBPH and fisher algorithms. In: Proceedings of the 2020 10th international conference on cloud computing, data science & engineering (Confluence), Noida, India, 29–31 January 2020, pp 212–220 6. Vardhini MPR, Suhaprasanna G, Mamatha G, Sri G (2018) Face recognition using student attendance system. Bachelor's thesis, Manchester Metropolitan University, Manchester, UK 7. Deeba F, Ahmed A, Memon H, Dharejo FA, Ghaffar A (2019) LBPH-based enhanced real-time face recognition. Int J Adv Comput Sci Appl 2019:10 8. Abad BB (2017) Proposed image pre-processing techniques for face recognition using OpenCV. In: Proceedings of the 3rd SPUP international research conference, Cagayan, Philippines 9. Abuzneid MA, Mahmood A (2018) Enhanced human face recognition using LBPH descriptor, multi-KNN, and back-propagation neural network. IEEE Access 6:20641–20651 10. 5_Celebrity Faces Dataset (2018) Available online: https://www.kaggle.com/dansbecker/5-celebrity-facesdataset. Accessed 16 Aug 2019 11. DJI Phantom-4 Info (2018) Available online: https://www.dji.com/phantom-4/info. Accessed 22 March 2020 12. Pietikäinen M, Hadid A, Zhao G, Ahonen T (2011) Computer vision using local binary patterns. Springer 13. Lategahn H, Gross S, Stehle T, Aach T (2010) Texture classification by modeling joint distributions of local patterns with Gaussian mixtures. IEEE Trans Image Process 19:1548–1557
Performance Evaluation of Energy Detection for Cognitive Radio in OFDM System Rania Mahmoud, Wael A. E. Ali, and Nour Ismail
Abstract In this paper, orthogonal frequency division multiplexing-based modulation schemes are used in cognitive radio (CR), in addition to their advantage of reducing inter-symbol interference (ISI). An OFDM modulation scheme for CR, with QPSK symbol mapping, is implemented. The energy detection method is used to perform spectrum sensing for this ISI-free system in both AWGN and Rayleigh fading channels. ROC curves and CROC are used to compare the performance of these systems. According to the findings, QPSK-OFDM performed better in the AWGN channel for spectrum sensing using the energy detection method. The performance degradation resulting from multipath and shadowing is improved by cooperative spectrum sensing. First, spectrum sensing is compared using diverse channels, and then the improvement obtained with collaborative spectrum sensing is shown. Keywords Non-cooperative cognitive network · AWGN · Nakagami channel · Rayleigh channel · Probability of false alarm · Probability of detection · Probability of missed detection
1 Introduction In accordance with Federal Communications Commission (FCC), a cognitive radio is “a radio or system that senses its operational electromagnetic environment and can dynamically and autonomously adapt its radio operating parameters to modify system R. Mahmoud (B) Electronics and Communications Engineering Department, Alexandria Higher Institute of Engineering and Technology, Alex, Egypt e-mail: [email protected] W. A. E. Ali Electronics and Communications Engineering Department, Arab Academy for Science, Technology and Maritime Transport, Alex, Egypt e-mail: [email protected] N. Ismail Electrical Engineering Department, Alexandria University, Alex, Egypt e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_34
operations such as maximize throughput, mitigate interference, and facilitate interoperability and access secondary markets” (FCC). Wireless applications have grown in popularity as a result of the technological revolution, increasing the demand for radio spectrum. Spectrum, on the contrary, is a limited resource that cannot be divided indefinitely to accommodate all applications. In addition, emerging wireless applications require a large amount of bandwidth to operate, and their bandwidth usage has increased in recent years exponentially. Cognitive radios can detect the spectrum and determine whether something is present or not. Cognitive radios can detect the spectrum and determine the presence or absence of the primary user in a particular subcarrier band. Cognitive radio can specify the frequency spectrum and decide whether or not the primary user is present in a specific subcarrier band. A cognitive radio (secondary user) can occupy the radio spectrum opportunistically when it is vacant, optimizing the radio frequency band. The performance of sensing techniques determines the effectiveness of cognitive radio [1]. Known spectrum-sensing techniques are reviewed, including energy detection, entropy detection, matched-filter detection, and cyclostationary detection. The results show that the matched filter is more complex because it relies on primary user information and has low probability of detection and high missed detection, whereas cyclostationary can improve the complexity in the matched filter [2]. In this paper, the energy-sensing technique is investigated. Simulations are carried out using (MATLAB) to set a benchmark. ISI and reduced data rates have also been observed in current communication systems, in addition to spectrum allocation. To overcome these drawbacks, CRs use the OFDM scheme. In contrast to traditional CR based on single carrier, OFDM uses multiple subcarriers which are orthogonal to each other. ISI is reduced because the subcarriers are orthogonal. This paper uses QPSK symbol mapping techniques to implement OFDM modulation for CRs. The system is implemented with energy detection (ED) spectrum sensing and is resolved in AWGN and multipath fading channel models. The performance is assessed using receiver operating curves (ROC) and area under ROC [3, 4].
2 Cognitive Radio for OFDM System OFDM allows for high data rates with minimal interference, as shown in Fig. 1. It is therefore a promising method that is employed in numerous cutting-edge communication systems, including 5G, CRs, and others. The OFDM model in our work follows the same format as that seen in [5, 6]. Before being sent over an OFDM system, the input bit stream is modulated using quadrature phase shift keying (QPSK). In order to transmit data utilizing orthogonal carriers in OFDM, the inverse fast Fourier transform (IFFT) is utilized [7]. The performance of the system is assessed using AWGN and Rayleigh channel models in additive and fading situations. High-quality transmissions with effective spectrum utilization and little interference can be produced by integrating CR with OFDM. Equation (1) provides the model for spectrum sensing, where y(n) is the signal received at the secondary user (SU)
Fig. 1 OFDM-based energy detection block diagram
and s(n) is the transmitted primary user (PU) signal, while w(n) stands for the AWGN and h is the channel gain. One of the various spectrum-sensing techniques employed in CR, energy detection has the benefit of minimal complexity, as it necessitates no prior knowledge of the PU. In the ED of CR, energy computation and threshold comparison are used. The implementation of an ISI-free transmission system with OFDM takes place in both the AWGN and Rayleigh environments. The energy is calculated by squaring the absolute values of the received samples [8].
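The signal model just described can be sketched as follows: a QPSK-mapped OFDM symbol is built with an IFFT, passed through AWGN, and reduced to an energy test statistic. The subcarrier count, cyclic-prefix length, and SNR are illustrative assumptions rather than the simulation settings of Sect. 3.

```python
# A minimal sketch of the sensed signal model: QPSK mapping, IFFT-based OFDM, AWGN,
# and the energy computation used by the detector.
import numpy as np

def qpsk_ofdm_symbol(n_sub=64, cp_len=16):
    bits = np.random.randint(0, 2, 2 * n_sub)
    symbols = ((1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])) / np.sqrt(2)  # QPSK symbol mapping
    time_sig = np.fft.ifft(symbols) * np.sqrt(n_sub)                            # orthogonal subcarriers
    return np.concatenate([time_sig[-cp_len:], time_sig])                       # add a cyclic prefix

def awgn(signal, snr_db):
    snr = 10 ** (snr_db / 10)
    p_sig = np.mean(np.abs(signal) ** 2)
    noise = np.sqrt(p_sig / (2 * snr)) * (np.random.randn(len(signal)) + 1j * np.random.randn(len(signal)))
    return signal + noise

y = awgn(qpsk_ofdm_symbol(), snr_db=-10)
energy = np.sum(np.abs(y) ** 2)       # test statistic compared against a threshold
print(energy)
```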
2.1 Energy Detector This is the most widely used detection method, with “the straightforward computational and implementation complexity”. Signal detection does not necessitate previous information of the primary user signal; the test statistic is evaluated by comparing the received signal’s energy to a predetermined threshold. The threshold is calculated using noise energy, and its accuracy is critical to the energy detector’s performance. If the received signal energy at the cognitive radio exceeds the threshold, the alternate hypothesis H1 is validated and the primary user is assumed to be present. If the needed energy is lower, the null assumption H0 is investigated as shown in Fig. 2, thus indicating the presence of a hole in the spectrum. The binary hypothesis is presented in Eq. (1) [9]. The stages of the time domain energy detector are depicted in Fig. 2. The cognitive radio defines the signal being detected y(t) within a specific bandwidth (B) with a centre frequency (fc). Once the signal is received via the radio frequency (RF) antennas, it is filtered for channel selection by the band pass filters (BPF). As a result, the analogue signal is sampled using analogue to digital converters to obtain discrete signals (ADCs). The output signal is squared, and the average of N samples is computed to produce the energy test statistic, which is then compared to the threshold of a decision. The absence and presence of the primary user signal are determined by establishing a binary assumption H0 and H1 [10].
Fig. 2 Energy detector block diagram
2.2 Spectrum Sensing The test statistic is subjected to the decision rule shown in Eq. 1. In energy detection, the noise is assumed to be a zero-mean Gaussian random process with known variance, so the test statistic follows a central chi-squared distribution [11]. Because of its inflexibility, the method has a lot of limitations for narrowband signals. To estimate the spectrum in real time, the periodogram method is used, which is defined by the square of the modulus of the fast Fourier transform (FFT). The periodogram method allows for access to wider bandwidths and the detection of multiple signals at the same time. The frequency resolution and FFT size can be tweaked to improve signal detection. Sensing time is increased by increasing the FFT size, which is analogous to changing the analogue pre-filter. The accuracy of signal detection improves as the number of samples is increased. As a result, the sample size and frequency used are chosen in order to achieve low latency, minimal complexity, and the desired resolution [12].

H0: y[n] = w[n], if the PU is absent
H1: y[n] = w[n] + h s[n], if the PU is present
(1)
2.3 Probability of False Alarm

p_f = \frac{\Gamma\!\left(u, \frac{\lambda}{2}\right)}{\Gamma(u)}    (2)

where u is the time-bandwidth product and λ is the decision threshold; Γ(u) is the complete gamma function and Γ(u, λ/2) is the upper incomplete gamma function. Since p_f is clearly independent of SNR in Eq. (2), it follows that p_f is also independent of the channel's characteristics and of the degree to which fading and shadowing impair performance.
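Because Eq. (2) does not depend on the SNR, it can be inverted to fix the decision threshold for a target constant false-alarm rate, as sketched below; the sensed-sample count u and the noise-only check are illustrative assumptions.

```python
# Setting the decision threshold from a target false-alarm probability by inverting Eq. (2):
# P_f = Gamma(u, lambda/2) / Gamma(u)  =>  lambda = 2 * gammainccinv(u, P_f).
import numpy as np
from scipy.special import gammainccinv

def ed_threshold(u, p_fa):
    """Threshold lambda such that the false-alarm rate equals p_fa."""
    return 2.0 * gammainccinv(u, p_fa)

def energy_detect(y, lam, sigma2=1.0):
    """Decide H1 (PU present) if the chi-square-normalised energy exceeds the threshold."""
    statistic = 2.0 * np.sum(np.abs(y) ** 2) / sigma2   # ~ chi-square with 2u dof under H0
    return int(statistic > lam)

u = 64                                                   # number of complex samples sensed
lam = ed_threshold(u, p_fa=0.01)
noise_only = (np.random.randn(u) + 1j * np.random.randn(u)) / np.sqrt(2)   # unit-variance AWGN (H0)
print(lam, energy_detect(noise_only, lam))
```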
2.4 Probability of Misdetection The probability of misdetection is complementary to the probability of detection; once the probability of detection is known, p_m can be evaluated as follows:

p_m = 1 - p_d    (3)
2.5 Probability of Detection This strategy can also be used to compute the probability of detection, by averaging over the channel statistics:

p_d = \int_{x} Q_m\!\left(\sqrt{2\gamma}, \sqrt{\lambda}\right) f_\gamma(x)\, \mathrm{d}x    (4)

where f_\gamma(x) is the channel's pdf, Q_m is the generalized Marcum Q-function, and γ is the SNR. For the determination of p_d, the method makes use of closed forms. When there is no fading and the channel is considered an ideal AWGN channel, the probability of detection is calculated as follows [13]:

P_d = Q_m\!\left(\sqrt{2\gamma}, \sqrt{\lambda}\right)    (5)
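Equation (5) can be evaluated numerically by noting that the generalized Marcum Q-function Q_m(a, b) equals the survival function of a noncentral chi-square distribution with 2m degrees of freedom and noncentrality a^2, evaluated at b^2. The sketch below uses this identity; the chosen u, SNR values, and target p_f are assumptions for illustration.

```python
# AWGN probability of detection, Eq. (5): P_d = Q_u(sqrt(2*gamma), sqrt(lambda)),
# computed via the noncentral chi-square survival function.
import numpy as np
from scipy.stats import ncx2
from scipy.special import gammainccinv

def pd_awgn(snr_db, p_fa, u):
    gamma = 10 ** (snr_db / 10)
    lam = 2.0 * gammainccinv(u, p_fa)            # threshold from Eq. (2)
    return ncx2.sf(lam, df=2 * u, nc=2 * gamma)  # Marcum Q identity

for snr in (-15, -10, -5, 0):
    print(snr, "dB ->", round(pd_awgn(snr, p_fa=0.1, u=64), 3))
```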
Because γ follows an exponential distribution for the Rayleigh channel, the closed form derived for p_d using (5) can be written as follows [11]:

p_{d,\mathrm{ray}} = e^{-\lambda/2} \sum_{k=0}^{u-2} \frac{1}{k!}\left(\frac{\lambda}{2}\right)^{k} + \left(\frac{1+\gamma}{\gamma}\right)^{u-1} \left[ e^{-\frac{\lambda}{2(1+\gamma)}} - e^{-\lambda/2} \sum_{k=0}^{u-2} \frac{1}{k!}\left(\frac{\lambda\gamma}{2(1+\gamma)}\right)^{k} \right]    (6)
Similar to that, the closed form for the Nakagami channel that is determined for p_d using (6) is shown as follows [9]:

p_{d,\mathrm{NAK}} = A_1 + \beta\, e^{-\frac{\lambda}{2\sigma^2}} \sum_{i=1}^{N/2-1} \frac{1}{i!}\left(\frac{\lambda}{2\sigma^2}\right)^{i} {}_1F_1\!\left(m;\, i+1;\, \frac{\lambda(1-\beta)}{2\sigma^2}\right)    (7)

where

\beta = \frac{2m\sigma^2}{2m\sigma^2 + \gamma} \quad \text{and} \quad A_1 = \frac{e^{-\frac{\lambda\beta}{2\sigma^2}}}{\alpha}\left[\beta^{m-1} L_{m-1}\!\left(-\frac{\lambda(1-\beta)}{2\sigma^2}\right) + (1-\beta)\sum_{i=0}^{m-2}\beta^{i} L_i\!\left(-\frac{\lambda(1-\beta)}{2\sigma^2}\right)\right]
where m is referred to as the Nakagami parameter and L_i(\cdot) denotes the Laguerre polynomial of degree i. Allowing users to collaborate is one way to improve spectrum sensing under fading. If N denotes the number of collaborating users, the cooperative detection and false alarm probabilities are derived as follows [13]:

Q_d = 1 - (1 - p_d)^N    (8)

Q_f = 1 - (1 - p_f)^N    (9)
Here, N denotes the number of users, Q_d the collaborative probability of detection, and Q_f the collaborative probability of false alarm, while p_d and p_f are the individual probabilities of detection and false alarm, respectively.
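Equations (8) and (9) correspond to OR-rule fusion: the primary user is declared present if any of the N collaborating users reports a detection. A short sketch with illustrative numbers follows.

```python
# OR-rule cooperative sensing of Eqs. (8)-(9).
def cooperative_probabilities(p_d, p_f, n_users):
    q_d = 1.0 - (1.0 - p_d) ** n_users   # Eq. (8)
    q_f = 1.0 - (1.0 - p_f) ** n_users   # Eq. (9)
    return q_d, q_f

# e.g. a single-user detector with p_d = 0.48 and p_f = 0.01 improves to
# Q_d ~ 0.93 with N = 4 users, at the cost of Q_f rising to ~ 0.039.
print(cooperative_probabilities(0.48, 0.01, 4))
```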
3 Simulation Results The spectrum-sensing algorithm is modelled using MATLAB software, which is a high-level language tool. The PU (modelled in MATLAB) generates a quadrature phase shift keying (QPSK) modulated OFDM test signal. The received signal of the SU is created by adding AWGN to the modulated OFDM signal. The received signal’s energy test statistic is then computed and compared to a predetermined threshold, with a defined probability of false alarm. 10000 Monte Carlo iterations are performed to enhance the detector’s accuracy. Figure 3 depicts a plot of signal strength (SNR = −20 dB to 0 dB) versus probability of detection, with different false alarm probabilities (pf = 0.01, 0.1, and 0.2). An examination of the graph below reveals that increasing the probability of a false alarm increases the likelihood of false alarm detection. At an SNR of − 13 dB, for example, p f is 0.01 and the pd is 0.48, increasing pd to 0.84 by doubling p f to 0.2. It is important to note that, regardless of the probability of false alarm setting, the energy detector performs best at SNR = − 8 dB and can easily distinguish a primary user in the spectrum from the noise. Furthermore, a reduction in signal strength has a significant impact on the detector’s performance, as evidenced by the lower probability of detection for all false alarms. Below − 20 dB, the detector’s performance deteriorates significantly and is hard to distinguish the determinant (PU) signal from the noise signal. Fig. 3 Signal-to-noise ratio versus probability of detection with varying probability of false alarm under Rayleigh channel
Figure 4 shows that the low SNR value of the Rician model indicates low performance. For 0 dB SNR, the performance increases relative to 4 dB. But as the SNR value increases more, such as 2 dB, the performance changes rapidly with increasing false alarm probability value. Figure 5 depicts the relationship between signal strength (SNR = − 25–0 dB) and detection probability with differing sample size (a = 128, 256, 512). The false alarm probability is set to 0.1. In Fig. 6, the probability of misdetection versus the probability of a false alarm is compared for Rayleigh, Rician, and Nakagami channels. When the Rician and Nakagami channels are compared, it is clear that the Rayleigh channel has a lower chance of misdetection for − 4 dB. However, as SNR rises, the Rayleigh model’s performance rapidly deteriorates. When comparing the Rician and Nakagami channel models, the Rician channel model performs better at lower SNR values, while the
Fig. 4 Comparison between p f and pm for Rician channel with QPSK and BPSK modulation
Fig. 5 Signal-to-noise ratio versus probability of detection with varying sample size ( p f = 0.1) under Rayleigh channel
Fig. 6 Comparison between p f and pm for AWGN, Rayleigh and Nakagami, Rician channels with different signal-to-noise ratio
Nakagami channel model performs slightly better at higher SNR values. As a result, the Rician and Nakagami channel models outperform the Rayleigh channel model, with Nakagami becoming more severe at higher SNR values for the energy detection method than both. Figure 7 depicts p f versus the pd in the Nakagami channel for various SNR values, where m Nakagami parameter was set 2. From the figure, it can be noticed that with low value of p f , pd reaches the highest value, and with increasing more p f , pd becomes saturated for SNR 2 dB. As previously discussed, higher SNR values outperform lower SNR values. Figure 8 shows that when the value of false alarm is low, the value of misdetection is high. The probability of detecting the presence of the primary user incorrectly decreases as p f increases. It is also observed that the performance of the Nakagami channel is better at higher SNR values than at lower SNR values.
Fig. 7 Comparison between pd and pm for Nakagami channel with QPSK and BPSK modulation
Fig. 8 Comparison between p f and pm for Nakagami channel with QPSK and BPSK modulation
4 Conclusion In this paper, the cognitive radio ROC curves under Rayleigh, Rician, and Nakagami channel models are discussed. The individual performance of each model for energy detection method at different values of SNR is investigated. Furthermore, the performance of OFDM CR system is evaluated, when the CR device uses compressed sensing with cyclostationarity detection to continually sense the channel to decide whether it is idle or not, and then reconstructs the signal if communication is possible for the specific CR receiver from its intended CR transmitter. When assessing spectrum sensing, the likelihood of false alarm and the likelihood of misdetection are compared for different SNR and channels, and the results show the outstanding performance of Rayleigh channel model.
Modified Iterative Shrinkage Thresholding Algorithm for Image De-blurring in Medical Imaging

Himanshu Choudhary, Kartik Sahoo, and Arishi Orra
Abstract Computer-assisted medical diagnosis has grown in popularity among researchers and practitioners due to its high applicability and cost-effectiveness. The use of learning methods to facilitate the processing and analysis of medical images has become necessary. The image de-blurring problem in medical imaging applications is a challenging task aimed at improving the quality of images. The class of Iterative Shrinkage Thresholding Algorithms (ISTA) is considered for handling image de-blurring problems. These algorithms use first-order information and are attractive due to their simplicity; however, they converge rather slowly. In this work, a faster version of ISTA is first proposed by utilizing Nesterov's updating rule. The proposed approach is then used to solve image de-blurring problems arising in medical imaging. A theoretical study is performed to verify the improvement in the convergence rate of the modified fast ISTA. Numerical experiments suggest that the presented method for the image de-blurring problem outperforms the baseline ISTA method.

Keywords Image de-blurring · First-order methods · Convergence · Iterative Shrinkage Thresholding Algorithms
1 Introduction

Medical imaging techniques including magnetic resonance imaging (MRI) [1, 2], computed tomography (CT), positron emission tomography (PET), ultrasound, mammography, and X-ray have been widely used for early identification, diagnosis, and treatment of diseases throughout the last several decades [3-5]. Medical image analysis has traditionally been done in hospitals by human professionals such as doctors and radiologists. However, clinicians and researchers have started to benefit from computer-assisted intervention owing to the possible fatigue of human experts and significant
variances in pathology. Computerized techniques have the capability to provide advantages such as precisely matching the information in different images and offering tools for viewing the combined images. The process of restoring sharper images from images that have deteriorated owing to blurring, noise, or both is known as image restoration. Image de-blurring is one of the most fundamental problems in image processing and has become a popular research area in the past few years. Image de-blurring problems are commonly encountered in image content identification [6], medical diagnosis and surgery [7], image inpainting [8], surveillance monitoring, astronomy, and remote sensing [9]. Each of these problems can be summarized as

Aμ = b + w,  (1)

where A ∈ R^{m×n} and b ∈ R^{m×1} are the known blurring matrix and blurred image, respectively, and μ ∈ R^{n×1} and w ∈ R^{m×1} are the unknown true image and noise, respectively. Determining μ from the noisy and blurred image b is known as the image de-blurring problem [10-12]. Generally, the image de-blurring problem (1) may be converted into the least squares (LS) problem [13], where the estimator is selected so that it minimizes the data error ‖Aμ − b‖², where ‖·‖ is the Euclidean norm. In most image de-blurring problems, the blurring matrix A is ill-conditioned [14], and thus the LS solution has a large norm that renders it meaningless. Different alternative approaches have been suggested to address this issue, and regularization approaches are one prevalent strategy. One regularization approach that has sparked renewed interest and significant attention in the image and signal processing literature is the l1 regularization approach. From this perspective, Problem (1) can be restated as

min_μ { F(μ) = ‖Aμ − b‖² + η‖μ‖₁ },  (2)

where η is a positive regularization parameter and ‖·‖₁ is the l1 norm. The inclusion of the l1 norm term in the convex optimization problem (2) promotes sparse solutions. Problem (2) is a second-order cone programming problem and can be tackled using interior point methods [15]. However, the image de-blurring problem is large scale and involves dense matrix data, which restricts the utilization and potential utility of these methods. Several techniques have been suggested to address such issues successfully. In recent years, first-order gradient-based optimization algorithms have attracted the attention of many researchers. These methods are appealing due to their simplicity and faster convergence, and they are sufficient for solving large-scale problems even when the data matrix is dense. The matrix-vector multiplications involving A and Aᵀ constitute the fundamental computational effort of these algorithms. As a result, several first-order gradient-based optimization algorithms for solving problem (2) have been developed, which include the Iterative Shrinkage Thresholding Algorithm (ISTA) [17], two-step ISTA (TwIST) [18], the Fast Iterative
Shrinkage Thresholding Algorithm (FISTA) [19], monotone FISTA (MFISTA) [20], and improved FISTA (IFISTA) [21, 22]. ISTA is an iterative proximal forward-backward approach in which a shrinkage operator [23] is applied at each iteration. In particular, the general term of the ISTA algorithm is

μ_{i+1} = T_{λθ_i}( μ_i − 2θ Aᵀ(Aμ_i − b) ),  (3)

where θ > 0 is a suitable step-size and T_ξ : Rⁿ → Rⁿ is the shrinkage operator defined as

T_ξ(μ_i) = (|μ_i| − ξ)_+ sgn(μ_i).  (4)

Several versions of ISTA have been introduced in the literature to develop faster and more efficient algorithms. To achieve a faster convergence rate, [18] proposed the two-step ISTA, which extends the standard ISTA and uses the last two iterations at each step. ISTA can also be accelerated by using subspace optimization [24]. Beck and Teboulle [19] introduced FISTA for solving linear inverse problems by incorporating Nesterov's rule into ISTA.

The outline of this paper is as follows: In Sect. 2, the details of ISTA and FISTA are presented for both constant and non-constant step-sizes, along with the theoretical analysis. In Sect. 3, the effectiveness of the presented method is verified with the help of numerical examples. In Sect. 4, the conclusion of the paper is provided.
2 Methodology

Consider the non-smooth convex optimization problem

min { F(μ) ≡ λ(μ) + h(μ) : μ ∈ Rⁿ },  (5)

where λ : Rⁿ → R is a closed, convex, smooth, continuously differentiable function with Lipschitz continuous gradient L(λ), i.e.,

‖∇λ(μ) − ∇λ(ν)‖ ≤ L(λ)‖μ − ν‖,  ∀ μ, ν ∈ Rⁿ,

whereas h : Rⁿ → R is a closed, convex but non-smooth function.
2.1 Iterative Shrinkage Thresholding Algorithm (ISTA)

The Iterative Shrinkage Thresholding Algorithm (ISTA) [17] belongs to the category of gradient-based first-order algorithms. The simplicity of this method is one of its advantages; however, its convergence toward the solution is quite slow. In this method, the shrinkage operation [23] is performed at each iteration. The (i + 1)th term of the ISTA algorithm is as follows:
μ_{i+1} = T( μ_i − θ_i ∇λ(μ_i) ),  (6)

where T is the iterative shrinkage operator given by

T(ν) = argmin_μ { (1/(2θ)) ‖μ − ν‖² + h(μ) }.  (7)

The above operation is known as soft-thresholding when h(μ) = η‖μ‖₁. Also, when h(μ) = 0, the above operation is nothing but the identity operator, implying that ISTA is the gradient method in this case. The sequence of function values F(μ_i) converges to the optimal function value F(μ*) at a rate of O(1/i), commonly known as a "sub-linear" rate of convergence for the gradient method.

Theorem 1 Let {μ_i} be the sequence obtained by the Iterative Shrinkage Thresholding Algorithm (ISTA) with a constant step-size θ_i = 1/L(λ). Then for any i ≥ 1,

F(μ_i) − F(μ*) ≤ L(λ)‖μ_0 − μ*‖² / (2i),  ∀ μ* ∈ M*,  (8)

where M* is the set of optimal solutions.

Proof For the proof, see [19].

The quadratic approximation of F(μ) := λ(μ) + h(μ) for any positive L at a given point ν is defined as

Q_L(μ, ν) := λ(ν) + ⟨μ − ν, ∇λ(ν)⟩ + (L/2)‖μ − ν‖² + h(μ).  (9)
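To make the update (6)-(7) concrete for the l1-regularized problem (2), where T reduces to the soft-thresholding operator (4) with ξ = η/L, a minimal NumPy sketch is given below. The function names and the small random test problem are illustrative assumptions, not part of the paper.

```python
import numpy as np

def soft_threshold(x, xi):
    """Shrinkage operator T_xi of Eq. (4): elementwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - xi, 0.0)

def ista(A, b, eta, n_iter=200):
    """ISTA for min ||A mu - b||^2 + eta * ||mu||_1 with constant step 1/L(lambda)."""
    # Lipschitz constant of the gradient of ||A mu - b||^2 is 2 * ||A||_2^2.
    L = 2 * np.linalg.norm(A, 2) ** 2
    mu = np.zeros(A.shape[1])
    history = []
    for _ in range(n_iter):
        grad = 2 * A.T @ (A @ mu - b)                 # gradient of the smooth part
        mu = soft_threshold(mu - grad / L, eta / L)   # proximal (shrinkage) step
        history.append(np.sum((A @ mu - b) ** 2) + eta * np.abs(mu).sum())
    return mu, history

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((60, 100))
    x_true = np.zeros(100); x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(60)
    mu, hist = ista(A, b, eta=0.1)
    print("final objective:", hist[-1])
```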
Lemma 1 Let ν ∈ Rⁿ and L > 0 be such that F(T_L(ν)) ≤ Q_L(T_L(ν), ν). Then for any μ ∈ Rⁿ,

F(μ) − F(T_L(ν)) ≥ (L/2)‖T_L(ν) − ν‖² + L⟨ν − μ, T_L(ν) − ν⟩.
Remark 1 ISTA produces a non-increasing sequence of function values F(μ_i). Indeed, for every i ≥ 1,

F(μ_i) ≤ Q_{L_i}(μ_i, μ_{i−1}) ≤ Q_{L_i}(μ_{i−1}, μ_{i−1}) = F(μ_{i−1}),

where either L_i ≡ L(λ) is a given Lipschitz constant of ∇λ or L_i is chosen by the backtracking rule.
Remark 2 (Bounds for L_i) Since F(μ_i) ≤ Q_{L_i}(μ_i, μ_{i−1}) is satisfied at each iteration, the constants L_i produced by the backtracking rule satisfy, for every i ≥ 1,

L_0 ≤ L_i ≤ max{ωL(λ), L_0}.

The inequality L_0 ≤ L_i is obvious. For the inequality L_i ≤ max{ωL(λ), L_0}: if ωL(λ) ≤ L_0, then max{ωL(λ), L_0} = L_0 and L_i = L_0; otherwise ωL(λ) > L_0, and there exists an index ī ≤ i for which the inequality F(μ_ī) ≤ Q_{L̄}(μ_ī, μ_{ī−1}) was not satisfied with L̄ = L_ī/ω. With the help of the descent lemma, L_ī/ω < L(λ), and hence L_i ≤ max{ωL(λ), L_0} is proved. These bounds can also be written as

ρL(λ) ≤ L_i ≤ σL(λ),  (10)

where σ = 1 and ρ = 1 in the constant step-size setting, while σ = max{ω, L_0/L(λ)} and ρ = L_0/L(λ) in the backtracking setting.
2.2 Fast Iterative Shrinkage Thresholding Algorithm (FISTA)

The sequence {μ_i} obtained by the Iterative Shrinkage Thresholding Algorithm converges quite slowly toward the solution. In Theorem 1, we saw that ISTA has a sub-linear global rate of convergence of order O(1/i), so that an ε-optimal solution requires on the order of O(1/ε) iterations. Therefore, we look for new optimization methods with the same simplicity as ISTA but a higher convergence rate. Simplicity means that the computational complexity per iteration remains the same as ISTA, while the global rate of convergence of the presented method is significantly better. The presented method achieves this goal through an update rule for the input to the minimization problem:
θ_{i+1} = (1 + √(1 + 4θ_i²)) / 2,
ν_{i+1} = μ_i + ((θ_i − 1)/θ_{i+1}) (μ_i − μ_{i−1}).  (11)
This update rule was first introduced by Yurii Nesterov in 1983, who used it to develop an algorithm for minimizing a smooth convex function [25]. The global convergence rate of that algorithm is of order O(1/i²), which is better than the previous one, but the objective function must be convex. The problem considered here is convex but non-smooth due to the presence of the function h(μ). To overcome this issue, Beck and Teboulle [19] extended Nesterov's method [25] to an algorithm faster than ISTA, called the Fast Iterative Shrinkage Thresholding Algorithm (FISTA). Apart from the update rule (11), FISTA is exactly the same as ISTA, which preserves its simplicity while improving the convergence rate. The beauty of this update rule is that θ_i is independent of λ(μ) + h(μ) and depends only on θ_{i−1}; this sequence of numbers then, remarkably, accelerates the convergence rate [26] of the algorithm. The iterative shrinkage operator T_L(·) is applied in FISTA at the point ν_i, which is chosen as a specific linear combination of the previous two points {μ_{i−1}, μ_{i−2}}.

Algorithm 1 Fast Iterative Shrinkage Thresholding Algorithm (constant step-size)
Input: L := L(λ), a Lipschitz constant of ∇λ.
Step 0. Choose initial inputs ν_1 = μ_0 ∈ Rⁿ, θ_1 = 1.
Step i. For i ≥ 1 compute
  μ_i = T_L(ν_i),
  θ_{i+1} = (1 + √(1 + 4θ_i²)) / 2,
  ν_{i+1} = μ_i + ((θ_i − 1)/θ_{i+1}) (μ_i − μ_{i−1}).
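The following is a minimal NumPy sketch of Algorithm 1 specialized to problem (2), with the constant step-size 1/L and L = 2‖A‖². It is an illustrative sketch under these assumptions rather than the authors' implementation; `soft_threshold` is the same hypothetical helper used in the ISTA sketch above.

```python
import numpy as np

def soft_threshold(x, xi):
    """Soft-thresholding operator T_xi, as in Eq. (4)."""
    return np.sign(x) * np.maximum(np.abs(x) - xi, 0.0)

def fista(A, b, eta, n_iter=200):
    """FISTA (Algorithm 1, constant step-size) for min ||A mu - b||^2 + eta*||mu||_1."""
    L = 2 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth gradient
    mu = np.zeros(A.shape[1])
    nu, theta = mu.copy(), 1.0
    history = []
    for _ in range(n_iter):
        # mu_i = T_L(nu_i): proximal gradient step taken at the extrapolated point nu.
        mu_new = soft_threshold(nu - 2 * A.T @ (A @ nu - b) / L, eta / L)
        # Nesterov step-size update (11) and momentum extrapolation.
        theta_new = (1 + np.sqrt(1 + 4 * theta ** 2)) / 2
        nu = mu_new + ((theta - 1) / theta_new) * (mu_new - mu)
        mu, theta = mu_new, theta_new
        history.append(np.sum((A @ mu - b) ** 2) + eta * np.abs(mu).sum())
    return mu, history
```

Running `fista` and `ista` on the same random problem typically shows the objective values of FISTA decreasing markedly faster, consistent with the O(1/i²) rate established below.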
Lemma 2 The sequences {μ_i, ν_i} generated by FISTA with the constant step-size or the backtracking rule satisfy, for every i ≥ 1,

(2/L_i) θ_i² Y_i − (2/L_{i+1}) θ_{i+1}² Y_{i+1} ≥ ‖X_{i+1}‖² − ‖X_i‖²,

where Y_i = F(μ_i) − F(μ*) and X_i = θ_i μ_i − (θ_i − 1)μ_{i−1} − μ*.

Proof Apply Lemma 1 at the points μ := μ_i, ν := ν_{i+1} with L = L_{i+1}. Since T_{L_{i+1}}(ν_{i+1}) = μ_{i+1}, we get

F(μ_i) − F(μ_{i+1}) ≥ (L_{i+1}/2)‖μ_{i+1} − ν_{i+1}‖² + L_{i+1}⟨ν_{i+1} − μ_i, μ_{i+1} − ν_{i+1}⟩.

Now, adding and subtracting F(μ*), we get
Algorithm 2 Fast Iterative Shrinkage Thresholding Algorithm (backtracking step-size)
Step 0. Choose L_0 > 0, some ω > 1, and μ_0 ∈ Rⁿ. Initialize ν_1 = μ_0, θ_1 = 1.
Step i. For i ≥ 1, find the smallest non-negative integer m_i such that, with L̄ = ω^{m_i} L_{i−1},
  F(T_{L̄}(ν_i)) ≤ Q_{L̄}(T_{L̄}(ν_i), ν_i).
Set L_i = ω^{m_i} L_{i−1} and compute
  μ_i = T_{L_i}(ν_i),
  θ_{i+1} = (1 + √(1 + 4θ_i²)) / 2,
  ν_{i+1} = μ_i + ((θ_i − 1)/θ_{i+1}) (μ_i − μ_{i−1}).
[F(μ_i) − F(μ*)] + [F(μ*) − F(μ_{i+1})] ≥ (L_{i+1}/2)‖μ_{i+1} − ν_{i+1}‖² + L_{i+1}⟨ν_{i+1} − μ_i, μ_{i+1} − ν_{i+1}⟩,

i.e., [F(μ_i) − F(μ*)] − [F(μ_{i+1}) − F(μ*)] ≥ (L_{i+1}/2)‖μ_{i+1} − ν_{i+1}‖² + L_{i+1}⟨ν_{i+1} − μ_i, μ_{i+1} − ν_{i+1}⟩. Since Y_i = F(μ_i) − F(μ*), it follows that

Y_i − Y_{i+1} ≥ (L_{i+1}/2)‖μ_{i+1} − ν_{i+1}‖² + L_{i+1}⟨ν_{i+1} − μ_i, μ_{i+1} − ν_{i+1}⟩,

and therefore

(2/L_{i+1})(Y_i − Y_{i+1}) ≥ ‖μ_{i+1} − ν_{i+1}‖² + 2⟨ν_{i+1} − μ_i, μ_{i+1} − ν_{i+1}⟩.  (12)

Similarly, applying Lemma 1 at the point μ := μ*, ν := ν_{i+1} with L = L_{i+1}, we obtain

F(μ*) − F(μ_{i+1}) ≥ (L_{i+1}/2)‖μ_{i+1} − ν_{i+1}‖² + L_{i+1}⟨ν_{i+1} − μ*, μ_{i+1} − ν_{i+1}⟩,

i.e.,

−(2/L_{i+1}) Y_{i+1} ≥ ‖μ_{i+1} − ν_{i+1}‖² + 2⟨ν_{i+1} − μ*, μ_{i+1} − ν_{i+1}⟩.  (13)
Now, multiply Eq. (12) by (θ_{i+1} − 1) and add it to Eq. (13) to find a relation between Y_i and Y_{i+1}:

(2/L_{i+1})[(θ_{i+1} − 1)(Y_i − Y_{i+1}) − Y_{i+1}]
  ≥ (θ_{i+1} − 1)(‖μ_{i+1} − ν_{i+1}‖² + 2⟨ν_{i+1} − μ_i, μ_{i+1} − ν_{i+1}⟩) + ‖μ_{i+1} − ν_{i+1}‖² + 2⟨ν_{i+1} − μ*, μ_{i+1} − ν_{i+1}⟩
  = θ_{i+1}‖μ_{i+1} − ν_{i+1}‖² + 2⟨(θ_{i+1} − 1)(ν_{i+1} − μ_i) + ν_{i+1} − μ*, μ_{i+1} − ν_{i+1}⟩.

Since ⟨cα, β⟩ = c⟨α, β⟩ for any constant c ∈ R, this can be written as

(2/L_{i+1}){(θ_{i+1} − 1)Y_i − θ_{i+1}Y_{i+1}} ≥ θ_{i+1}‖μ_{i+1} − ν_{i+1}‖² + 2⟨θ_{i+1}ν_{i+1} − (θ_{i+1} − 1)μ_i − μ*, μ_{i+1} − ν_{i+1}⟩.

Multiplying the above inequality by θ_{i+1}, we get

(2/L_{i+1})[(θ_{i+1}² − θ_{i+1})Y_i − θ_{i+1}²Y_{i+1}] ≥ ‖θ_{i+1}(μ_{i+1} − ν_{i+1})‖² + 2θ_{i+1}⟨θ_{i+1}ν_{i+1} − (θ_{i+1} − 1)μ_i − μ*, μ_{i+1} − ν_{i+1}⟩.

From (11), using the relation θ_i² = θ_{i+1}² − θ_{i+1}, we obtain

(2/L_{i+1})[θ_i²Y_i − θ_{i+1}²Y_{i+1}] ≥ ‖θ_{i+1}(μ_{i+1} − ν_{i+1})‖² + 2⟨θ_{i+1}(μ_{i+1} − ν_{i+1}), θ_{i+1}ν_{i+1} − (θ_{i+1} − 1)μ_i − μ*⟩.

Since the Pythagoras relation ‖β − α‖² + 2⟨β − α, α − γ⟩ = ‖β − γ‖² − ‖α − γ‖² holds, applying it with α := θ_{i+1}ν_{i+1}, β := θ_{i+1}μ_{i+1}, and γ := (θ_{i+1} − 1)μ_i + μ*, we have

(2/L_{i+1})[θ_i²Y_i − θ_{i+1}²Y_{i+1}] ≥ ‖θ_{i+1}μ_{i+1} − (θ_{i+1} − 1)μ_i − μ*‖² − ‖θ_{i+1}ν_{i+1} − (θ_{i+1} − 1)μ_i − μ*‖².

Again, from Eq. (11) we have θ_{i+1}ν_{i+1} = θ_{i+1}μ_i + (θ_i − 1)(μ_i − μ_{i−1}), and since X_i = θ_iμ_i − (θ_i − 1)μ_{i−1} − μ*, it follows that

(2/L_{i+1})[θ_i²Y_i − θ_{i+1}²Y_{i+1}] ≥ ‖X_{i+1}‖² − ‖X_i‖².

Since L_{i+1} ≥ L_i, this yields

(2/L_i)θ_i²Y_i − (2/L_{i+1})θ_{i+1}²Y_{i+1} ≥ ‖X_{i+1}‖² − ‖X_i‖².
Example 1 If {s_n} = {1 + 1/n} is a positive real sequence, which is decreasing, i.e., s_{n+1} ≤ s_n, then for every n ≥ 1, s_n ≤ s_1.

Lemma 3 Let {s_i, t_i} be positive sequences of real numbers satisfying s_i − s_{i+1} ≥ t_{i+1} − t_i for all i ≥ 1, with s_1 + t_1 ≤ u, u > 0. Then s_i ≤ u for every i ≥ 1.

Proof From Lemma 2, we have

(2/L_i)θ_i²Y_i − (2/L_{i+1})θ_{i+1}²Y_{i+1} ≥ ‖X_{i+1}‖² − ‖X_i‖²,

i.e.,

‖X_{i+1}‖² + (2/L_{i+1})θ_{i+1}²Y_{i+1} ≤ ‖X_i‖² + (2/L_i)θ_i²Y_i.

Now, arguing as in Example 1 (the right-hand side is non-increasing in i), we obtain

‖X_i‖² + (2/L_i)θ_i²Y_i ≤ ‖X_1‖² + (2/L_1)θ_1²Y_1.

From Lemma 2 and θ_1 = 1, we have X_1 = μ_1 − μ* and Y_1 = F(μ_1) − F(μ*); substituting these values into the last inequality gives

‖X_i‖² + (2/L_i)θ_i²Y_i ≤ ‖μ_1 − μ*‖² + (2/L_1)[F(μ_1) − F(μ*)].  (14)
To verify the relation s_1 + t_1 ≤ u, let us define the quantities

s_i := (2/L_i)θ_i²Y_i,  t_i := ‖X_i‖²,  u := ‖ν_1 − μ*‖² = ‖μ_0 − μ*‖².

Now,

s_1 := (2/L_1)θ_1²Y_1 = (2/L_1)Y_1  (since θ_1 = 1),  t_1 := ‖X_1‖² = ‖μ_1 − μ*‖².
Since we have all these values, applying Lemma 1 to the points μ = μ*, ν = ν_1 with L = L_1, we obtain

F(μ*) − F(T_{L_1}(ν_1)) ≥ (L_1/2)‖T_{L_1}(ν_1) − ν_1‖² + L_1⟨ν_1 − μ*, T_{L_1}(ν_1) − ν_1⟩.

Thus,

F(μ*) − F(μ_1) = F(μ*) − F(T_{L_1}(ν_1))
  ≥ (L_1/2)‖μ_1 − ν_1‖² + L_1⟨ν_1 − μ*, μ_1 − ν_1⟩
  = (L_1/2){‖μ_1 − μ*‖² − ‖ν_1 − μ*‖²},

where the last equality again uses the Pythagoras relation. Consequently,

−(2/L_1)Y_1 ≥ ‖μ_1 − μ*‖² − ‖ν_1 − μ*‖²,
(2/L_1)Y_1 ≤ ‖ν_1 − μ*‖² − ‖μ_1 − μ*‖²,

i.e.,

(2/L_1)Y_1 + ‖μ_1 − μ*‖² ≤ ‖ν_1 − μ*‖².  (15)

Hence, s_1 + t_1 ≤ u holds true. Now, combining Eqs. (14) and (15) yields

(2/L_i)θ_i²Y_i ≤ ‖X_i‖² + (2/L_i)θ_i²Y_i ≤ ‖ν_1 − μ*‖²,

so that (2/L_i)θ_i²Y_i ≤ ‖μ_0 − μ*‖² (from the defined quantity u), i.e., s_i ≤ u for all i ≥ 1. Thus, if {s_i, t_i} are positive sequences of real numbers satisfying s_i − s_{i+1} ≥ t_{i+1} − t_i for all i ≥ 1, with s_1 + t_1 ≤ u, u > 0, then s_i ≤ u for every i ≥ 1.

Lemma 4 (Lower bound of θ_i) Let {θ_i} be the positive sequence generated by FISTA, defined by
θ_1 = 1,  θ_{i+1} = (1 + √(1 + 4θ_i²)) / 2,  i ≥ 1;

then θ_i ≥ (i + 1)/2 for all i ≥ 1.
Proof For i = 1, we have θ_1 = 1 ≥ (1 + 1)/2. Suppose that the claim holds for i, i.e., θ_i ≥ (i + 1)/2. We now demonstrate that the claim holds for i + 1 as well, i.e., θ_{i+1} ≥ (i + 2)/2. Given

θ_{i+1} = (1 + √(1 + 4θ_i²)) / 2,

and since the claim holds for i, substituting θ_i ≥ (i + 1)/2 we find

θ_{i+1} ≥ (1 + √(1 + (i + 1)²)) / 2 ≥ (1 + √((i + 1)²)) / 2 = (1 + i + 1)/2 = (i + 2)/2,

noting that {θ_i} is an increasing sequence. Hence θ_i ≥ (i + 1)/2 for all i ≥ 1.
Theorem 2 Let {μ_i}, {ν_i} be generated by the Fast Iterative Shrinkage Thresholding Algorithm. Then for any i ≥ 1,

F(μ_i) − F(μ*) ≤ 2σL(λ)‖μ_0 − μ*‖² / (i + 1)²,  ∀ μ* ∈ M*,

where σ = 1 for the constant step-size setting and σ = max{ω, L_0/L(λ)} for the backtracking step-size setting.

Proof In Lemma 3, we showed that

(2/L_i)θ_i²Y_i ≤ ‖μ_0 − μ*‖².

This inequality combined with the lower bound θ_i ≥ (i + 1)/2 from Lemma 4 yields

(2/L_i) · ((i + 1)²/4) · Y_i ≤ ‖μ_0 − μ*‖²,

i.e.,

Y_i ≤ 2L_i‖μ_0 − μ*‖² / (i + 1)²,

and hence, using L_i ≤ σL(λ) from (10),

F(μ_i) − F(μ*) ≤ 2σL(λ)‖μ_0 − μ*‖² / (i + 1)².

This theorem shows that the number of iterations required to obtain an ε-optimal solution by the Fast Iterative Shrinkage Thresholding Algorithm is at most C/√ε − 1, where C = √(2σL(λ)) ‖μ_0 − μ*‖, which clearly improves upon the Iterative Shrinkage Thresholding Algorithm.
3 Experiment and Discussions

3.1 Experimental Setup

This section validates the presented approach on image de-blurring problems by comparing the performance of FISTA and ISTA. A constant step-size rule is used for both methods, applied to the l1 regularization problem (2), i.e., λ(μ) = ‖Aμ − b‖² and h(μ) = η‖μ‖₁. Neumann boundary conditions are applied in the experiment. ISTA and FISTA are tested on problem (2), where A is the product of two matrices, the blur operator R and the inverse of a three-stage Haar wavelet transform W, i.e., A = RW, and b is the observed image. The regularization parameter is set to η = 2 × 10⁻⁵, and the blurred image is used as the initial image. The preceding sections gave a theoretical proof that FISTA converges faster than ISTA; in this section, both algorithms are compared by simulation, and it is observed that FISTA needs fewer iterations than ISTA to achieve a given accuracy.
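As a simplified sketch of this comparison protocol (not the authors' code): a SciPy Gaussian filter stands in for the blur operator described in the numerical example below, the wavelet transform W is omitted so that A reduces to the blur operator R, and the step-size bound L = 2‖R‖² ≤ 2 is assumed for a normalized kernel. The helper names and the small random test image are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur(x, sigma=3):
    """Gaussian blur operator R (approximately self-adjoint), standing in for A = RW
    with the wavelet transform W omitted for brevity."""
    return gaussian_filter(x, sigma)

def soft_threshold(x, xi):
    return np.sign(x) * np.maximum(np.abs(x) - xi, 0.0)

def deblur_fista(b, eta=2e-5, L=2.0, n_iter=200):
    """FISTA applied to min ||R(mu) - b||^2 + eta*||mu||_1 (simplified setting).
    L = 2 is a valid Lipschitz bound since the normalized Gaussian kernel has norm <= 1."""
    mu = b.copy()                   # blurred image as the initial image, as in the paper
    nu, theta = mu.copy(), 1.0
    for _ in range(n_iter):
        grad = 2 * blur(blur(nu) - b)                      # R^T(R nu - b), R self-adjoint
        mu_new = soft_threshold(nu - grad / L, eta / L)
        theta_new = (1 + np.sqrt(1 + 4 * theta ** 2)) / 2
        nu = mu_new + ((theta - 1) / theta_new) * (mu_new - mu)
        mu, theta = mu_new, theta_new
    return mu

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((64, 64))            # placeholder for the scaled CT test image
    blurred = blur(clean) + 1e-3 * rng.standard_normal(clean.shape)
    restored = deblur_fista(blurred)
    print("MSE after restoration:", float(np.mean((restored - clean) ** 2)))
```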
3.2 Numerical Example

A 512 × 512 CT scan of the chest is taken as the test image. Every pixel of the original image is scaled into the range (0, 1). The original image is blurred with a Gaussian smoothing (Gaussian blur) operation of size 9 × 9 and standard deviation (s.d.) 3. The original and blurred images are shown in Fig. 1. Both algorithms are then run for 100 iterations; the objective function value of ISTA after 100 iterations is 379.057, and the image produced by ISTA is
Fig. 1 Blurring of CT scan
Fig. 2 ISTA after 100 iterations
described in Fig. 2. The objective function value of FISTA is 151.31 after the same number of iterations, which is better than that of ISTA; the image produced after 100 iterations of FISTA is shown in Fig. 3. The algorithms are then run for 200 iterations: the objective function value of ISTA is 235.208, and the image produced after 200 iterations is shown in Fig. 4. The objective function value of FISTA after 200 iterations is 139.785, which is again better than ISTA, and the resulting image is shown in Fig. 5.
Fig. 3 FISTA after 100 iterations
Fig. 4 ISTA after 200 iterations
The objective function values after 1000 iterations are also obtained for both algorithms: 158.41 for ISTA and 135.201 for FISTA. Notice that FISTA's objective function value after 200 iterations is already better than ISTA's after 1000 iterations, indicating favourable outcomes. The empirical findings show that the presented algorithm is a
Fig. 5 FISTA after 200 iterations
Fig. 6 Number of iterations versus objective function value plot for ISTA and FISTA
straightforward and more promising iterative technique, and it has the potential to be even quicker than the theoretical rate demonstrated in the previous sections. Finally, the two algorithms are compared graphically: when simple inputs are given to the algorithms and 500 iterations are run, Fig. 6 shows that FISTA converges more rapidly than ISTA.
4 Conclusion

An appropriate approach for solving the image de-blurring problem is presented in this paper. The major contributions of our work are as follows. First, the presented FISTA algorithm can effectively handle the composite regularization problem that includes both the l1 norm term and the TV term, and it can readily be applied to additional medical imaging applications. Second, if p is the dimension of the actual image μ, then the computational cost per iteration of the FISTA algorithm is just O(p log p). The presented algorithm also has excellent convergence characteristics. Because of these characteristics, the genuine image de-blurring problem is significantly more tractable than before. Finally, we run a series of tests to examine different de-blurring methods; in terms of both convergence and accuracy, the algorithm presented in this paper outperforms the existing algorithms. One obvious extension of this work is to design and analyse faster algorithms in various application domains with different regularizers, together with a more comprehensive computational evaluation.
References 1. Lang J, Gang K, Zhang C (2022) Adjustable shrinkage-thresholding projection algorithm for compressed sensing magnetic resonance imaging. Magn Reson Imaging 86:74–85 2. Hu Q, Hu S, Ma X, Zhang F, Fang J (2022) MRI image fusion based on optimized dictionary learning and binary map refining in gradient domain. Multimed Tools Appl 1–23 3. Dang N, Tiwari S, Khurana M, Arya KV (2021) Recent advancements in medical imaging: a machine learning approach. Machine learning for intelligent multimedia analytics. Studies in big data, vol 82. Springer, Singapore 4. Brody H (2013) Medical imaging. Nature 502(7473):S81–S81 5. Sharma P, Goyal D, Tiwari N (2022) Brain tumor analysis and reconstruction using machine learning. In: Congress on intelligent systems. Springer, Singapore, pp 381–394 6. Nishiyama M, Hadid A, Takeshima H, Shotton J, Kozakaya T, Yamaguchi O (2010) Facial deblur inference using subspace analysis for recognition of blurred faces. IEEE Trans Pattern Anal Mach Intell 33(4):838–845 7. Tzeng J, Liu CC, Nguyen TQ (2010) Contourlet domain multiband deblurring based on color correlation for fluid lens cameras. IEEE Trans Image Process 19(10):2659–2668 8. Karaca E, Tunga MA (2018) An interpolation-based texture and pattern preserving algorithm for inpainting color images. Expert Syst Appl 91:223–234 9. Ma J, Le Dimet FX (2009) Deblurring from highly incomplete measurements for remote sensing. IEEE Trans Geosci Remote Sens 47(3):792–802 10. Huang L, Xia Y, Ye T (2021) Effective blind image deblurring using matrix-variable optimization. IEEE Trans Image Process 30:4653–4666 11. Dong J, Roth S, Schiele B (2021) Learning spatially-variant MAP models for non-blind image deblurring. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4886–4895 12. Dong W, Zhang K, Zhu C, Xu G, Fei F, Tao S (2022) Efficient non-blind deconvolution method for large scale blurred image with hybrid regularizations. Optik 169630 13. Bjorck A (1996) Numerical methods for least squares problems. Soc Ind Appl Math 14. Hansen PC, Nagy JG, O’leary DP (2006) Deblurring images: matrices, spectra, and filtering. Soc Ind Appl Math
15. Ben-Tal A, Nemirovski A (2001) Simple methods for extremely largescale problems. Lectures on modern convex optimization: analysis, algorithms, and engineering applications. SIAM, Philadelphia, PA, USA, pp 313–422 16. Elad M (2007) Iterative shrinkage algorithms. Sparse and redundant representations from theory to applications in signal and image processing. Springer, New York, NY, USA, pp 111–136 17. Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun Pure Appl Math 57(11):1413–1457 18. Bioucas-Dias JM, Figueiredo MAT (2007) A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans Image Process 16(12):2992– 3004 19. Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2(1):183–202 20. Beck A, Teboulle M (2009) Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans Image Process 18(11):2419–2434 21. Bhotto MZA, Ahmad MO, Swamy MNS (2015) An improved fast iterative shrinkage thresholding algorithm for image deblurring. SIAM J Imaging Sci 8(3):1640–1657 22. Tianchai P (2021) An improved fast iterative shrinkage thresholding algorithm with an error for image deblurring problem. Fixed Point Theory Algorithms Sci Eng 1:1–25 23. Chambolle A, De Vore RA, Lee N-Y, Lucier BJ (1998) Non-linear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans Image Process 7(3):319–335 24. Elad M, Matalon B, Zibulevsky M (2007) Coordinate and subspace optimization methods for linear least squares with non-quadratic regularization. Appl Comput Harmon Anal 23(3):346– 367 25. Nesterov YE (1983) A method for solving the convex programming problem with convergence rate O(1/k 2 ). Dokl. akad. nauk Sssr 269:543–547 26. Chambolle A, Dossal C (2015) On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J Optim Theory Appl 166(3):968–982
A Comprehensive Review on Crop Disease Prediction Based on Machine Learning and Deep Learning Techniques

Manoj A. Patil and M. Manohar
Abstract Leaf diseases cause direct crop losses in agriculture, and farmers often cannot detect them early. If diseases are not detected early and correctly, the farmer suffers huge losses; late or wrong detection may lead to the wrong pesticide or over-pesticiding, which directly affects crop productivity and the economy and indirectly affects human health. Sensitive crops suffer from various leaf diseases, and early prediction of these diseases remains challenging. This paper reviews several machine learning (ML) and deep learning (DL) methods used for the segmentation and classification of different crop diseases. In the last few years, computer vision and DL techniques have made tremendous progress in object detection and image classification. The study summarizes the available research on different diseases of various crops based on ML and DL techniques, and also discusses the data sets used for research and the accuracy and performance of existing methods. The methods and data sets presented in this paper are not intended to replace published solutions for crop disease identification, but rather to enhance them by finding possible gaps. Seventy-five articles are analysed and reviewed to identify essential issues that require additional study, so as to promote continuous progress on data sets, methods, and techniques for future research in this domain. The review mainly focuses on image segmentation and classification techniques used to solve agricultural problems. Finally, this paper discusses the future research scope, challenges, limitations, and research gaps.

Keywords Crop disease · Leaf image · ML · DL · Segmentation · Classification
1 Introduction

Agriculture is one of the largest sectors of the economy throughout the globe and is most countries' primary source of revenue and income. Low productivity in the agriculture sector has resulted in its employment share decreasing day by day. To uplift this sector, many countries have taken initiatives, and recent technologies such as web applications, mobile applications, radio, and call centres are helping farmers; still, farmers face challenges in forecasting crop diseases early [1]. Many researchers are working on agriculture-based problems, and recent advances in technologies such as computer vision, machine learning, and deep learning have eased disease prediction in the agriculture and healthcare sectors. Crop diseases are a common factor that can reduce the efficiency, quality, and productivity of sensitive crops. Worldwide, most farmers cultivate sensitive crops like grapes, ridge gourd, bitter gourd, bottle gourd, chilli, potato, and tomato; however, productivity is not satisfactory due to the lack of an efficient way to predict diseases at an early stage. Crop disease prediction is therefore a crucial factor in sustainable agriculture [2]. The symptoms of diseases appear in different parts of a crop, such as the leaf, stem, flower, and root, but crop leaves are the most prone to diseases. Most researchers work on image processing to identify diseases with the support of computer vision, machine learning, and deep learning methods [3, 4].

Therefore, this literature paper reviews several recent ML and DL methods, advanced image identification techniques, and updated applications in the field of agriculture. The generic architecture of image processing using computer vision techniques is shown in Fig. 1; it consists of all the steps required to identify crop leaf diseases, namely data collection, pre-processing, segmentation, feature extraction, classification, and disease identification. This paper mainly focuses on the crop diseases, methods, technologies, and data sets used for crop disease identification. Different ML methods have been used to identify diseases in agricultural and medical problems, and advances in DL algorithms such as deep neural networks, transfer learning, ResNet, VGG, GoogleNet, and PolyNet have yielded better performance on medical and agricultural problems. Initially, 321 research papers were searched and downloaded; after abstract screening, 132 related papers were shortlisted, and after a detailed study, 75 papers were selected for the systematic literature survey and are presented in tabular form, considering image segmentation and classification techniques. In this survey, a systematic literature review helped to identify, select, and critically appraise research, challenges, gaps, and limitations. We provide the sources of references to enable further research considering advancements in ML, DL, and computer vision techniques.
Fig. 1 Generic architecture of crop disease classification
1.1 Types of Sensitive Crop Diseases

Biotic crop diseases are caused by living organisms [5]; bacteria, fungi, and viruses are the primary sources of the various biological disorders. Unpredicted environmental changes may lead to virus attacks on crops, and fungal disease appears due to the presence of bacteria and fungi on the leaves. Abiotic diseases, in contrast, are caused by inanimate ecological conditions such as spring frosts, hail, chemical burn, and weather conditions; they are non-communicable, less dangerous, and often preventable. Common leaf diseases include early blight, late blight, leaf spot, leaf curl, fever, rust, mosaic virus, downy mildew, and anthracnose. Sample diseased leaves are shown in Fig. 2.
2 Literature Survey

Many researchers have developed crop disease classification methods; among them, 75 papers have been reviewed and analysed here. The analysis consists of three sections. In the first section, 24 papers based on ML methods are reviewed for the classification
Fig. 2 Types of leaf diseases a early blight, b downy mildew, c rust, d powdery mildew, e mosaic virus, f leaf spot, g late blight, h leaf curl, i anthracnose and fire blight
of crop diseases; the second section covers 35 papers based on DL methods for crop disease classification; and the third section reviews 16 papers on crop leaf disease segmentation.
2.1 Crop Disease Classification Using ML Techniques

Crop production is severely affected by a variety of crop diseases. The quality and productivity of agricultural crops can be improved by detecting crop diseases accurately and in time. Traditional approaches to diagnosis and classification involve more time, hard work, and constant monitoring of the farm, whereas crop diseases caused by viruses, bacteria, and fungi can often be avoided by using analytic methods. Crop protection plays an important role in maintaining agricultural production, and ML techniques are often utilized to detect affected leaf images. In this section, 24 papers are considered, discussing the different ML methods used to determine whether a crop is affected or not. Tables 1 and 2 summarize the surveyed works on crop disease classification using ML algorithms.

In [1], a support vector machine (SVM) model was used for rice crop disease classification, considering four classes: healthy leaves, rice blast, sheath blight, and bacterial leaf blight. Colour space features were used for the classification: 14 different colour spaces were used and four features were extracted from each colour channel, leading to 172 features. Besides, in [6], classification of different types of mango crop disease using a hybrid neural SVM was developed, with five classes such as Dag disease, Golmachi, Morcha disease, and shutmold. Each class used only twenty images for training and testing, so the performance was low; to improve it, more training data are required. In [7], herbal
Table 1 ML-based crop disease classification related works [1, 6-15]

References | Crop name | Diseases | Data set | Images (number) | Methodology | Accuracy (%)
[1] | Rice plant | Rice blast, bacterial leaf blight, sheath blight, and healthy leaves | Collected from the real agriculture field | 619 | Support vector machine | 94.65
[6] | Mango plant | Golmachi, Dag disease, Moricha disease, and shutmold | Collected images | 100 | ANN and SVM | 80
[7] | Herb plants | Healthy and diseased leaf | Collected images | 1000 | Hybrid SVM | 99
[8] | Rice plant | Leaf blight, leaf blast, false smut, leaf streak, and brown spot | Aduthurai research centre at Thanjavur district | 500 | SVM | 98.63
[9] | Grape plant | Healthy, black rot, esca, and leaf blight | Images collected from the Internet | 400 | Fractional-order Zernike moments and SVM classifier | 97.34
[10] | Grape plant | Grape esca, leaf blight, healthy grape, and grape black rot | PlantVillage data set | 350 | ABC + SVM | 91.89
[11] | Medicinal plant | Twelve types of plants | Collected from the Internet | 240 | SVM | 93.3
[12] | Different set of images | Healthy leaf, Alternaria alternata, bacterial blight, and cercospora leaf spot | PlantVillage data set | 400 | Backpropagation neural network | 97.38
[13] | Apple plant | Healthy and black rot disease | Collected images | 500 | K-nearest neighbour classifier | 96.4
[14] | A different set of plants | Healthy or diseased | Collected images | - | Fuzzy-based function network and firefly algorithm | 80.18
[15] | Grape plant | Dark lesion spots (early, middle, large, advanced) | PlantVillage | 3423 | Fuzzy inference system | 95.3
Table 2 ML-based crop disease classification related works [5, 16-25]

References | Crop name | Diseases | Data set | Images (number) | Methodology | Accuracy (%)
[16] | Pepper, tomato, and potato | Normal or diseased leaf | PlantVillage data set | 250 | Kuan filtered Hough transformation-based reweighted linear programme boost classification | 92
[17] | Banana plant | Panama wilt, leaf spot, anthracnose, cigar end tip rot, crown rot, and virus disease | Images collected from different districts | - | Adaptive neuro-fuzzy inference system | 97.5
[18] | Herb plant | Healthy or unhealthy leaf | Collected images | 100 | Artificial neural network | 99
[19] | Tomato, brinjal, mango | Late blight, common rust, leaf curl, cedar apple rust, early blight, and leaf spot | PlantVillage | 270 | Bacterial foraging optimization based (BRBFNN) | 86.21
[5] | Mango plant | Anthracnose, healthy, gall midge, and powdery mildew | Images collected from Vietnam | 450 | Artificial neural network | 90
[20] | Maize plant | Rust and northern leaf blight | PlantVillage | 200 | SVM | 85
[21] | Chilli | - | PlantVillage | 2800 | Histogram gradient boosting | 89.11
[22] | Tomato | Various diseases | Primary data set | 1000 | Improved K-means clustering | 90
[23] | Rice plant | Rice blast disease | Collected images | 300 | Artificial neural network | 90
[24] | Paddy | Blast and brown spot diseases | Collected images | 330 | KNN classifier | 76.59
[25] | Grape | Powdery mildew, downy mildew, and black rot | PlantVillage | - | One-class support vector machine classifier | 95.5
crop diseases were classified using different classifiers. This technique gave high precision in species classification and early disease discovery, but unfortunately at the cost of increased computational time in the testing stage. In [8], rice crop disease classification for the Thanjavur district was discussed: three types of features, namely SIFT, discrete wavelet transform, and grey-level co-occurrence matrix features, were extracted and given to different classifiers to classify the diseases of rice crop leaves. In [9], grape crop diseases were classified using fractional-order Zernike moments and an SVM classifier; 400 images were used for analysis and a maximum accuracy of 97.34% was obtained. In [10], an artificial bee colony (ABC) optimization algorithm with an SVM classifier was discussed for grape leaf disease detection: the ABC algorithm was introduced for feature selection, and foliar disease detection of grapes was achieved using the SVM classifier, reaching a maximum precision of 91.89%, which was high compared to the other methods. Medicinal crops are very important for Ayurvedic medicine, they are heavily affected by different diseases, and early detection prevents crop loss; therefore, [11] explained medicinal plant disease classification using an SVM classifier, analysing twelve types of different diseases. In [12], backpropagation neural network-based leaf disease classification was explained: the damaged portion of the image was first segmented using Otsu's thresholding technique, GLCM features were extracted from each segmented part, and the extracted features were given to the classifier to label an image as normal or diseased. The classification of multiple diseased portions of apple plant leaves is explained in [13] using a K-nearest neighbour classifier: the collected image is enhanced and segmented, features of each segmented part are extracted, and classification is performed on the feature values to identify the varieties of disease present in each segmented part. Fuzzy-based function network-based automatic plant leaf disease classification is explained in [14]: images are first enhanced, SIFT-based attributes are extracted from each image, and the fuzzy-based function network is enhanced using the firefly algorithm. In [15], grape plant disease classification using a fuzzy inference system was discussed, analysing four stages of dark lesion spots (early, middle, large, and advanced); the PlantVillage data set was used, with 3423 images collected for the analysis. In [16], a crop disease classification technique based on Kuan filtered Hough transformation-based reweighted linear programme boost classification was explained, considering three types of input images (pepper, tomato, and potato) and 250 images taken from the PlantVillage data set; this method is applicable only to a small number of images, and the classification accuracy should be improved.
Banana leaf disease classification based on the Adaptive Neuro-Fuzzy Inference System (ANFIS) and case-based reasoning was explained in [17]. In this process, colour, shape, and texture features were extracted from each image; after feature extraction, the classification was done using ANFIS
and case-based reasoning. This method gave a maximum accuracy of 97.5%. An efficient herb plant disease classification using an ANN was explained in [18]: the classification was made using colour and shape features, but only 100 images were used for the experimental analysis, so the number of training and testing images should be increased. In [19], a plant disease classification model based on a Radial Basis Function Neural Network (BRBFNN) was discussed, focusing on fungal diseases such as cedar apple rust, late blight, common rust, early blight, leaf curl, and leaf spot, and attaining a maximum accuracy of 86.21%. Moreover, a feedforward neural network with the adaptive particle-Grey Wolf Optimization (APGWO) algorithm for early disease classification of mango leaves was explained in [5]; the images for the experimental analysis were collected from Vietnam, and four classes of images were analysed, namely anthracnose, gall midge, healthy, and powdery mildew. Maize plant disease classification using ML algorithms was explained in [20], where two classifiers, KNN and SVM, were used: SVM obtained a maximum accuracy of 84%, and KNN attained 85%. Chilli is an important vegetable for cooking, but it is affected by many diseases that reduce the quantity of produce; to avoid losses, early detection is recommended, so in [21] a histogram gradient boosting algorithm-based chilli leaf disease classification was presented using 2800 images, attaining a maximum accuracy of 89.11%. Tomato leaf disease classification based on improved k-means clustering was explained in [22], attaining a maximum accuracy of 90% on 1000 images. Rice plant leaf disease classification using an artificial neural network was explained in [23], using 300 images for the training and testing process. Blast and brown spot diseases on paddy plants were classified using KNN in [24]: 330 images were collected from agricultural land and an accuracy of 76.59% was attained, which is quite low and should be increased in future work. Moreover, in [25], grape disease classification using a support vector machine was developed; this method gives a maximum accuracy of 95.5%, which should be further improved.
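Several of the surveyed works share the same classical pattern: hand-crafted texture features (for example, GLCM statistics) followed by an SVM or KNN classifier. The sketch below illustrates that generic pipeline with scikit-image and scikit-learn; it is an assumed, illustrative pipeline rather than a reproduction of any specific paper reviewed here, and the image and label inputs are hypothetical.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def glcm_features(gray_leaf):
    """Texture features from a grey-level co-occurrence matrix.
    `gray_leaf` is assumed to be a uint8 greyscale leaf image."""
    glcm = graycomatrix(gray_leaf, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

def train_leaf_classifier(images, labels):
    """images: list of uint8 greyscale leaf images; labels: disease class labels."""
    X = np.vstack([glcm_features(img) for img in images])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    clf = SVC(kernel="rbf", C=10.0).fit(X_tr, y_tr)       # RBF-kernel SVM classifier
    return clf, clf.score(X_te, y_te)                     # held-out accuracy
```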
2.2 Crop Disease Classification Using DL Techniques

Visualization-based studies have systematically examined the problem of crop pathogenesis, together with the difficulties and challenges of crop disease identification and diagnosis. Tables 3 and 4 list the 35 existing works on crop disease classification using DL techniques. A large-scale crop disease data set was created covering 271 crop diseases with 220,592 images [26]. A lightweight CNN model was utilized to identify various diseases in tomato crops [27]; it does not use complex pre-trained methods, which may contain a huge number of hidden layers and parameters, and it uses the PlantVillage data along with an additional data set.
Table 3 DL-based crop disease classification related works [26-39]

References | Crop name | Diseases | Data set | Images (number) | Methodology | Accuracy (%)
[26] | Cassava, apple, vegetables, corn, and grape | Bacteria, fungal, virus | PlantVillage | 13,000 | ResNet152 | 97.4
[27] | Tomato | Nine classes | PlantVillage | 18,160 | VGG16 | 98.4
[28] | Banana | Healthy, black sigatoka, and black speckle | PlantVillage | 623 | CNN | 90.3
[29] | Tomato | Seven classes | PlantVillage | 1500 | Deep convolutional generative adversarial networks | 94.33
[30] | Tomato | Early blight, bacterial spot, healthy, leaf mold, late blight, mosaic virus | PlantVillage | 22,925 | SE-ResNet50 | 96.81
[31] | Tomato | Five diseases | Primary data set created | 8616 | ABCK-BWTR and B-ARNet | 89
[32] | Tomato | Six diseases | Laboratory-based data set and self-collected field data | 15,216 | VGG-19, VGG-16, ResNet, Inception V3 | 93.70
[33] | Potato | Early blight and late blight diseases | PlantVillage | 12,912 | CNN and Softmax | 96.39
[34] | Potato | Early blight and late blight diseases | PlantVillage | 2152 | YOLOv5 for segmentation and PDDCNN for classification | 92.75
[35] | Potato | Early blight, late blight, and healthy | PlantVillage | 55,000 | VGG19 | 91.8
[36] | Grape | Six diseases | Primary data set created | 107,366 | DICNN | 97.22
[37] | Grape | Four diseases | Primary data set created | 4449 | DR-IACNN | 97.22
[38] | Maize | Five diseases | PlantVillage data set | 3852 | Modified LeNet | 97.89
[39] | Grape | Esca | PlantVillage | 87,848 | VGG13 | 99.5
Table 4 DL-based crop disease classification related works [3, 4, 40-59]

References | Crop name | Diseases | Data set | Images (number) | Methodology | Accuracy (%)
[40] | Tomato | 15 diseases | Primary data set created | 15,000 | Yolo V3 algorithm | 92.39
[41] | Tomato, apple, cherry, grape, corn, peach | - | PlantVillage | 5507 | Transfer learning | 90
[42] | Tomato | 9 diseases | Primary data set | 10,696 | YOLOv3 | 91.81
[43] | Grape | Black rot, esca measles, leaf spot | PlantVillage | 8124 | Generative adversarial network (GAN) | 98.70
[44] | Tomato | Blight, ToMV, powdery mildew, leaf mold fungus | AIChallenger | 4178 | R-CNN | 95.83
[45] | Tomato | 12 diseases | PlantVillage | 16,004 | MobileNetV3 | 99.81
[46] | Grape | 5 diseases | PlantVillage | 9027 | Hybrid CNN | 98.7
[47] | Multiple crops | Multiple diseases | Primary data set | 58,725 | MDFC-ResNet | 93.96
[48] | Tomato | Nine diseases | PlantVillage | 13,262 | AlexNet and VGG16 net | 91.3
[49] | Apple | Seven diseases | AIChallenger | 2462 | DenseNet-121 | 93.71
[50] | Cucumber | Powdery mildew | Primary data set | 50 | U-Net | 96.08
[51] | Tomato | 14 diseases | Primary data set | 8927 | Meta-architecture | 83
[52] | Tomato | 12 diseases | PlantVillage | 6000 | PCA-WOA | 94
[53] | Maize | 4 diseases | PlantVillage | 3852 | Modified LeNet | 97.89
[54] | Grape | 3 diseases | PlantVillage | 4063 | AlexNet | 99.23
[55] | Tomato | Grey leaf spot disease | Primary data set | 2385 | MobileNetv2-YOLOv3 | 86.98
[56] | Tomato | 12 diseases | PlantVillage | 14,828 | CNN | 99.18
[57] | Cash crop | 22 different diseases | PlantVillage | 38,072 | MobileNet | 84.83
[58] | Grape | Black rot, esca, leaf blight | PlantVillage | 4062 | DCNN | 99.34
[3] | Tomato | Different classes | Collected images | 5000 | Faster R-CNN | MAP = 0.8306
[4] | Tomato | Different classes | Collected images | 8927 | Faster R-CNN with VGG16 | MAP = 96
[59] | 14 plant species | 60 diseases | Primary data set | 46,409 | GoogLeNet | 94
In [28], a deep learning algorithm-based banana disease classifier was developed; it achieved 90.3% accuracy but used few images for training and testing. Augmentation and segmentation techniques were utilized to enhance the precision of tomato crop leaf disease identification, with a model built from images generated by deep convolutional generative adversarial networks (DCGAN) [29]. To predict tomato leaf diseases, a DL technique with an attention mechanism was used together with the PlantVillage data set [30]. An ABCK-BWTR and B-ARNet structure was used to detect tomato leaf diseases: the BWTR algorithm enhances the image quality, and the tomato leaf images are separated from the background using KSW and ABCK [31]. Tomato plant leaf disease accuracy was examined using four models, VGG-19, VGG-16, ResNet, and Inception V3, tested on two divergent data sets [32]. An improved, effective CNN transfer learning approach was used to identify potato crop diseases, with ADAM used for optimization and cross-entropy for model analysis [33-35]. Many authors have made remarkable efforts to recognize grape plant diseases by continuously improving and developing effective and efficient algorithms such as DICNN, Inception-ResNet-v2, DR-IACNN, INSE-ResNet, VGG, DenseNet169, AlexNet, and NASNetLarge [36, 37, 39]. CNN and modified LeNet-based maize leaf disease detection works efficiently even when the chance of packet loss is high; with 50% of the image quality lost, it still manages to detect leaf diseases effectively [38]. To recognize tomato plant leaf diseases and insects, an enhanced Yolo V3 algorithm was used and compared with the original Yolo V3, Faster R-CNN, and SSD [40]. To detect crop leaf diseases, the authors of [41] proposed a semi-supervised few-shot learning method using the public PlantVillage data set, with a confidence interval used to identify the unlabelled samples. A review of 108 papers on crop disease and pesticide identification compared existing methods and provided research guidelines for DNN-based crop disease and pesticide recognition in terms of segmentation, classification, and identification [42]. A YOLOv3 algorithm with a better detection network structure enables early detection of tomato leaf diseases; this methodology can significantly enhance the identification of leaf objects under crop leaf occlusion and achieves a good detection effect under various background conditions. A novel Leaf GAN model was proposed to augment the grapevine image data set and overcome the problem of over-fitting in DL; the GAN-based hybrid data set has 8124 images of grapevine leaves [43]. A k-means clustering algorithm was used to cluster the data sets, and a residual network was used to extract the features of tomato leaf diseases [44]. Another work assessed various ImageNet-based DL methods, namely MobileNetV3, ResNet50, MobileNetV1, InceptionV3, MobileNetV2, and AlexNet, and the results proved that MobileNetV3 achieved greater accuracy than the other models [45]. One work extracts features from images by pre-processing the digital crop images based on the category of leaf disease and uses a hybrid CNN model with the PlantVillage database to predict the leaf diseases [46]. Another study aims to incorporate DL and IoT technologies to develop an IoT agricultural system for crop disease prediction [47].
AlexNet and VGG16 DL methods were used to classify and detect tomato crop leaf diseases, using the PlantVillage data set for experimental
purposes [48]. Apple leaf diseases were predicted using the DenseNet-121 DL model, with three different techniques proposed, namely the focal loss function, regression, and multi-label classification methods [49]. Another study addresses the issue of classifying powdery mildew on leaves from visual images by developing a CNN model based on the U-Net framework, used for semantic segmentation tasks [50]. For tomato plant disease and pest identification, two effective techniques were used, namely a deep meta-architecture and a filter bank, which work in challenging and difficult scenarios [51]. To classify tomato leaf diseases, an optimization-based PCA whale DL method was used with the public PlantVillage data set, and one-hot encoding was applied to the data set to reduce dimensionality [52]. A deep convolutional network was used to identify fourteen plant species and twenty-six plant diseases based on the public PlantVillage data set [53]. The AlexNet model was pre-trained using 4063 images and achieved a classification accuracy of 97.60%, and an MSVM model was applied to extract features from the various neural network layers to analyse the performance [54]. A MobileNetv2-YOLOv3 model was used to maintain a balance between real-time tomato crop disease identification and classification accuracy for grey leaf spots; the method uses the backbone of the MobileNetv2 network, which assists migration to the mobile terminal [55]. Similarly, an AlexNet model was pre-trained using 4063 images across nine classes and achieved a classification accuracy of 97.60%, with an MSVM model applied to extract features from the various layers of the neural network to analyse the performance [56]. In a DL-based approach, specific lesions and spots were considered rather than the entire leaf image, because every region contains its own parameters, and this technique can increase data variability without additional images [57]. To overcome the problem of image background information while retaining the leaf disease spots, an intelligent image segmentation algorithm based on the GrabCut model was implemented; multiple data sources were combined, including publicly available data sets and real-time field images, and five DL models, Xception, VGG19, ResNet50, VGG16, and Inception V3, were trained using transfer learning. RGB grape crop leaf images were used to classify and recognize diseases based on a deep CNN [58]. In [3, 4], leaf disease classification was done with a Faster R-CNN-based classifier. In this review, we explore the latest advances in deep learning, transfer learning, and the use of advanced agronomic image recognition technology; analysing and comparing these two approaches reveals that existing agricultural disease data sources make transfer learning more effective. Throughout the article, important issues that need further research in this area are explored, including the creation of image databases, the selection of large data-related domains, and the development of transfer learning systems. Most researchers and software industries are using deep learning models to solve computer vision problems [45]. The concept of transfer learning is to move past the isolated learning paradigm and use the knowledge learned in addressing one problem to solve other, similar problems.
The transfer models ResNet50, InceptionV3, AlexNet, and three versions of MobileNet were trained on the PlantVillage data set. These models were trained using different optimization techniques such as SGD, Adam,
RMSProp, and Adagrad. The data set is split into 80% training and 20% testing and contains ten classes, including a healthy class. MobileNetV3 achieved the best accuracy of 99.81% using the Adagrad optimizer, with a reported loss value of 0.0088.
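To make this transfer-learning recipe concrete, the following is a minimal sketch of fine-tuning a pretrained MobileNetV3 on a PlantVillage-style image folder with an 80/20 split and the Adagrad optimizer. The directory name, image size, and hyperparameters are illustrative assumptions, not the settings used in the cited work.

```python
# Hypothetical sketch: transfer learning with a frozen MobileNetV3 backbone on a
# ten-class PlantVillage-style folder, 80/20 train/validation split, Adagrad optimizer.
import tensorflow as tf

IMG_SIZE, NUM_CLASSES = (224, 224), 10  # assumed input size and class count

train_ds = tf.keras.utils.image_dataset_from_directory(
    "plantvillage/", validation_split=0.2, subset="training",
    seed=42, image_size=IMG_SIZE, batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "plantvillage/", validation_split=0.2, subset="validation",
    seed=42, image_size=IMG_SIZE, batch_size=32)

base = tf.keras.applications.MobileNetV3Small(
    input_shape=(224, 224, 3), include_top=False,
    weights="imagenet", pooling="avg")  # bundled preprocessing expects raw [0, 255] pixels
base.trainable = False  # reuse ImageNet features; only the new head is trained

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adagrad(1e-2),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)
```

Freezing the backbone and training only the classification head is the simplest form of transfer learning; the works surveyed above often also fine-tune deeper layers once the head has converged.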
2.3 Crop Leaf Disease Segmentation Techniques The image segmentation process plays an important part in identifying the disease-affected area in crop leaf images. Crop disease image segmentation has always been an essential research domain in the field of plant leaf image processing. Many researchers have proposed crop leaf disease segmentation models, and we have referred to and analysed 16 papers. Table 5 summarizes the existing work on crop leaf disease segmentation techniques. In [60], a histogram intensity segmentation technique was presented to segment leaf blight disease on cotton and Solanum nigrum leaves; segmentation was carried out based on a threshold value, and the method is not applicable to other, more complex segmentation applications. In [61], crop leaf disease image segmentation was carried out with a hybrid clustering algorithm. In this approach, the full-colour leaf image is first subdivided into several smaller and more uniform super-pixels by super-pixel clustering, which provides useful clustering values to guide the segmentation and speed up convergence of the expectation–maximization (EM) algorithm; the lesion pixels are then rapidly and precisely separated from each super-pixel by the EM algorithm. In [62], grape disease leaf image segmentation was used to detect the diseased area through an adaptive snake algorithm (ASA) model. This method consists of two segmentation models, namely absolute segmentation and common segmentation: common segmentation achieved better segmentation quality, and absolute segmentation achieved greater accuracy. To evaluate the performance of this approach, two data sets were used, namely the Plant Level and PlantVillage data sets. The leaf disease segmentation process is affected by uneven illumination and cluttered backgrounds. To avoid this problem, a comprehensive colour feature with a multi-resolution channel and a region-growing-based image segmentation approach was proposed in [2], where colour spaces such as the H (hue) module were used for the weighted multi-resolution colour-to-grey transition. The region-growing method was used to solve irregular background problems by selectively interacting with seeds grown under physical field conditions; performance was measured in terms of accuracy, and an accuracy of 94% was achieved. Similarly, in [63], banana leaf disease segmentation was presented. The authors analysed various segmentation models, namely zero-crossing methods, K-means, global thresholding, multi-thresholding, Sobel, Prewitt, region growing, Robert, colour segmentation, log, fuzzy C-means, adaptive thresholding, geodesic, and Canny; from the analysis, the geodesic method gave the best output. Nutrient deficiency is an important issue for crops, and segmenting the nutrient-deficient part of the leaf is a challenging task; only a small number of studies concentrate on nutrient deficiency problems. In [64], segmentation of nutrient deficiency in incomplete crop images was performed using an
intuitionistic fuzzy C-means clustering algorithm. In this method, the segmentation is performed taking the missing pixels into account, and the appropriate deficiency region is effectively extracted from the image; however, the method cannot predict the type of deficiency. In [65], anthracnose disease on mango trees was segmented with an RBF neural network. For the segmentation process, scale-invariant feature transform (SIFT) features were extracted from each captured image, and a bacterial foraging optimization algorithm was used to select the most similar features. Although this method gives good results, its accuracy should be improved and the segmentation sharpened; a total of 20 training samples with 13 features each was used, and more training data are needed to improve performance. Brinjal leaf disease segmentation and detection using k-means clustering and ANN is explained in [66]. Brinjal leaf diseases such as tobacco mosaic virus, Cercospora leaf spot, and bacterial wilt were considered for analysis; the k-means algorithm was used for segmentation, ANN was used for classification, and parameters such as area, perimeter, centroid, diameter, and mean intensity were considered. Chilli is an important vegetable for cooking and is also affected by many diseases, which reduce the quantity of produce; to avoid this loss, early detection is recommended. Accordingly, in [67], chilli crop disease detection and segmentation were presented using k-means clustering and an SVM classifier: the k-means algorithm segments the damaged portion, and the SVM classifies healthy and unhealthy images. Only 200 images were used for the experimental analysis. Nowadays, deep learning algorithms receive great attention in the image segmentation process. In [68], the crop damage region was segmented using deep learning models. Four plants were analysed, namely apple, grape, potato, and strawberry; classification was performed first, and then 1D-CNN-based segmentation was carried out, attaining a maximum accuracy of 97% during testing. In [69], a dimension reduction approach, namely singular value decomposition, was introduced to reduce the complexity of the prediction process. In addition, an artificial bee colony-based fuzzy C-means (ABC-FCM) algorithm was presented for segmentation, in which the ABC algorithm improves the performance of the FCM algorithm. Similarly, in [70], the PSO algorithm with FCM was used for crop disease segmentation. In [71], a pixel-wise mask-region-based CNN was used for the segmentation process. This method maximizes segmentation accuracy and computation speed by adjusting the RPN network; the RPN structure works as the backbone of the classifier. Similarly, a Spatial Pyramid-Oriented Encoder-Decoder Cascade Convolution Neural Network (SPEDCCNN) for crop disease leaf segmentation was presented in [72]. To validate the proposed segmentation methods, they were tested and compared with other segmentation methods in various scenes. In [73, 74], spot leaf disease on maize crops was analysed using a genetic algorithm and maximum entropy, with images collected from different fields for the experimental analysis. In the above literature, a total of fifteen articles are reviewed, each using different algorithms.
From the analysis, it is clear that deep learning-based segmentation attained better accuracy compared to the other methods.
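As a concrete point of reference for the classical clustering-based methods surveyed above, the following is a minimal OpenCV sketch of K-means colour clustering on a leaf image. The file name, the number of clusters, and the rule for picking the lesion cluster are illustrative assumptions and are not taken from any of the cited papers.

```python
# Illustrative K-means colour clustering of a leaf image, in the spirit of the
# clustering-based segmentation methods reviewed above.
import cv2
import numpy as np

img = cv2.imread("leaf.jpg")                       # BGR image
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)         # LAB separates colour from lightness
pixels = lab.reshape(-1, 3).astype(np.float32)

k = 3  # e.g. background, healthy tissue, lesion
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5, cv2.KMEANS_PP_CENTERS)

# Pick the cluster whose centre has the highest "a" (green-red) component as the
# candidate diseased region -- a simple heuristic, not part of any cited method.
lesion_cluster = int(np.argmax(centers[:, 1]))
mask = (labels.reshape(img.shape[:2]) == lesion_cluster).astype(np.uint8) * 255
cv2.imwrite("lesion_mask.png", mask)
```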
In [75], the ERBFNN method is improved using the modified sunflower optimization (MSFO) algorithm. The noise in the images is first reduced with a Gaussian filter, and contrast-limited adaptive histogram equalisation (CLAHE), based on contrast enhancement and unsharp masking, is applied. To segment the disease-related section of the input image, colour characteristics are then extracted from each leaf image and provided to the segmentation step. The proposed ERBFNN approach is evaluated using a variety of metrics, including accuracy, Jaccard coefficient (JC), Dice's coefficient (DC), precision, recall, F-measure, sensitivity, specificity, and mean intersection over union (MIoU), and its performance is compared with current state-of-the-art techniques such as the radial basis function (RBF) network, fuzzy C-means (FCM), and region growing (RG). According to the experimental findings, the suggested ERBFNN segmentation model performed better than existing state-of-the-art approaches, including RBFNN, FCM, and RG, as well as prior research, achieving 98.92% accuracy, 96.83% JC, 94.58% DC, 97.93% precision, 97.89% recall, 97.91% F-measure, 97.89% sensitivity, 97.96% specificity, and 95.88% MIoU.
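A rough sketch of this kind of preprocessing chain (Gaussian denoising followed by CLAHE contrast enhancement) is shown below using OpenCV. The kernel size and CLAHE parameters are illustrative choices, not the settings used in [75].

```python
# Sketch of Gaussian denoising followed by CLAHE, in the spirit of the
# preprocessing described for ERBFNN [75]; parameters are assumptions.
import cv2

img = cv2.imread("tomato_leaf.jpg")
denoised = cv2.GaussianBlur(img, (5, 5), 0)

# Apply CLAHE on the lightness channel only, so colours are preserved.
lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = cv2.merge((clahe.apply(l), a, b))
out = cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)
cv2.imwrite("preprocessed_leaf.jpg", out)
```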
3 Crop Disease Prediction Challenges and Limitations The above literature study shows that current methods for predicting crop diseases are ground-breaking; even so, sensitive crops suffer from various leaf diseases, and predicting these diseases early remains challenging. The symptoms of sensitive crop diseases appear in different parts of a plant, such as the leaf, stem, flower, and root. Most of the research considers front-side leaf images because they are highly exposed to diseases. Disease affects the growth of the plant, which in turn affects the economy of the farmer.
3.1 Challenges We have identified the following main challenges through this literature review:
• There is a lack of study on early detection of sensitive crop leaf diseases.
• Existing leaf image segmentation and classification methods lack efficiency.
• Methods are not clearly established; there is no satisfactory method for determining early leaf diseases.
• Existing methods are not efficient at estimating the probabilities of a leaf having various diseases so that the class with the highest probability value can be chosen as the predicted disease.
Table 5 Crop disease leaf image segmentation related works [2, 60–72, 75]

| References | Crop name | Diseases | Data set | Images (number) | Methodology | Accuracy (%) |
| [60] | Cotton and Solanum nigrum leaf | Leaf blight | Arkansas Plant Diseases Database and NARO | – | Indices-based intensity histogram segmentation | 95.64 |
| [62] | Grape leaf | Different diseases | PlantVillage data set | 233 | Adaptive snake model | 96.05 |
| [2] | Different images | Disease spot | APS image data set | 520 | Region growing | 96.28 |
| [64] | Multiple plants | Yellow-coloured leaves | – | – | Fuzzy C-means | 96.5 |
| [65] | Mango tree | Anthracnose disease | Primary data set | 530 | RBF neural network | 91.15 |
| [67] | Chilli | – | Primary data set | 120 | K-means SVM | 90.9 |
| [68] | Apple, grape, potato, and strawberry | Disease spot, leaf blight | PlantVillage | – | 1D convolutional neural network | 97.5 |
| [69] | Different plants | Multiple diseases | PlantVillage | 54,306 | Fuzzy C-means | 86.53 |
| [70] | Apple, tomato | Rust, cedar, leaf blight | PlantVillage | 54,306 | PSO with fuzzy C-means | 85.5 |
| [71] | Mango, pomegranate, Pongamia pinnata | Healthy and diseased | Leaf images data set | 4503 | Mask R-CNN | 87.12 |
| [72] | Corn, wheat, cucumber, cotton | Different diseases | Arkansas Plant Diseases Database and NARO | 900 | Cascade convolution neural network | 95 |
| [75] | Tomato | 10 diseases | PlantVillage | 10,000 | ERBFNN | 98.92 |
3.2 Limitations
• Lack of sufficiently large data sets: To train DNNs for plant disease diagnosis, large, well-annotated data sets with a wide range of variations are essential. Currently, the PlantVillage database and the Crop Disease Symptoms Database (PDDB) are the only data sets available in the public domain. As long as there is enough data, DNN techniques are suitable for crop diagnostic tasks. However, the collection of
data from the field is difficult and expensive, and domain experts with effective tools are needed for accurate annotation.
• Image augmentation: The effectiveness of the system can be considerably improved using data augmentation [43, 44, 59] (a minimal augmentation sketch is given after this list). In [43], a GAN-based method was proposed to enhance the training images; this work creates leaf images on a uniform background, which is not possible for complex backgrounds and different lighting conditions. As another way to enhance the data, Barbedo [59] suggested a diagnosis based on individual points and lesions and showed that sampling individual points on the leaves improved the identification accuracy by 12%, in part due to the enormous expansion of the data set size. Appropriate techniques for automatically cropping leaf images around the affected areas have not yet been developed.
• Recognition of anomalies with visually similar symptoms: When the symptoms presented by different diseases are very similar in appearance, it is not possible to distinguish them [59]. Highly accurate predictions can be made using a variety of complementary sources, including geographical location, weather trends, crop growth conditions, and historical pest and disease data [67]. At present, no work using such complementary sources is reported in the literature on the diagnostic process.
• Recognition of diseases in other parts of crops: Researchers have mainly tried to identify diseases on the upper part of the leaf; not much attention is paid to the diagnosis of disease in stems, fruits, and flowers. The Faster R-CNN networks proposed by Fuentes et al. [3, 4] are a method for diagnosing many diseases in different parts of the tomato crop. Although very accurate, these techniques require rigorous labelling and database annotation.
• Improvements of compact DL models for leaf disease classification: Many DL-based solutions are used in existing research, namely convolutional neural network (CNN)-based frameworks and recurrent neural network (RNN)-based frameworks. CNN-based models include GoogleNet, AlexNet, VGG, InceptionV3, and ResNet, and RNN-based hybrid frameworks include long short-term memory (LSTM) and gated recurrent unit (GRU) models. There are not enough large data sets to train new custom CNN models to identify crop diseases; transfer learning techniques provide a solution to the problem of adequate training data on a known architecture. However, for crop diagnostic work, these well-known CNN models are complex and computationally expensive.
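The augmentation point above can be illustrated with a few standard image transforms. The following is a minimal sketch using Keras preprocessing layers; the exact transforms and ranges are illustrative, not those used in [43, 44, 59].

```python
# Minimal augmentation sketch for leaf images; transforms and ranges are
# illustrative choices only.
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomContrast(0.2),
])

def augment_batch(images, labels):
    # Applied only at training time; labels are unchanged.
    return augment(images, training=True), labels

# Example usage: train_ds = train_ds.map(augment_batch)
```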
4 Discussion In this section, all the articles taken for the above survey are analysed and discussed, focusing on the uniqueness of the existing works. The analysis is based on three different categories, namely crops, algorithms, and data sets. Figure 3a analyses the research articles crop-wise: we have referred to seventy-five articles and analysed them according to the type of crop studied.
Fig. 3 a Analysis based on different crops. b Analysis based on different algorithms. c Analysis based on different data sets
From Fig. 3a, we can observe that 18 articles address tomato crop disease classification, 2 chilli, 5 paddy, 3 potato, 12 grape, 12 apple, 3 mango, 5 maize, 3 brinjal, 3 cucumber, and 6 articles work on multiple crops. In Fig. 3b, we analyse the research articles based on the algorithms used. Each article has a novelty based on different algorithms, and this analysis helps to find the best algorithm for the classification process. Mainly, two families of algorithms are used, namely ML and DL. From the analysis, it is clear that ten research works are based on CNNs, eight on support vector machines, and six on ANNs, among others. In Fig. 3c, the data sets used in the different manuscripts are analysed. Most of the articles utilize the PlantVillage data set, while some researchers collect images directly from agricultural fields. The PlantVillage data set consists of all types of crops and diseased leaves.
5 Conclusion This systematic literature review helped to identify, select, and critically appraise research articles, and it mainly focuses on crop disease identification using different techniques, methods, and data sets. We have also discussed the research challenges, gaps, and limitations of current techniques and highlighted efficient methods for future implementations. In this survey, we searched 321 research papers; after abstract screening, 132 related papers were shortlisted, and through detailed study, 75 papers were selected for the systematic literature survey and presented in tabular form. The detailed survey, discussion, and analysis above show that there is future research scope to enhance productivity in the agriculture sector through effective identification of crop diseases at an early stage.
Future Research Scope This comprehensive study summarizes several works related to the detection and classification of crop leaf diseases. However, a few research points can help improve the current situation.
• Generally, a disease has certain stages, but according to the survey, most studies focus mainly on classification. Therefore, it is important to improve image segmentation and classification accuracy.
• Existing research has focused on leaf disease identification, but there is still scope to effectively identify whether a crop is healthy, has a leaf disease, or has leaf rust by considering the following factors:
– To recognize infected leaves.
– To measure the affected area of the leaf.
– To determine the colour of the infected region.
– To detect the healthy region of plant leaves.
• Researchers have worked on identifying diseases at an early stage, but information on identifying and quantifying the stage of a disease is still lacking and should be addressed in future.
• Existing research mostly focuses on targeted leaf diseases but fails to address the probabilities of a leaf having various diseases so that the class with the highest probability value can be chosen as the predicted disease.
• Many researchers have focused on the classification of crop diseases rather than nutritional deficiencies. Nutritional deficiency is difficult to diagnose; therefore, it is important to focus on diagnosing nutritional deficiencies in crops in future.
• To the best of our knowledge, no study has identified diseases in real-world situations with acceptable accuracy.
Conflict of Interests The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements This work is supported by the Department of CSE, School of Engineering and Technology, Christ (Deemed to be University), Kengeri Campus, Bangalore, India, that include non-financial support.
References 1. Shrivastava VK, Pradhan MK (2021) Rice plant disease classification using color features: a machine learning paradigm. J Plant Pathol 103(1):17–26 2. Jothiaruna N, Joseph Abraham Sundar K, Ifjaz Ahmed M (2021) A disease spot segmentation method using comprehensive color feature with multi-resolution channel and region growing. Multimed Tools Appl 80(3):3327–3335 3. Fuentes A, Yoon S, Kim SC, Park DS (2017) A robust deep-learning-based detector for realtime tomato plant diseases and pests recognition. Sensors 17(9):2022 4. Fuentes AF, Yoon S, Lee J, Park DS (2018) High-performance deep neural network-based tomato plant diseases and pests diagnosis system with refinement filter bank. Front Plant Sci 9:1162 5. Pham TN, Tran LV, Dao SVT (2020) Early disease classification of mango leaves using feedforward neural network and hybrid metaheuristic feature selection. IEEE Access 8:189960– 189973 6. Mia M, Roy S, Das SK, Rahman M (2020) Mango leaf disease recognition using neural network and support vector machine. Iran J Comput Sci 3(3):185–193 7. Mustafa MS, Husin Z, Tan WK, Mavi MF, Farook RSM (2020) Development of automated hybrid intelligent system for herbs plant classification and early herbs plant disease detection. Neural Comput Appl 32(15):11419–11441 8. Gayathri Devi T, Neelamegam P (2019) Image processing based rice plant leaves diseases in Thanjavur, Tamilnadu. Cluster Comput 22(6):13415–13428 9. Kaur P, Pannu HS, Malhi AK (2019) Plant disease recognition using fractional-order Zernike moments and SVM classifier. Neural Comput Appl 31(12):8749–8768 10. Andrushia AD, Trephena Patricia A (2020) Artificial bee colony optimization (ABC) for grape leaves disease detection. Evolv Syst 11(1):105–117 11. Kan HX, Jin L, Zhou FL (2017) Classification of medicinal plant leaf image based on multifeature extraction. Pattern Recogn Image Anal 27(3):581–587 12. Kumar PL, Vinay Kumar Goud K, Vasanth Kumar G, Shijin Kumar PS (2020) Enhanced weighted sum back propagation neural network for leaf disease classification. Proc Mater Today 13. Singh S, Gupta S, Tanta A, Gupta R (2022) Extraction of multiple diseases in apple leaf using machine learning. Int J Image Graph 22(03):2140009 14. Chouhan SS, Singh UP, Jain S (2021) Automated plant leaf disease detection and classification using fuzzy based function network. Wirel Pers Commun 121(3):1757–1779 15. Nagi R, Tripathy SS (2021) Severity estimation of grapevine diseases from leaf images using fuzzy inference system. Agric Res 1–11 16. Deepa NR, Nagarajan N (2021) Kuan noise filter with Hough transformation based reweighted linear program boost classification for plant leaf disease detection. J Ambient Intell Hum Comput 12(6):5979–5992 17. Athiraja A, Vijayakumar P (2021) Banana disease diagnosis using computer vision and machine learning methods. J Ambient Intell Hum Comput 12(6):6537–6556 18. Ishak S, Rahiman MHF (2015) Leaf disease classification using artificial neural network. J Tek 77(17) 19. Chouhan SS, Kaul A, Singh UP, Jain S (2018) Bacterial foraging optimization based radial basis function neural network (BRBFNN) for identification and classification of plant leaf diseases: an automatic approach towards plant pathology. IEEE Access 6:8852–8863
20. Deshapande AS, Giraddi SG, Karibasappa KG, Desai SD (2019) Fungal disease detection in maize leaves using Haar wavelet features. Information and communication technology for intelligent systems. Springer, Singapore, pp 275–286 21. Devi MB, Amarendra K (2021) Machine learning-based application to detect pepper leaf diseases using HistGradientBoosting classifier with fused HOG and LBP features. Smart technologies in data science and communication. Springer, Singapore, pp 359–369 22. Tian K, Li J, Zeng J, Evans A, Zhang L (2019) Segmentation of tomato leaf images based on adaptive clustering number of K-means algorithm. Comput Electron Agric 165:104962 23. Ramesh S, Vydeki D (2018) Rice blast disease detection and classification using machine learning algorithm. In: 2018 2nd international conference on micro-electronics and telecommunication engineering (ICMETE). IEEE, pp 255–259 24. Wani JA, Sharma S, Muzamil M, Ahmed S, Sharma S, Singh S (2021) Machine learning and deep learning based computational techniques in automatic agricultural diseases detection: methodologies, applications, and challenges. Arch Comput Methods Eng 1–37 25. Pantazi XE, Moshou D, Tamouridou AA (2019) Automated leaf disease detection in different crop species through image features analysis and One Class Classifiers. Comput Electron Agric 156:96–104 26. Li Y, Chao X (2021) Semi-supervised few-shot learning approach for plant diseases recognition. Plant Methods 17(1):1–10 27. Agarwal M, Gupta SK, Biswas KK (2020) Development of efficient CNN model for tomato crop disease identification. Sustain Comput: Inform Syst 28:100407 28. Amara J, Bouaziz B, Algergawy A (2017) A deep learning-based approach for banana leaf diseases classification. Datenbanksysteme für Business, Technologie und Web (BTW 2017)Workshopband 29. Wu Q, Chen Y, Meng J (2020) DCGAN-based data augmentation for tomato leaf disease identification. IEEE Access 8:98716–98728 30. Zhao S, Peng Y, Liu J, Wu S (2021) Tomato leaf disease diagnosis based on improved convolution neural network by attention module. Agriculture 11(7):651 31. Chen X, Zhou G, Chen A, Yi J, Zhang W, Hu Y (2020) Identification of tomato leaf diseases based on combination of ABCK-BWTR and B-ARNet. Comput Electron Agric 178:105730 32. Ahmad I, Hamid M, Yousaf S, Shah ST, Ahmad MO (2020) Optimizing pretrained convolutional neural networks for tomato leaf disease detection. Complexity 2020 33. Lee T-Y, Lin I-A, Yu J-Y, Yang J-m, Chang Y-C (2021) High efficiency disease detection for potato leaf with convolutional neural network. SN Comput Sci 2(4):1–11 34. Rashid J, Khan I, Ali G, Almotiri SH, AlGhamdi MA, Masood K (2021) Multi-level deep learning model for potato leaf disease recognition. Electronics 10(17):2064 35. Tiwari D, Ashish M, Gangwar N, Sharma A, Patel S, Bhardwaj S (2020) Potato leaf diseases detection using deep learning. In: 2020 4th International conference on intelligent computing and control systems (iciccs). IEEE, pp 461–466 36. Liu B, Ding Z, Tian L, He D, Li S, Wang H (2020) Grape leaf disease identification using improved deep convolutional neural networks. Front Plant Sci 11:1082 37. Xie X, Ma Y, Liu B, He J, Li S, Wang H (2020) A deep-learning-based real-time detector for grape leaf diseases using improved convolutional neural networks. Front Plant Sci 11:751 38. Ahila Priyadharshini R, Arivazhagan S, Arun M, Mirnalini A (2019) Maize leaf disease classification using deep convolutional neural networks. Neural Comput Appl 31(12):8887–8895 39. 
Boulent J, Foucher S, Théau J, St-Charles P-L (2019) Convolutional neural networks for the automatic identification of plant diseases. Front Plant Sci 10:941 40. Liu J, Wang X (2020) Tomato diseases and pests detection based on improved Yolo V3 convolutional neural network. Front Plant Sci 11:898 41. Argüeso D, Picon A, Irusta U, Medela A, San-Emeterio MG, Bereciartua A, Alvarez-Gila A (2020) Few-Shot Learning approach for plant disease classification using images taken in the field. Comput Electron Agric 175:105542 42. Wang X, Liu J, Zhu X (2021) Early real-time detection algorithm of tomato diseases and pests in the natural environment. Plant Methods 17(1):1–17
43. Liu B, Tan C, Li S, He J, Wang H (2020) A data augmentation method based on generative adversarial networks for grape leaf disease identification. IEEE Access 8:102188–102198 44. Zhang Y, Song C, Zhang D (2020) Deep learning-based object detection improvement for tomato disease. IEEE Access 8:56607–56614 45. Tarek H, Aly H, Eisa S, Abul-Soud M (2022) Optimized deep learning algorithms for tomato leaf disease detection with hardware deployment. Electronics 11(1):140 46. Yuan Y, Chen L, Wu H, Li L (2021) Advanced agricultural disease image recognition technologies: a review. Info Process Agric 47. Hu WJ, Fan J, Du YX, Li BS, Xiong N, Bekkering E (2020) MDFC-ResNet: an agricultural IoT system to accurately recognize crop diseases. IEEE Access 8:115287–115298 48. Rangarajan AK, Purushothaman R, Ramesh A (2018) Tomato crop disease classification using pre-trained deep learning algorithm. Procedia Comput Sci 133:1040–1047 49. Zhong Y, Zhao M (2020) Research on deep learning in apple leaf disease recognition. Comput Electron Agric 168:105146. https://www.overleaf.com/project/62a34ad1ec9409bba29b64aa 50. Lin K, Gong L, Huang Y, Liu C, Pan J (2019) Deep learning-based segmentation and quantification of cucumber powdery mildew using convolutional neural network. Front Plant Sci 10:155 51. Fuentes A, Yoon S, Park DS (2020) Deep learning-based techniques for plant diseases recognition in real-field scenarios. In: International conference on advanced concepts for intelligent vision systems. Springer, Cham, pp 3–14 52. Gadekallu et al (2021) A novel PCA-whale optimization-based deep neural network model for classification of tomato plant diseases using GPU. J Real-Time Image Process 18(4):1383–1396 53. Zhang ZY, He XY, Sun XH, Guo LM, Wang JH, Wang FS (2015) Image recognition of maize leaf disease based on GA-SVM. Chem Eng Trans 46:199–204 54. Aravind KR, Raja P, Aniirudh R, Mukesh KV, Ashiwin R, Vikas G (2018) Grape crop disease classification using transfer learning approach. In: International conference on ISMAC in computational vision and bio-engineering. Springer, Cham 55. Jun L, Xuewei W (2020) Early recognition of tomato gray leaf spot disease based on MobileNetv2-YOLOv3 model. Plant Methods 16(1):1–16 56. Brahimi et al (2017) Deep learning for tomato diseases: classification and symptoms visualization. Appl Artif Intell 31(4):299–315 57. Brahimi M, Boukhalfa K, Moussaoui A (2020) Identification of cash crop diseases using automatic image segmentation algorithm and deep learning with expanded dataset. Comput Electron Agric 177:105712 58. Math RKM, Dharwadkar NV (2022) Early detection and identification of grape diseases using convolutional neural networks. J Plant Dis Prot 129(3):521–532 59. Barbedo JGA (2019) Plant disease identification from individual lesions and spots using deep learning. Biosyst Eng 180:96–107 60. Kalaivani S, Shantharajah SP, Padma T (2020) Agricultural leaf blight disease segmentation using indices based histogram intensity segmentation approach. Multimed Tools Appl 79(13):9145–9159 61. Zhang S, You Z, Wu X (2019) Plant disease leaf image segmentation based on superpixel clustering and EM algorithm. Neural Comput Appl 31(2):1225–1232 62. Shantkumari M, Uma SV (2021) Grape leaf segmentation for disease identification through adaptive Snake algorithm model. Multimed Tools Appl 80(6):8861–8879 63. Deenan S, Janakiraman S, Nagachandrabose S (2020) image segmentation algorithms for Banana leaf disease diagnosis. J Inst Eng (India): Ser C 101(5):807–820 64. 
Balasubramaniam P, Ananthi VP (2016) Segmentation of nutrient deficiency in incomplete crop images using intuitionistic fuzzy C-means clustering algorithm. Nonlinear Dyn 83(1):849–866 65. Chouhan SS, Singh UP, Jain S (2020) Web facilitated anthracnose disease segmentation from the leaf of mango tree using radial basis function (RBF) neural network. Wirel Pers Commun 113(2):1279–1296 66. Anand R, Veni S, Aravinth J (2016) An application of image processing techniques for detection of diseases on brinjal leaves using k-means clustering method. In: 2016 international conference on recent trends in information technology (ICRTIT). IEEE, pp 1–6
67. Wahab AHBA, Zahari R, Lim TH (2019) Detecting diseases in chilli plants using K-means segmented support vector machine. In: 2019 3rd international conference on imaging, signal processing and communication (ICISPC). IEEE, pp 57–61 68. Sai Reddy B, Neeraja S (2022) Plant leaf disease classification and damage detection system using deep learning models. Multimed Tools Appl 81:24021–24040 69. Pravin Kumar SK, Sumithra MG, Saranya N (2019) Artificial bee colony-based fuzzy c means (ABC-FCM) segmentation algorithm and dimensionality reduction for leaf disease detection in bioinformatics. J Supercomput 75(12):8293–8311 70. Sumithra MG, Saranya N (2021) Particle Swarm Optimization (PSO) with fuzzy c means (PSO-FCM)-based segmentation and machine learning classifier for leaf diseases prediction. Concurr Comput: Pract Exp 33(3):e5312 71. Kavitha Lakshmi R, Savarimuthu N (2021) DPD-DS for plant disease detection based on instance segmentation. J Ambient Intell Hum Comput 1–11 72. Yuan Y, Xu Z, Lu G (2021) SPEDCCNN: spatial pyramid-oriented encoder-decoder cascade convolution neural network for crop disease leaf segmentation. IEEE Access 9:14849–14866 73. Du MG, Zhang SW (2015) Crop disease leaf image segmentation based on genetic algorithm and maximum entropy. In: Applied mechanics and materials, vol 713. Trans Tech Publications Ltd, pp 1670–1674 74. Patil MA, Adamuthe AC, Umbarkar AJ (2020) Smartphone and IoT based system for integrated farm monitoring. In: Techno-Societal 2018. Springer, Cham, pp 471–478 75. Patil MA, Manohar M (2022) Enhanced radial basis function neural network for tomato plant disease leaf image segmentation. Ecol Inform 70:101752
Sentiment Analysis Through Fourier Transform Techniques in NLP Anuraj Singh and Kaustubh Pathak
Abstract The FNet encoder architecture can be compared with traditional transformer models: by replacing the self-attention sublayers that mix input information, it incurs only a limited accuracy loss. It has been shown for text classification tasks that replacing multi-head attention in the transformer encoder with a standard, un-parameterized Fourier transform achieves 92–97% of the accuracy of the transformer counterpart on the GLUE benchmark, while also training faster, with 60% less time in our case for our input length. Moving forward, we merge the un-parameterized Fourier transform and multi-head attention in the transformer encoder, which helps to improve accuracy over the transformer base model by 5–10% at the cost of 30–40% more training time on GPU. Hybrid models tend to outperform their transformer base counterparts, as they can take advantage of Fourier sublayer mixing better than the BERT model. In this paper, we study transformer models with self-attention, Fourier transforms, and hybrid combinations of the two in the context of long-range dependencies for classification tasks in natural language processing. Keywords NLP · Fourier transform · Transformers · Attention
A. Singh (B) · K. Pathak ABV—Indian Institute of Information Technology and Management, Gwalior, India e-mail: [email protected] K. Pathak e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_37
1 Introduction Natural language processing, although facing many challenges, has shown remarkable improvement in the last two decades. It began with simple RNNs [1], which mimic the serial transfer of knowledge with the idea of learning in the process, and moved to more complex cells such as GRUs and LSTMs [2], which dominated SOTA scores for many years. Providing gates to remember important long-term knowledge and to forget extra information held in short-term memory, LSTMs worked remarkably well for many years. However, the transformer model, which replaces
the LSTM with self-attention and allows larger models to be trained in a shorter time, has broken many SOTA results. This was achieved by self-attention, which cross-applies attention of a sequence with itself, returning the relevant parts and giving higher scores to the more important tokens. Since then, many papers have advanced the idea of the transformer and proposed new models that further improve accuracy. One such paper, FNet [3], takes a step forward by experimenting with replacing the self-attention layer with a Fourier transform, which lets the network perform a simple transformation to mix information between layers without creating specific variables to train at run time; this makes the model simpler without losing much accuracy. FNet maintains 92–97% of the accuracy of the original BERT model while requiring 80% less training time on GPU and 70% less on TPU, despite the fact that the Fourier sublayer uses no learnable parameters and still manages to share information between input tokens. In this paper, we study the use of the Fourier transform as a replacement for attention, examining the effect of the replacement, of the number of layers, and of the resulting training time and accuracy. Moving further, we combine both approaches to form a hybrid model that can take advantage of both the Fourier and self-attention mechanisms, and we study its accuracy on the SST2 dataset of the GLUE benchmark. Training of all three models was performed with a sequence length of 300 over an Nvidia RTX 3050Ti for an encoder depth of 2.
2 Related Work The LSTM [4] has come a long way in NLP and produced many SOTA models until it was finally surpassed by the transformer model. Proposed in 1995 [2], it is a recursive model built around three gating layers that uses short-term and long-term memory to retain information, and a large body of work has shown the effectiveness of large LSTM models in the past. Attention models [5] in deep learning have shown great results in the past few years. They provide better results because self-attention is non-recursive in nature and trains faster, which in turn allows even bigger models than LSTMs previously permitted. BERT [6], which stands for bidirectional encoder representations from transformers, extended the idea to a large attention-based transformer model that uses deep bidirectional representations learned from unlabeled text. It achieved SOTA results by pushing the GLUE score to 80.5% and the MultiNLI score to 86.7%. Since then, there have been many modified versions of BERT; one such paper uses BERT to provide an in-depth insight into a plagiarism detection system [7], comparing BERT scores with previously used models. The Fourier transform is heavily studied for neural networks: used for signal processing, fault detection, CNNs, RNNs, and now transformers, it has become a significantly studied topic. In applications of the Fourier transform, the discrete Fourier transform and
the fast Fourier transform have been used in multiple domains, including signal processing problems. One such signal processing example is the identification of abnormal electrical activity [8, 9], which couples the Fourier transform with a neural network. Other domains include the application of FFTs in CNNs to speed up computations [4, 10–12]. The Fourier transform has also been applied to recurrent neural networks (RNNs), often in an effort to speed up training and lessen vanishing and exploding gradients. DFTs have also been shown to approximate the linear, dense layers in neural networks, decreasing parameter counts and computational complexity [13, 14]. Indirectly, DFTs have also been used in several other works with transformers: FNet uses the full transformer model but replaces multi-head attention with the Fourier transform, and Tamkin et al. [15] experiment with other spectral filters on neuron activations in BERT models. Since they were first introduced, attention models have shown state-of-the-art results due to their flexibility and capacity. They have improved virtually all NLP models and have also been adopted in computer vision owing to their ability to focus on important parts of an image and discard the rest. However, it has been shown many times that BERT-level scores can be achieved without extensively using attention. Tay et al. [16] have shown that parts of attention are non-crucial, and You et al. [17] obtain similar results when attention weights in the transformer encoder and decoder are replaced with un-parameterized Gaussian distributions. Similar experimental results were obtained when encoder attention layers were replaced by random non-learnable parameters. FNet does not use any learnable parameters in its mixing sublayer but still manages to approach BERT scores. This shows that although attention is important for the model, it is replaceable with other, faster mechanisms while retaining accuracy. There has also been significant work on reducing the complexity with respect to sequence length, since attention has quadratic time and memory in the sequence length. The Long-Range Arena benchmark [16] compares many transformers on multiple tasks requiring long-range dependencies; it concludes that the Performer, the Image Transformer (local attention), and the Linformer [18, 19] are the fastest models when trained on 4 × 4 TPU v3 chips, and that they also require the lowest peak memory per device. Since there are multiple tasks to train on, we limited our research to sentiment analysis. Recent papers, such as fake news detection [20] and sentiment analysis for diabetes diagnosis [21], also use sentiment analysis to benchmark their datasets. Our models were trained on the Stanford Sentiment Treebank 2 (SST2) dataset of the General Language Understanding Evaluation (GLUE) benchmark, which consists of sentences with human annotations of their sentiment; each sentence has a positive or negative sentiment with sentence-level labels only.
2.1 Contributions Since the Fourier transform has proven to sustain accuracy compared with transformer base encoder layers, we build upon this by implementing a hybrid model that adds a Fourier transform alongside multi-head attention in the encoder layers. We compare all three models, studying long-range dependencies in the context of a classification task on the SST2 dataset.
3 Methodology 3.1 Architecture The attention mechanism is powerful, but it needs a lot of memory and compute. It has to decide which information in the current layer's sequence goes to which position in the next layer's sequence; in other words, its job is to solve a routing problem that determines what information goes where. As such, it has a time complexity of O(N²), where N is the sequence length, and memory requirements of O(N²) as well, which prohibits it from scaling to larger sequence lengths: we are always limited in the length of the sequences we can input, which, for example, long prevented it from being applied to computer vision. BERT [6] was the first of its kind to implement the transformer model at a large parameter scale, which helped it break many SOTA records. It is a Seq2Seq-based autoencoder model trained with noise: the text is first corrupted, and the model is forced to reconstruct the original sentence, training itself in the process. It was heavily inspired by the attention model and, since its release, has become the benchmark against which new models are judged. Self-attention is computed as

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V
Self-attention cross-applies attention of a sequence with itself, which returns the relevant parts and gives higher scores to the more important tokens. Multi-head attention runs several such attention heads in parallel and concatenates their outputs:

MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O, where head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V).

Inspired by the transformer, a team at Google experimented with replacing multi-head attention with a Fourier transform and retained 92–97% of the accuracy with 80% less training time on GPU.
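To make the formulas above concrete, the following is a compact NumPy sketch of single-head scaled dot-product self-attention (no masking and no learned projections); the shapes and the random input are illustrative.

```python
# NumPy sketch of the scaled dot-product attention defined above
# (single head, no masking); Q, K, V have shape (seq_len, d_k).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq_len, seq_len) token-to-token scores
    return softmax(scores) @ V        # weighted sum of the values

rng = np.random.default_rng(0)
x = rng.normal(size=(300, 64))        # 300 tokens, 64-dim head, as an example
out = attention(x, x, x)              # self-attention: Q = K = V
print(out.shape)                      # (300, 64)
```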
Fig. 1 Multihead attention versus Fourier model versus hybrid model
Although this approach is not positioned as a successor to the transformer, it is seen as a smaller and faster alternative to the original that trains without losing much accuracy. The idea of self-attention is to measure the attention between words and then pass the result to a linear layer. This process has shown great results, but multi-head attention is the bottleneck, taking most of the time in the implementation. Since the input for every sentence is a 2D array of embeddings × tokens, we can treat it as a set of vectors. Just as the discrete 2D FFT is already used to compress images and audio signals, we can apply the Fourier transform first on the rows and then on the columns, or vice versa. This yields a Fourier-domain representation, and keeping only the top N percent of the Fourier plane gives a reduced vector space containing only the important components that represent the data, zeroing out the extra information. This not only increases speed but also reduces the space complexity that was an issue in the attention mechanism for mixing tokens, lowering the complexity from O(N²) to O(N log N). The Fourier transform (FT) of the function f(x) is the function F(ω), where

F(ω) = ∫_{−∞}^{+∞} f(x) e^{−iωx} dx,

and the inverse Fourier transform is

f(x) = (1 / 2π) ∫_{−∞}^{+∞} F(ω) e^{iωx} dω.

Recall that i = √−1 and e^{iθ} = cos θ + i sin θ.
3.2 System Architecture Understanding the architecture in Fig. 1, the encoder layer of the BERT model rolled out in 2017 was game-changing, since its use of multi-head attention broke many SOTA results previously held by LSTM-based models. It follows a Seq2Seq design but uses a memory-based layer in which the encoder transforms the sentence into embeddings and passes them to the decoder at every layer. This improves not only the accuracy but also the training speed of models, since multi-head attention can perform operations in parallel, unlike the sequential operations of an LSTM. Since then, however, multi-head attention has become the bottleneck of training, as it takes most of the time in the model; different variations of multi-head attention have appeared that reduce some overhead, but not to a large extent. Recent research with the Fourier transform has shown that it is a significant alternative in NLP for mixing data within a layer, rather than depending on multi-head attention to do so. Overall, this shows that BERT improves on the LSTM-based Seq2Seq model, and that multi-head attention in BERT can be replaced with the fast Fourier transform; training such a model on the SST2 dataset of the GLUE benchmark with binary cross-entropy loss and an SGD optimizer tends to increase accuracy further. Working on a further custom model, we implemented the hybrid model that adds the Fourier transform alongside multi-head attention in each layer. This combines the learning capabilities of the attention model with the parameter-free token mixing of the Fourier transform from the FNet model.
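The Fourier token mixing described above, which is shared by the FNet and hybrid variants, can be sketched in a few lines. This follows the FNet formulation [3] (a 2D DFT over the sequence and hidden dimensions, keeping the real part) rather than the authors' exact code; shapes and the random input are illustrative.

```python
# Sketch of FNet-style Fourier mixing [3]: apply a 2D DFT over the sequence and
# hidden dimensions of the embeddings and keep the real part. No learnable
# parameters are involved, which is where the speed-up comes from.
import numpy as np

def fourier_mixing(x):
    # x: (seq_len, d_model) token embeddings for one sentence
    return np.real(np.fft.fft2(x))  # FFT over both axes, real part only

x = np.random.default_rng(0).normal(size=(300, 300))  # 300 tokens, 300-dim embeddings
mixed = fourier_mixing(x)           # same shape, tokens now mixed globally
print(mixed.shape)                  # (300, 300)
```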
4 Experimental Results For initial testing, two-layer encoder models were used in each case, and the SST2 dataset was used to train all models so that the results remain comparable. The vanilla transformer encoder was trained for 10 epochs on SST2, with the number of heads set to 6 and both the sentence length and the embedding vector kept at 300; this gives 18,575,598 trainable parameters. For the second model, we used the Fourier transform inside the encoder, replacing the multi-head attention layer in each block, with an expansion factor of 4 for the feed-forward layer; this gives 16,833,902 trainable parameters. For the hybrid model, we extended the vanilla transformer's multi-head attention layer by adding the real part of the Fourier transform in parallel with it, which gives the advantage of mixing the input tokens with a parameter-free transform alongside the attention model. This model, although slower than both its original counterpart and the Fourier transform model, is able to perform better than both. To keep all three models comparable, it was also trained for 10 epochs on the SST2 dataset, with 6 heads for multi-head attention and the sequence length and embedding vector both set to 300, as in the previous models; this again gives 18,575,598 trainable parameters.
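A schematic sketch of the hybrid block described above is given below: the real part of a 2D FFT of the embeddings is combined with the multi-head attention output before the feed-forward sublayer. The layer sizes follow the settings stated above (300-dimensional embeddings, 6 heads, feed-forward expansion factor 4), but the exact way the two branches are combined here (simple addition inside the residual) is an assumption, not necessarily the authors' design.

```python
# Schematic Keras sketch of a hybrid encoder block: FFT mixing in parallel with
# multi-head attention, followed by a feed-forward sublayer. Combination by
# addition is an illustrative choice.
import tensorflow as tf

class HybridEncoderBlock(tf.keras.layers.Layer):
    def __init__(self, d_model=300, num_heads=6, ff_dim=1200):
        super().__init__()
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(ff_dim, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization()
        self.norm2 = tf.keras.layers.LayerNormalization()

    def call(self, x):
        # Parameter-free Fourier branch: real part of the 2D FFT over (seq, hidden).
        fourier = tf.math.real(tf.signal.fft2d(tf.cast(x, tf.complex64)))
        mixed = self.norm1(x + self.attn(x, x) + fourier)   # parallel branches, then residual
        return self.norm2(mixed + self.ffn(mixed))

# Example usage: out = HybridEncoderBlock()(tf.random.normal((8, 300, 300)))
```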
4.1 Comparative Study Training the models on an RTX 3050Ti gives the results described in Fig. 2. The first panel of the figure shows the difference in validation accuracy: the hybrid model achieves better accuracy than both the multi-head attention model and the FNet model. Initially, the multi-head attention and hybrid models follow the same path, but when trained for longer, the hybrid model starts to outperform the multi-head attention model. The second panel shows the difference in training accuracy: hybrid models are better than the multi-head model, which in turn shows better accuracy than the Fourier model. For training time, Fig. 3 shows that the Fourier model requires less time than the multi-head model; as expected, the hybrid model takes longer to train because it must also calculate the Fourier transform during training. The same pattern is seen in accuracy and precision, where the hybrid model is the best-performing model and the Fourier model the worst in all cases, despite the hybrid and multi-head attention models having the same number of learnable parameters. Concise details of training are given in Table 1, which lists the number of layers used for each model along with the total number of trainable parameters, accuracy, precision, and average epoch training time. When training on a Google Colab GPU, the Fourier model takes an average of 14.45 s per epoch, while multi-head attention takes 30.85 s; hybrid models take even longer to train, with an average time of 42.82 s.
Fig. 2 Training and validation time
Fig. 3 Training time for each model

Table 1 Model parameters with accuracy and precision

| Model | Parameters | Layers | Accuracy | Precision | Training time (s) (avg.) |
| Multihead attention model | 18,575,598 | 2 | 0.68 | 0.59 | 11.54 |
| Fourier model | 16,833,902 | 2 | 0.66 | 0.56 | 6.255 |
| Hybrid model | 18,575,598 | 2 | 0.69 | 0.58 | 13.98 |
5 Conclusion Since the Fourier transform reduces the space and time complexity while losing the details that would be learned in the attention process, the resulting models are expected to train faster at somewhat lower accuracy, taking 60% less time in our case while maintaining about 90% of the accuracy. On the other hand, hybrid models show a clear improvement in accuracy but lag in training time, since they require more computation than both the attention and Fourier models. Fourier transform-based models show strong capabilities, as they keep up the score with fewer parameters and require less time to train. Taking roughly 30–40% more time to train per epoch, the hybrid models have the capability to outperform the BERT base model by 5–10%. Multi-head attention stands in the middle of both: it performs better than the Fourier model and requires training time between the two. This helps us understand that attention can not only be replaced by other transformations that allow the
knowledge to flow, but that, with controlled interference, such transformations can even show better results than multi-head attention. Acknowledgements This work is supported by the Mathematical Research Impact Centric Support Scheme (MTR/2020/000477), Science and Engineering Research Board, India.
References 1. Zhou P, Qi Z, Zheng S, Xu J, Bao H, Xu B Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, Osaka, Japan, Dec 2016. The COLING 2016 Organizing Committee, pp 3485–3495 2. Sherstinsky A (2020) Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenom 404:132306 3. Lee-Thorp J, Ainslie J, Eckstein I, Ontanon S (2021) Fnet: mixing tokens with Fourier transforms 4. Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J (2017) LSTM: a search space odyssey. IEEE Trans Neural Netw Learn Syst 28(10):2222–2232 5. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need 6. Devlin J, Chang MW, Lee K, Toutanova KB (2019) BERT: pre-training of deep bidirectional transformers for language understanding 7. Bohra A, Barwar NC (2022) A deep learning approach for plagiarism detection system using BERT. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems, Singapore, 2022. Springer, Singapore, pp 163–174 8. Chitsaz K, Hajabdollahi M, Karimi N, Samavi S, Shirani S (2020) Acceleration of convolutional neural network using FFT-based split convolutions 9. Mironovova M, Bíla J (2015) Fast Fourier transform for feature extraction and neural network for classification of electrocardiogram signals. In: 2015 fourth international conference on future generation communication technology (FGCT), pp 1–6 10. El-Bakry HM, Zhao Q (2004) Fast object/face detection using neural networks and fast Fourier transform. Int J Signal Process 182–187 11. Pratt H, Williams B, Coenen F, Zheng Y (2017) FCNN: Fourier convolutional neural networks. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Machine learning and knowledge discovery in databases. Springer, Cham, pp 786–798 12. Cheng Y, Yu FX, Feris RS, Kumar S, Choudhary A, Chang SF (2015) An exploration of parameter redundancy in deep networks with circulant projections 13. Cer D, Yang Y, Kong SY, Hua N, Limtiaco N, John RS, Constant N, Guajardo-Cespedes M, Yuan S, Tar C, Sung YH (2018) Universal sentence encoder 14. Larson S, Mahendran A, Peper JJ, Clarke C, Lee A, Hill P, Kummerfeld JK, Leach K, Laurenzano MA, Tang L, Mars J (2019) An evaluation dataset for intent classification and outof-scope prediction. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), Hong Kong, China, Nov 2019. Association for Computational Linguistics, pp 1311–1316 15. Tamkin A, Jurafsky D, Goodman N (2020) Language through a prism: a spectral approach for multiscale language representations 16. Jiao Xiaoqi, Yin Yichun, Shang Lifeng, Jiang Xin, Chen Xiao, Li Linlin, Wang Fang, Qun Liu (2020) Tinybert: distilling BERT for natural language understanding 17. You W, Sun S, Iyyer M (2020) Hard-coded Gaussian attention for neural machine translation
18. Wang S, Li BZ, Khabsa M, Fang H, Ma H (2020) Linformer: self-attention with linear complexity 19. Vyas A, Katharopoulos A, Fleuret F (2020) Fast transformers with clustered attention 20. Fazlourrahman B, Aparna BK, Shashirekha HL (2022) Coffitt-covid-19 fake news detection using fine-tuned transfer learning approaches. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Springer, Singapore, pp 879–890 21. Nagaraj P, Deepalakshmi P, Muneeswaran V, Muthamil Sudar K (2022) Sentiment analysis on diabetes diagnosis health care using machine learning technique. In: Saraswat M, Sharma H, Balachandran K, Kim JH, Bansal JC (eds) Congress on intelligent systems. Springer, Singapore, pp 491–502
Gesture Analysis Using Image Processing: For Detection of Suspicious Human Actions Prachi Bhagat and Anjali. S. Bhalchandra
Abstract In recent years, interest in real-time video sequences for human action recognition has risen rapidly, and discrimination is required to identify which type of human action appears in a video. The use of video for human action recognition is becoming widespread in many applications, such as traffic monitoring, pedestrian detection, identification of anomalous human behaviour, health monitoring, human–computer interaction, and robotics. Various combinations of feature representations are used to implement different feature detectors for identifying features and classifying human actions. Machine learning techniques, both unsupervised and supervised, are utilized to identify human behaviours in the given data. Unsupervised learning techniques such as data clustering and hierarchical learning are implemented, whereas supervised learning techniques such as SVM, random forest, KNN, Naive Bayes, and artificial neural networks are used to handle action classification. In this paper, real-time applications for suspicious action detection, tracking, and recognition are surveyed and classified. We also propose the use of MediaPipe Holistic, which provides pose, face, and hand landmark detection models, so that suspicious movements of a human being can be detected by the system from a stored video or from real-time monitoring. Keywords Feature detectors · Machine learning · Deep learning · SVM · CNN · Kalman filtering · MediaPipe
P. Bhagat Prof. Department of Electronics Engineering, Government College of Engineering, Yavatmal, India e-mail: [email protected] Anjali. S. Bhalchandra (B) Department of Electronics Engineering, Government College of Engineering, Aurangabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_38
1 Introduction Motion detection, human activity prediction, person identification, anomalous activity recognition, vehicle counting, people counting in congested settings, and other applications can all benefit from video analytics. Human action recognition is difficult in real-world applications, which has resulted in increased interest. Human motion detection, tracking, recognition, and analysis are beneficial for various prominent applications that rely on visual input representations. Human interactions such as slight contact between two people, head motion detection, and hand gesture identification and estimation are combined to create a system that can discover and recognize suspicious conduct in a crowd. Several approaches for detecting human activity have been developed, with optical flow, frame differencing, foreground and background subtraction, and temporal differencing being the most often employed. According to Ryoo and Aggarwal [1], human activities are sorted into gestures, actions, interactions, and group activities; their work focuses on high-level activity recognition approaches for evaluating human actions, interactions, and group activities, as well as current action recognition research trends. Analysing video footage to predict or classify the distinct actions performed by the person in the video is known as human action recognition: first, distinct body parts are detected in each frame to identify an activity, and then their movement is assessed over time. Boxing, kicking, handshaking, punching, and pushing are among the actions considered. To improve accuracy and performance, new methods combine feature extraction, representation, and classification. The most challenging part of distinguishing diverse suspicious behaviours in live video surveillance is human conduct itself, since it is extremely difficult to evaluate whether an activity is suspicious or usual. In the field of automated video surveillance systems, analysing abnormal occurrences in video is a difficult issue; such unusual occurrences have been observed in places such as airports, train stations, banks, offices, examination rooms, and many other areas [2]. Detecting suspicious activity and ensuring public safety and security are achieved by combining computer vision and video monitoring. Modelling of the environment, motion detection, moving object categorization, tracking, behaviour comprehension, representation, and fusion of information are all stages of computer vision procedures. Machine learning and deep neural networks are two types of techniques that may be used to recognize suspicious human activities from the provided videos. Machine learning-based methodologies involve preprocessing, feature extraction, and classification; to extract features from various video sequences, extensive preprocessing is required, where the collected raw data are converted into a format that can be supplied to the model. Deep learning-based algorithms extract features and create high-level representations of image data automatically. Long short-term memory (LSTM) models can discover long-term dependencies in video streams, whereas convolutional neural networks (CNNs) can learn visual patterns straight from picture pixels.
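Since the abstract proposes MediaPipe Holistic for pose, face, and hand landmark detection, a minimal sketch of per-frame landmark extraction using the standard MediaPipe Python solutions API is given below. The video path and any downstream use of the landmarks are illustrative assumptions; the sketch only shows the kind of keypoint stream such a system would feed to an action classifier.

```python
# Minimal sketch of per-frame landmark extraction with MediaPipe Holistic;
# the video path and the printed output are illustrative only.
import cv2
import mediapipe as mp

holistic = mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                          min_tracking_confidence=0.5)
cap = cv2.VideoCapture("surveillance.mp4")   # or 0 for a live camera feed

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        # 33 pose landmarks with normalised (x, y, z); the per-frame vectors can
        # be fed to a classifier of suspicious actions.
        keypoints = [(lm.x, lm.y, lm.z) for lm in results.pose_landmarks.landmark]
        print(len(keypoints))                # 33

cap.release()
holistic.close()
```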
2 Literature Survey

For human action recognition, feature representation is used in conjunction with feature extraction and action categorization approaches. Local, global, and feature combinations are the different types of feature representation approaches. Generally, global feature tactics gather information from the whole human body and employ global bodily composition and dynamics to describe human activities. A local feature technique extracts interest points or local sub-regions, edges, or small picture patches from the video or image sequences. Combining local and global features computed from images, or combining feature vector values derived from distinct image channels, for example, RGB and depth information, are two methods for integrating features.

Wang et al. employed background subtraction [3] to remove the background from an image and extract the moving foreground from the static background for motion detection. A mixture-of-Gaussians (MoG) background removal approach is used by Chivers and Goshtasby [4], combined with new enhancements, to demarcate each frame of a video that contains a clear silhouette of a moving individual. Principal component analysis (PCA) is also employed to recognize essential motion and to generate a fixed-size feature vector. PCA has the benefit of dimensionality reduction without losing variable descriptions. Using PCA, similar and dissimilar motion feature vectors are used to differentiate various action classes. The advantage of the MoG background subtraction technique is that it is robust against movement, but backgrounds with object overlap, shadows, and lighting changes cannot be subtracted reliably.

In [5], Laptev et al. developed the HOG/HOF descriptors, which were used to illustrate local motion and the development of local features and to build histograms of spatial gradient and optical flow assembled in the space–time environments of detected interest points. Kaaniche and Bremond [6] worked on the HOG descriptor to describe the observed individual in the scene using corner points and local motion signatures. The Histogram of Optical Flow Orientation and Magnitude (HOFM), which is created from optical flow information, was used by Schwartz et al. [7] to define the typical patterns in the scene so that the nearest neighbor could be sought to decide whether an abnormal event should be flagged. The HOFM descriptor gathers spatiotemporal information from cuboids and encodes the magnitude and orientation of optical flow into separate histograms. The Histogram of Optical Flow Orientation (HOFO) descriptor is utilized by Yang et al. [8] to differentiate fighting scenes from normal scenes. HOG captures information about the contour in addition to its direction, unlike other edge-based feature extraction methods. Zhu et al. [9] have used STIP features and the Histogram of Visual Words to generate characteristics from depth maps. For feature representation, Nazir et al. [10] employ a combination of a 3D-Harris space–time interest point detector and a 3D-SIFT descriptor, and action videos are represented by a histogram of visual features using a bag-of-visual-features approach. Zhang et al. [11] utilize optical flow with a Harris detector to detect interest locations in the picture's motion field, whereas Shafie et al. [12] detect motion from the video by using optical flow considering
the apparent velocity and the brightness pattern. Mukherjee et al. [13] employ optical flow to create a posture dictionary using multidimensional pose descriptors derived from a video frame's gradient field by combining motion and position cues. Local binary patterns in conjunction with a Gaussian mixture model are used by Vijayan et al. [14] to detect moving objects. In a scene, pixels are labeled by the LBP operator by thresholding each pixel's neighborhood against the center value and arranging the output into binary codes, and the Gaussian distribution identifies pixels with comparable texture in the background at a certain location in the image. Chen et al. [15] employ depth motion maps (DMMs) to record motion cues in an extensive video series, and the LBP operator, which is an effective measure of local picture texture, is then applied to the overlapped blocks inside each DMM to represent it densely, enhancing the discriminatory power for action recognition. LBP has excellent computing performance and uses less storage than comparable technologies; however, it is ineffective on noisy images. Table 1 gives the comparison of feature representation methods [16, 17].

To extract features from the video sequence and to train modern object recognition models, transfer learning can be used, optimizing a new set of categories while reusing existing weights for the new classes. One of the biggest obstacles to utilizing AI to its full potential is the deployment of machine learning models at scale. As the models become more intricate, this difficulty will only worsen; to overcome this, the MediaPipe framework comes into the picture. Model inference, media processing algorithms, and data transformation are a few of the modular elements which may be combined to create a perception pipeline using MediaPipe; a minimal sketch of such a pipeline is given after Table 1.

Table 1 Comparison of feature representation
Feature | Author | Classification method | Accuracy (%)
HOG/HOF | Laptev et al. [5] | SVM | 91.8
HOG | Wei et al. [18] | MaskFeat model | 75
Optical flow | Mukherjee et al. [13] | – | 79.17
Dynamic BoW / Integral BoW | Ryoo [19] | SVM | 70.0 (half video), 85.0 (full video) / 65.0 (half video), 81.7 (full video)
DMM-LBP | Chen et al. [15] | KELM | 91.94, 93.4
Multi-temporal DMM-LBP | Chen et al. [20] | KELM | 96.7, 96.70, 99.39, 89
Silhouette-based feature | Chaaraoui et al. [21] | Nearest neighbors' key pose | 85.9
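As the forward reference before Table 1 indicated, the snippet below is a minimal sketch of how a MediaPipe perception pipeline can be assembled in Python: frames are read with OpenCV and passed to the pose solution, which returns body landmarks that could later feed a gesture or action classifier. The video path and confidence thresholds are illustrative assumptions, and this is only a sketch of the framework's pose solution, not the pipeline of any work cited above.

```python
# Minimal MediaPipe pose-landmark extraction sketch (illustrative only).
# "surveillance.mp4" and the thresholds are placeholder choices.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
cap = cv2.VideoCapture("surveillance.mp4")

with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV delivers BGR frames.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # 33 (x, y, z) landmarks per frame; collect them as a feature
            # vector for a downstream gesture/action classifier.
            coords = [(lm.x, lm.y, lm.z) for lm in results.pose_landmarks.landmark]
            print(len(coords), "landmarks in this frame")
cap.release()
```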
2.1 Machine Learning-Based Approaches

A classifier is used to train a model, which is subsequently utilized to categorize the data. The study of computer algorithms that can learn and evolve on their own given experience and data is known as machine learning (ML). Supervised and unsupervised classifiers are the two types of machine learning classifiers. Unsupervised machine learning classifiers are fed only unlabeled datasets, which they categorize using pattern recognition, data structures, and anomalies. Supervised and semi-supervised classifiers are taught how to categorize data into specific categories using training datasets. Classifiers such as K-nearest neighbors (KNN), decision tree, random forest, support vector machine (SVM), and logistic regression can be employed for the classification process. Al-Ali et al. [22] use an SVM classifier with a leave-one-actor-out (LOAO) scheme and a nine-fold cross-validation strategy that divides all videos by actor: the dataset is partitioned into nine folds, each fold representing one actor in the dataset videos, one actor (fold) is held out in turn, and each experiment is repeated nine times. The leave-one-video-out (LOVO) cross-validation approach is utilized in the KNN trials (a minimal sketch of this kind of evaluation protocol is given after Table 3). SVM and KNN are also employed by Sonali et al. [23], with SVM being used for data testing and KNN being used to compute the Euclidean distances between the training and test features. The KNN algorithm is employed in Kaaniche et al. [6] to pick the k nearest neighbors; subsequently, gesture labels are assigned by majority vote. The KNN algorithm has the fundamental advantage of being a universal approximator capable of effectively replicating any many-to-one mapping. In Nazir et al. [10], SVM is utilized for training to predict action labels from feature values, and feature vectors are produced during testing by recognizing and describing interest points in unlabeled videos and quantizing them using a visual codebook. SVM has drawbacks including lengthy training procedures, memory requirements, computational complexity, and training time. A confidence-encoded SVM was used by Wang et al. [24], which incorporates context information from labeled samples; hard-thresholding confidence scores results in drifting and inefficient training problems, which can be avoided by relying on context cues that predict inaccurate labels in a small fraction of target samples with high confidence ratings. The confidence-encoded SVM distributes confidence ratings across labeled samples along a graph and corrects faulty labels based on the underlying visual structures of the samples, enhancing transfer learning efficiency. Kim et al. [25] suggested analyzing human joints with a geodesic graph and an SVM, using features retrieved from human feature points within a geodesic distance range of the graph; the geodesic graphs are utilized for optimization. The random forest classification method is a tree-based strategy for lowering variance. RF was used by Kone et al. [26] to improve performance by integrating multiple decision trees trained on various subclasses of data; according to the bagging principle, each tree is trained on a random subclass of data. Incorporating global motion and pose-based characteristics into image sequences, Ar et al. [27] use a novel representation to distinguish activities in test sequences using random forest classification.
Xu et al. [28] employ an RF classifier to categorize human activities from data acquired by wearable sensors. In the optical flow domain, Yang et al. [8] employ a Gaussian mixture model (GMM) to identify regions of interest (ROI) and categorize violence and non-violence using a linear SVM. According to Idrees et al. [29], only when full-body annotations are available is latent SVM employed to infer the position of body parts, using training data to spot groupings of individual body parts that have no annotations. The categorization result provided by the SVM algorithm is used to detect the activity of each object by Seemanthini et al. [30]. A hidden Markov model (HMM) is trained with the code vectors generated by vector quantization of the multi-fused features, and a forward spotting strategy is utilized by Jalal et al. [31] to recognize the segmented human activity using the trained HMM-based human activity classifiers. Many scenario-based actions are available in various real-life datasets, the activities are complicated, and each action is not always tagged, which can cause complications; these problems can be addressed by supervised and unsupervised machine learning methods. Different classification algorithms with their advantages and disadvantages are specified in Table 2, whereas the accuracy of distinct classification methods with appropriate feature extraction methods is compared in Table 3 for machine learning-based approaches.

Table 2 Classification techniques with advantages and disadvantages
Classification method | Advantages | Disadvantages
SVM [22, 32, 33] | 1. Provides an accurate solution | 1. Accurate solutions are not preferred in online applications; 2. Classification has to be done at great speed; 3. Basis functions are usually needed to form the SVM classifier, making it complex and expensive
K-nearest neighbor [6, 17] | 1. It is a universal approximator; 2. Can model any many-to-one mapping | 1. Lack of robustness for high-dimension spaces; 2. With a huge dataset, computational complexity is increased
Confidence-encoded SVM [24] | 1. No need for thresholding | 1. Difficulty in handling variations of illumination and background
Random forest [27] | 1. Can handle large datasets and thousands of input variables | 1. Misclassification error occurs based on "Skip" action
Latent SVM [29] | 1. Detects partially visible humans | 1. Not possible to obtain detections through occlusion reasoning
Table 3 Comparison of different machine learning-based approaches
Feature | Classification method | Dataset | Accuracy (%)
STIP | SVM | UCSDped-1, UCSDped-2, UMN | 97.14, 91.13, 95.24
Color STIP | SVM | MSR action 3D, UT kinect action, CAD-60 | 94.3, 91.9, 87.5
SDEG feature and R transform | SVM | KTH ARA, Weizmann ARA, I3Dpost ARA, Ballet ARA, IXMAS ARA | 95.5, 100, 92.92, 93.25, 85.5
3D Harris STIP and 3D SIFT | SVM | KTH average, UCF sports average, Hollywood2 | 91.8, 94, MAP: 68.1
GLCM, HU, HOG | SVM with ASAGA | UCSD ped1 | 87.2
NA | SVM with GA | KTH, HMDB51, UCF YouTube, Hollywood2 | 95.0, 48.4, 82.3, 46.8
Silhouette | SVM-NN | KTH ARA, Weizmann ARA | 96.4, 100
Body shape and skeleton joint-based multi-fused features | HMM | Im-daily depth activity, MSR action 3D (CS), MSR daily activity 3D (CS) | 74.23, 93.3, 94.1
3D Haar-like feature | Random forest | Weizmann (only motion), Weizmann (only pose), Weizmann (combined), KTH (combined) | 91.11, 87.78, 94.44, 92.99
HOFME | KNN | UCSD (PEDs1), UCSD (PEDs2), Subway (Exit), Subway (Entrance) | EER: 32.0, AUC: 0.727; EER: 20, AUC: 0.875; EER: 17.8, AUC: 0.849; EER: 22.8, AUC: 0.816
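As a concrete illustration of the cross-validation protocols referenced in Sect. 2.1, the following sketch runs a leave-one-actor-out evaluation of SVM and KNN classifiers with scikit-learn. It is a minimal example under assumed inputs: X is a matrix of precomputed feature vectors (e.g., HOG/HOF or silhouette features), y holds the action labels, and groups the actor identity of each clip; the dummy random data and hyperparameters are not taken from any cited paper.

```python
# Minimal leave-one-actor-out (LOAO) evaluation sketch with scikit-learn.
# X, y, groups are placeholders: feature vectors, action labels, and the
# actor id of each video clip, respectively.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 64))          # 90 clips x 64-dim features (dummy data)
y = rng.integers(0, 5, size=90)        # 5 action classes
groups = np.repeat(np.arange(9), 10)   # 9 actors, 10 clips each -> 9 folds

logo = LeaveOneGroupOut()              # each fold leaves one actor out for testing

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))  # Euclidean by default

for name, clf in [("SVM", svm), ("KNN", knn)]:
    scores = cross_val_score(clf, X, y, groups=groups, cv=logo)
    print(f"{name}: mean LOAO accuracy = {scores.mean():.3f} over {len(scores)} folds")
```

Replacing LeaveOneGroupOut with LeaveOneOut on per-video groups would give the LOVO protocol in the same way.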
2.2 Deep Learning Neural Network-Based Approaches

Deep learning is a subcategory of machine learning that is built on artificial neural networks and representation learning; it is software that loosely simulates the brain's neuronal network. Because deep neural networks are used,
it is also known as deep learning. This learning can take place in a supervised, semi-supervised, or unsupervised environment. Deep learning algorithms are made up of connected layers, which include input layers, output layers, and hidden layers. The network absorbs vast amounts of input data and processes it through numerous levels; at each layer, the network can learn increasingly complicated characteristics of the data. Human action recognition has grown in popularity in the real world due to the ability of deep neural models to handle issues such as occlusion, cluttered backdrops, viewpoint fluctuations, and varied human actions in videos. Deep neural networks are divided into numerous types, such as recurrent neural networks (RNNs), feedforward neural networks (FNNs), convolutional neural networks (CNNs), LSTMs, and reinforcement learning models.

Convolutional neural networks (CNNs) are multi-layered neural networks with a distinct architecture that extracts progressively sophisticated aspects of the data at each layer to decide the output. According to Ji et al. [34], 3D convolution is employed in the collected video's convolutional layers to differentiate the features along the spatiotemporal dimensions, effectively including motion information. To extract numerous features, multiple convolutional operations are conducted at the same time on the input [35, 36]. According to Wu et al. [37], high-level skeletal joint features are extracted from 2D multiple-channel inputs using a Gaussian–Bernoulli deep belief network (DBN), and such features are extracted from 2D single-channel inputs using a 3D CNN. To combine the appearance and motion features, Heo et al. [38] employ a deep learning framework to distinguish moving objects from a moving camera. In Guo and Wang [39], complex backgrounds and camera effects are eliminated, and feature information is extracted multiple times to acquire high-level structures, further enhancing the ability to discriminate actions and detailed spatial information. The reason the actions can be discriminated is that the characteristics of human kinematics can be easily found, and the TS-DBN model minimizes the noise, resulting in good accuracy for recognition of behaviors. In Zhang et al. [40], a DBN-based EMG pattern classifier is used to sort the EMG signals into categories in order to classify them. Fragkiadaki et al. [41] use a moving objectness detector to recognize moving objects on trained images and motion fields, and they can discard under- or over-segmentation or background elements of the video sequence. Ren et al. [42] proposed the region proposal network (RPN), a fully convolutional network that predicts object boundaries and objectness scores at every position and is trained end-to-end to create high-quality region proposals for detection by Fast R-CNN. Paredes-Valles et al. [43] offer an unsupervised hierarchical spiking architecture using an event-based camera to detect motion (direction and speed). Cai et al. [44] develop a model based on a neural network and generative adversarial networks (GANs) that can recognize and optimize human action from video sequences by employing a new bottom-up detection technique that improves accuracy and performance in real time. A deep learning strategy is utilized by Amrutha et al. [45], and the framework is divided into two parts to detect suspicious or normal human actions: first, the features are computed from the video frames, and
second, the classifier forecasts the class based on the data gained. The discrete wavelet transform is used in convolutional architectures by Mei et al. [46] to extract critical features and reduce the size of the feature map. A graph neural network structure for skeleton-based action classification and motion prediction is presented by Li et al. [47]. Nikolova et al. [48] present three methods for identifying actions based on pose-based features: generative adversarial networks (GANs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). Kajabad et al. [49] use the YOLO model to detect people, monitor their conduct, and forecast human behavior.

Recurrent neural networks (RNNs) are multi-layered neural networks that store information in context nodes and can learn data sequences and output a number or another sequence. RNNs are ideally suited to processing input sequences. An RNN-AE framework inspired by sparse coding is presented by Luo et al. [50] for anomaly detection, which simplifies hyperparameter selection and dictionary training in temporally coherent sparse coding (TSC). An LSTM-based DRNN is proposed by Murad et al. [51] to improve performance by means of deep layers in a task-dependent and end-to-end manner, extracting more discriminative features and using the DRNN to capture temporal dependencies between input samples in activity sequences. To identify human activities such as cooking, sleeping, and so on, Singh et al. [52] have used a recurrent neural network (RNN) in conjunction with a long short-term memory (LSTM) classifier. Hammerla et al. [53] investigated deep, convolutional, and recurrent techniques for capturing movement data with wearable sensors and optimizing performance using hyperparameters. In Li et al. [54], LSTM and CNN were used to accomplish successful recognition, followed by score fusion to capture as much spatial–temporal data as possible. To detect human action from skeleton data, Si et al. [55] present an attention-enhanced graph convolutional LSTM network (AGC-LSTM) to extract differential spatial–temporal information. Yang and Tian [56] propose EigenJoints, obtained by applying PCA to joint differences, with a non-parametric Naïve-Bayes nearest-neighbor (NBNN) type of classifier to recognize multiple action categories; this characteristic incorporates joint differences in the spatial–temporal domain to directly represent the dynamics of individual joints and the configuration of distinct joints. Liu et al. [57] use extensive pyramidal features to characterize poses that capture the intensity, contour information, and orientation; the AdaBoost learning method is used for selecting discriminative poses, and weighted local NBNN is used for action classification. A bag-of-features is used to extract meaningful information with the AdaBoost learning algorithm by Liu et al. [58], while NBNN is used for action classification. Weng et al. [59] employ ST-NBNN to classify actions using the stage-to-class distance obtained from a set of temporal stages made up of 3D poses representing a 3D action instance; each individual stage represents a set of spatial joints. Lu et al. [60] provide an NBNN framework for detecting human activities from skeleton sequences that avoids the need to evaluate computationally complicated or time-consuming action features. To generate compact but discriminative local feature descriptors and to considerably improve performance in detecting the activities,
Zhen et al. [61] employ local feature descriptors based on the I2C distance in NBNN classifiers. To obtain the feature vector of each video for recognizing human behaviors, Jaouedi et al. [62] use the K-nearest neighbors algorithm. By reducing the reconstruction error between prediction and ground truth, Xu et al. [63] use a prediction-CGAN to learn a transition in feature space from incomplete to full actions. For action recognition in videos, Ahsan et al. [64] use a GAN as an unsupervised pretraining method. Shen et al. [65] employ the imaginative GAN in the trained model, which can create realistic samples of novel data in the same domain as the data with new classes. In Liu et al. [66], adversarial skeleton sequences are generated within a GAN framework to verify anthropomorphic plausibility. Degardin et al. [67] employ Kinetic-GAN to synthesize diverse activities over local and global body actions, improving sample quality and variety. Table 4 gives the summary of deep learning-based approaches.
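To make the CNN–LSTM combination discussed above concrete, the following is a minimal PyTorch sketch of a per-frame CNN feeding an LSTM for clip-level action classification. The shapes (16-frame RGB clips, 10 action classes) and layer sizes are illustrative assumptions; this is not the network of any specific paper cited here.

```python
# Illustrative CNN + LSTM action classifier (not from any cited paper).
# Input: a batch of clips shaped (batch, frames, 3, H, W); output: class logits.
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    def __init__(self, num_classes=10, feat_dim=128, hidden=256):
        super().__init__()
        # Small per-frame CNN that maps each RGB frame to a feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # LSTM aggregates the per-frame features over time.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1)           # (B*T, 3, H, W)
        feats = self.cnn(frames).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)         # h_n: (1, B, hidden)
        return self.head(h_n[-1])              # logits: (B, num_classes)

if __name__ == "__main__":
    model = CNNLSTMClassifier()
    dummy = torch.randn(2, 16, 3, 112, 112)    # 2 clips of 16 frames, 112x112
    print(model(dummy).shape)                  # torch.Size([2, 10])
```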
2.3 Datasets

As the cited works show, several algorithms have been explored to detect human actions from videos. Different kinds of datasets are required and play a crucial role, since they are what the algorithms are run on to improve the system's performance and accuracy. Performances can also be recorded with non-visual sensors, such as wearable devices. The UCF101 dataset contains 13,320 realistic action videos for human action identification, including a lot of camera movement, varied presence and location of objects, illumination conditions, cluttered backgrounds, and so on. The HMDB51 dataset is made up of a variety of sources, the majority of which are movies, with a small percentage coming from public sources such as YouTube and Google videos. The UCF101 and HMDB51 datasets are utilized in Varol et al. [68], the first having a resolution of 320 × 240 pixels and a 25-fps frame rate, on which per-clip accuracy is measured, and the second a spatial resolution of 320 × 240 pixels and a 30-fps frame rate, on which video accuracy is measured. Ke et al. [69] employed the UT-interaction and BIT-interaction datasets, with the BIT-interaction dataset containing eight types of human interactions, with behaviors such as bowing, hugging, kicking, patting, handshaking, high five, boxing, and pushing. The UT-interaction dataset has two sets of backgrounds: Set 1 has a simpler and mostly static background, whereas Set 2 has a more complicated and slightly moving background. The KTH dataset consists of six types of human actions with 2391 sequences; out of the six, Shi et al. [70] considered three actions, namely boxing, hand clapping, and walking, in which 25 people were recorded in four different circumstances against a uniform background, and the HMDB51 and UCF101 datasets were also used. The TV human-interaction dataset contains 300 video clips from 20 different TV shows, with five interaction classes, viz. handshake, high five, embrace, kiss, and a "none" class that contains no interactions, and provides annotations for each frame of the video.
Table 4 Comparison of different deep learning-based approaches
Feature | Classification method | Dataset | Accuracy (%)
RGB TpDD | Two-stream ConvNet | HMDB51, UCF101 | 65.9, 91.5
RGB sDTD | Three-stream CNN | KTH, UCF101, HMDB51 | 96.8, 92.2, 65.2
Skeleton deep features | Multistream CNN | Northwestern-UCLA, NTU-RGBD (CS), NTU-RGBD (CV), MSRC-12 (CS) | 92.61, 80.03, 87.21, 96.62
Skeleton-based features | Spatiotemporal Naïve-Bayes nearest neighbors (NBNN) | MSR action 3D, UT kinect, Berkeley MHAD | 94.8, 98, 100
I2CDDE features | Local Naïve-Bayes nearest neighbors (LNBNN) | KTH, HMDB51, YouTube, UCF101 | 94.1, 41.7, 74.7, 88.9
Skeleton deep features | AGC-LSTM | NTU-RGBD (CV), NTU-RGBD (CS), Northwestern-UCLA | 95, 89.2, 93.3
RGB deep features | CNN | UCF101, HMDB51 | 92.5, 65.2
Extensive pyramidal features | AdaBoost algorithm + weighted NBNN | KTH, Weizmann, IXMAS, HMDB51 | 94.8, 100, 94.5, 49.3
Background subtraction by GMM | KNN | KTH | 71.1
Eigen joints | Naïve-Bayes nearest neighbors (NBNN) | MSR action 3D | 96.4
Skeleton-based features | GAN | SHREC17 track, MSR action 3D | 79.8, 59.8
Ke et al. [71] used the TV human-interaction dataset to predict whether an interaction occurs or not in each frame; human actions are provided with bounding boxes to obtain the region of interest (ROI). For experimental purposes, the UT-interaction dataset is also used. For evaluation, Amer and Todorovic [72] used the volleyball, VIRAT, UT-interaction, KTH, and TRECVID MED 2011 datasets, including 11 activity classes, because the videos show activities with structural variations; each activity has 20 VIRAT clips, 50% of which were used for training and 50% for testing. Here, the TRECVID dataset is utilized, which consists of Internet videos showing 15 events and is split into two sets, viz. DEV-T and DEV-O. The volleyball dataset includes 240 videos depicting six distinct types of sets. Table 5 gives the summary of datasets [73–77].
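Since all of these benchmarks are distributed as video clips at different resolutions and frame rates, a common preprocessing step is to sample a fixed number of frames per clip before feature extraction. The following is a small OpenCV sketch of that step; the clip path and the 16-frame budget are assumptions made here for illustration, not values prescribed by the datasets above.

```python
# Sample a fixed number of evenly spaced frames from a video clip (OpenCV).
# "clip.avi" and the 16-frame budget are placeholders for illustration.
import cv2
import numpy as np

def sample_frames(path, num_frames=16, size=(112, 112)):
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = set(np.linspace(0, max(total - 1, 0), num_frames, dtype=int))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in indices:
            frames.append(cv2.resize(frame, size))
        idx += 1
    cap.release()
    return np.stack(frames) if frames else np.empty((0, *size, 3), dtype=np.uint8)

if __name__ == "__main__":
    clip = sample_frames("clip.avi")
    print(clip.shape)   # e.g. (16, 112, 112, 3) for a long enough clip
```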
Table 5 Various datasets
Sr. No | Name of the dataset | Number of videos
1 | UCF101 | 9.5 and 3.7 K
2 | HMDB51 | 7 K videos of 51 actions
3 | UT-interaction | 60 sequences of videos
4 | BIT-interaction | Eight types of interactions with 50 videos
5 | ActivityNet | 10,024 training videos
6 | YouTube action | 11 action categories, each category grouped into 25 groups with action clips
7 | KTH | 2391 sequences
8 | VIRAT 1.0 and 2.0 | 11 activity classes with 20 VIRAT clips
9 | TRECVID MED 2011 | DEV-T has 10,723 videos, DEV-O has 32,061 videos
10 | Volleyball | 40 videos per class
3 Conclusion

This study gives an overview of different methods for identifying human actions in video sequences. The different strategies utilized for feature extraction, machine learning-based approaches, and deep learning approaches for human action recognition are reviewed in a broad framework. All of the algorithms are evaluated on datasets of segmented videos, using a number of methodologies, and a known set of activity labels is utilized to differentiate and forecast suspicious and non-suspicious activities. The survey provides a comprehensive study of human action recognition and prediction and brings out the different challenges in the detection of human activities. Since these create new challenges for detecting and predicting human actions so that suspicious activities can be recognized and predicted early, in the succeeding part of the review the detection of suspicious activities using ML-based and DL-based approaches has been summarized and introduced. A new approach, MediaPipe, which provides pose, face, and hand landmark detection models, is also introduced.
References 1. Aggarwal JK, Ryoo MS (2011) Human activity analysis: a review. ACM Comput Sur 43(3) 2. Patil S, Talele K (2015) Suspicious movement detection and tracking based on color histogram. In: International conference on communication, information & computing technology (ICCICT), pp 16–17 3. Wang L, Hu W, Tan T (2002) Recent developments in human motion analysis. J Pattern Recog Soc 13 4. Chivers DS, Goshtasby AA (2012) Human action recognition in videos via principal component analysis of motion curves, Wright State University
5. Laptev I, Marszałek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions from movies. In: IEEE Conference on computer vision and pattern recognition, pp 1–8 6. Kaaniche B, Bremond F (2012) Recognizing gestures by learning local motion signatures of HOG descriptors, Mohamed-IEEE Trans Pattern Anal Mach Intell 34(11) 7. Colque RM, Caetano C, Toledo M, Schwartz WR (2016) Histograms of optical flow orientation and magnitude and entropy to detect anomalously events in videos. IEEE Trans Circ Syst Vid Technol 8. Yang Z et al (2013) Violence detection based on histogram of optical flow orientation. In: Sixth international conference on machine vision (ICMV) 9. Zhu Y, Chen W, Guo G (2014) Evaluating spatiotemporal interest point features for depth-based action recognition. Image Vis Comput 32(8) 10. Nazir S, Yousaf MH, Velastin SA (2018) Evaluating a bag-of-visual features approach using spatio-temporal features for action recognition. Comput Electr Eng 11. Zhang Y, Kiselewich SJ, Bauson WA, Hammoud R (2006) Robust moving object detection at distance in the visible spectrum and beyond using a moving camera. In: Proceedings of the conference on computer vision and pattern recognition workshop (CVPRW’06) 12. Shafie AA, Hafiz F, Ali MH (2009) Motion detection techniques using optical flow, world academy of science, engineering and technology. Int J Electr Comput Eng 3(8) 13. Mukherjee S, Biswas SK, Mukherjee DP (2011) Recognizing interaction between human, performers using “Key Pose Doublet”. In: Proceedings of the 19th international conference on multimedia 14. Vijayan M, Ramasundaram M, Athira AP (2017) Moving object detection using local binary pattern and Gaussian background model, Springer, 21 July 15. Chen C, Jafari R, Kehtarnavaz N (2015) Action recognition from depth sequences using depth motion maps-based local binary patterns. In: 2015 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1092–1099 16. Ragupathy P, Vivekanandan P (2019) A modified fuzzy histogram of optical flow for emotion classification. J Ambient Intell Humanized Comput 17. Serpush F, Rezaei M (2020) Complex human action recognition in live videos using hybrid FR-DL method. Comput Vis Pattern Recog CV 6 July 18. Wei C, Fan H, Xie S, Wu CY, Yuille A, Feichtenhofer C (2021) Masked feature prediction for self-supervised visual pre-training. Comput Vis 19. Ryoo MS (2011) Human activity prediction: early recognition of ongoing activities from streaming videos. In: 2011 IEEE International conference on computer vision 20. Chen C, Liu M, Liu H, Zhang B, Han J, Kehtarnavaz N (2017) Multi-temporal depth motion maps-based local binary patterns for 3-D human action recognition. IEEE Access 5:22590– 22604 21. Chaaraoui AA, Climent-Pérez P, Flórez-Revuelta F (2013) Silhouette-based human action recognition using sequences of key poses. Pattern Recog Lett 34(15) 22. Al-Ali S, Milanova M, Manolova A, Fox V (2015) Human action recognition using combined contour-based and silhouette-based features and employing KNN or SVM classifier. Int J Comput 9 23. Sonali, Bathla AK (2015) Human action recognition using support vector machine and k-nearest neighbor. Int J Eng Tech Res (IJETR) 3(4) ISSN: 2321-0869 24. Wang X, Wang M, Li W (2014) Scene-specific pedestrian detection for static video surveillance. IEEE Trans Pattern Anal Mach Intell 36(2) 25. 
Kim H, Lee S, Kim Y, Lee S, Lee D, Juc J, Myung H (2016) Weighted joint-based human behavior recognition algorithm using only depth information for low-cost intelligent videosurveillance system. Exp Syst Appl 26. Kone Y, Zhu N, Renaudin V, Ortiz M (2020) Machine learning-based zero-velocity detection for inertial pedestrian navigation. IEEE Sens J (03) 27. Ar I, Akgul YS (2013) Action recognition using random forest prediction with combined posebased and motion-based features. In: 8th International conference on electrical and electronics engineering (ELECO)
28. Xu L, Yang W, Cao Y, Li Q (2017) Human activity recognition based on random forests. In: 13th International conference on natural computation, fuzzy systems and knowledge discovery (ICNC-FSKD) 29. Idrees H, Soomro K, Shah M (2015) Detecting humans in dense crowds using locally-consistent scale prior and global occlusion reasoning. IEEE Trans Pattern Anal Mach Intell 37(10) 30. Seemanthini K, Manjunath SS (2018) Human detection and tracking using HOG for action recognition. In: International conference on computational intelligence and data science (ICCIDS) 31. Jalal A, Kim Y-H, Kim Y-J, Kamal S, Kim D (2017) Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern Recogn 61:295–308 32. Miao Y, Song (2014) Abnormal event detection based on SVM in video surveillance. In: IEEE workshop on advanced research and technology in industry applications (WARTIA). IEEE, pp 1379–1383 33. Al-Dhamari A, Sudirman R, Mahmood NH (2020) Transfer deep learning along with binary support vector machine for abnormal behavior detection. IEEE Access 34. Ji S, Xu W, Yang M, Yu K (2013) 3D Convolutional neural networks for human action recognition. In: IEEE Transactions on pattern analysis and machine intelligence 35(1) 35. Bui MQ, Duong VH, Tai TC, Wang JC (2018) Depth human action recognition based on convolution neural networks and principal component analysis. In: 25th IEEE International conference on image processing (ICIP) 36. Shen Z, Liu Z, Li J, Jiang YG, Chen Y, Xue X (2020) Object detection from scratch with deep supervision. IEEE Trans Pattern Anal Mach Intell 42(2) 37. Wu D, Pigou L, Kindermans PJ et al. (2016) Deep dynamic neural networks for multimodal gesture segmentation and recognition. IEEE Trans Pattern Anal Mach Intell 38(8) 38. Heo B, Yun K, Choi JY (2017) Appearance and motion-based deep learning architecture for moving object detection in moving camera. In: International conference in image processing, ICIP 39. Guo Y, Wang X (2021) Applying TS-DBN model into sports behavior recognition with a deep learning approach. J Supercomput 40. Zhang J, Ling C, Li S (2019) EMG signals based human action recognition via deep belief networks. In: IFAC paper online conference 41. Fragkiadaki K, Arbelaez P, Felsen P, Malik J (2015) Learning to segment moving objects in videos. In: IEEE Conference on computer vision and pattern recognition (CVPR) 42. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6) 43. Paredes-Valles F, Scheper KYW, de Croon GC (2018) Unsupervised learning of a hierarchical spiking neural network for optical flow estimation: from events to global motion perception. IEEE Trans Pattern Anal Mach Intell 44. Cai Z, Yang Y, Lin L (2020) Human action recognition and art interaction based on convolutional neural network. In: Chinese Automation Congress (CAC) 45. Amrutha CV, Jyotsna C, Amudha J (2020) Deep learning approach for suspicious activity detection from surveillance video. In: Proceedings of the second international conference on innovative mechanisms for industry applications (ICIMIA) 46. Mei Y, Jiang T, Ding X, Zhong Y, Zhang S, Liu Y (2021) WiWave: WiFi-based human activity recognition using the wavelet integrated CNN. In: IEEE/CIC International Conference on Communications in China 47. 
Li M, Chen S, Chen X, Zhang Y, Wang Y, Tian Q (2021) Symbiotic graph neural networks for 3D skeleton-based human action recognition and motion prediction. IEEE Trans Pattern Anal Mach Intell 48. Nikolova D, Vladimirov I, Terneva Z (2021) Human action recognition for pose-based attention: methods on the framework of image processing and deep learning. In: 56th International scientific conference on information, communication and energy systems and technologies (ICEST)
49. Kajabad EN, Ivanov SV (2019) People detection and finding attractive area by the use of movement detection analysis and deep learning approach 8th International young scientist conference on computational science. Proc Comput Sci 50. Luo W, Liu W, Lian D, Tang J, Duan L, Peng X, Gao S (2019) Video anomaly detection with sparse coding inspired deep neural networks. IEEE Trans Pattern Anal Mach Intell 51. Murad A, Pyun JY (2017) Deep recurrent neural networks for human activity recognition. Sensor Sig Inf Process 52. Singh D, Merdivan E, Psychoula I, Kropf J, Hanke S, Geist M, Holzinger A (2017) Human activity recognition using recurrent neural networks. In: International cross-domain conference for machine learning and knowledge extraction: CD-MAKE 53. Hammerla NY, Halloran S, Plotz T (2016) Deep, convolutional, and recurrent models for human activity recognition using wearables. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence (IJCAI) 54. Li C, Wang P, Wang S, Hou Y, Li W (2017) Skeleton-based action recognition using LSTM and CNN, IEEE international conference on multimedia and expo workshops (ICMEW). IEEE, pp 585–590 55. Si C, Chen W, Wang W, Wang L, Tan T (2019) An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1227–1236 56. Yang X, Tian Y (2012) Eigen joints-based action recognition using Naïve-Bayes-nearestneighbor, computer vision and pattern recognition workshops (CVPRW). IEEE Comput Soc Conf 57. Liu L, Shao L, Zhen X, Li X (2013) Learning discriminative key poses for action recognition. IEEE Trans Cybernet 43(6) 58. Liu L, Shao L, Rockett P (2013) Human action recognition based on boosted feature selection and Naive Bayes nearest-neighbors classification. Sig Process 59. Weng J, Weng C, Yuan J (2017) Spatio-temporal naive-bayes nearest-neighbor (ST-NBNN) for skeleton-based action recognition CVPR 60. Lu G, Zhou Y, Li X, Lv C (2015) Action recognition by extracting pyramidal motion features from skeleton sequences. Inf Sci Appl 251–258 61. Zhen X, Zheng F, Shao L, Cao X, Xu D (2017) Supervised local descriptor learning for human action recognition. IEEE Trans Multimedia 19(9) 62. Jaouedi N, Boujnah N, Htiwich O, Bouhlel MS (2016) Human action recognition to human behavior analysis. In: 7th International conference on sciences of electronics, technologies of information and telecommunications (SETIT) 63. Xu W, Yu J, Miao Z, Wan L, Ji Q (2019) Prediction-CGAN: human prediction with conditional generative adversarial networks. In: Knowledge processing and action analysis 64. Ahsan U, Sun C, Essa I (2018) DiscrimNet: semi-supervised action recognition from videos using generative adversarial networks. CVPR 65. Shen J, Dudley J, Kristensson PO (2021) The imaginative generative adversarial network: automatic data augmentation for dynamic skeleton-based hand gesture and human action recognition. In: 16th IEEE International conference on automatic face and gesture recognition 66. Liu J, Akhtar N, Mian A (2020) Adversarial attack on skeleton-based human action recognition. IEEE Trans Neural Netw Learn Syst 67. Degardin B, Neves J, Lopes V, Brito J, Yaghoubi E, Proenca H (2022) Generative adversarial graph convolutional networks for human action synthesis. Comput Vis Found 68. Varol G, Laptev I, Schmid C (2018) Long-term temporal convolutions for action recognition. IEEE Trans Pattern Anal Mach Intell 40(6) 69. 
Ke Q, Bennamoun M, An S, Sohel F, Boussaid F (2018) Leveraging Structural context models and ranking score fusion for human interaction prediction. IEEE Trans Multimedia 20(7) 70. Shi Y, Tian Y, Wang Y, Huang T (2017) Sequential deep trajectory descriptor for action recognition with three-stream CNN. IEEE Trans Multimedia 19(7) 71. Ke Q, Bennamoun M, An S, Boussaid F, Sohel F (2016) Human interaction prediction using deep temporal features In: ECCV 2016 Workshops, Part II, LNCS 9914, pp 403–414
72. Amer MR, Todorovic S (2016) Sum-product networks for activity recognition. IEEE Trans Pattern Anal Mach Intell 38(4) 73. Liu L, Shao L, Li X, Lu K (2016) Learning spatio-temporal representations for action recognition: a genetic programming approach. IEEE Trans Cybernet 46(1) 74. Singh D, Mohan CK (2017) Graph formulation of video activities for abnormal activity recognition. Pattern Recog 65:265–272 75. Xu W, Miao Z, Yu J, Ji Q (2019) Action recognition and localization with spatial and temporal contexts. Neurocomput J 76. Dong Z, Kong Y, Liu C, Li H, Jia Y (2012) Recognizing human interaction by multiple features. In: IEEE The first Asian conference on pattern recognition 77. Jyotsna C, Amudha J (2020) Deep learning approach for suspicious activity detection from surveillance video. In: Proceedings of the second international conference on innovative mechanisms for industry applications (ICIMIA)
Reliability Analysis of a Mechanical System with 3 Out of 5 Subsystems
B. Yamuna, Radha Gupta, Kokila Ramesh, and N. K. Geetha
Abstract Optimization of a redundancy allocation problem designed specifically for systems with k-out-of-n subsystems is the motive of this work. The main objective is to select the components and redundancy levels that maximize system reliability under system-level constraints. With cost and weight as constraints, 3 out of 5 subsystems have been optimized in the present study to maximize the reliability of a mechanical system. Component reliabilities in every phase are considered to be unknowns in this model. The Evolutionary and Generalized Reduced Gradient (GRG nonlinear) solvers in Excel have been applied here to find the component reliabilities and phase reliabilities and to maximize the system reliability. A comparative study has been done between the methodology of the present paper and the methods already used by other authors for similar systems.
Keywords Redundancy allocation · k out of n configuration · Evolutionary and Generalized Reduced Gradient method (GRG)
1 Introduction

In a complex system, it is very difficult to enhance reliability just by selecting good-quality components. In many cases, to improve or enhance the reliability of the total system, it is desirable to have redundant and backup elements and subsystems. The k-out-of-n configuration is a special case of parallel redundancy. This type of configuration requires at least k components to succeed out of the total n parallel components for the system to succeed. In this paper, the main objective is to maximize the system reliability of a mechanical system subject to cost and weight constraints using a 3-out-of-5 parallel configuration.
B. Yamuna (B) · R. Gupta · N. K. Geetha
Department of Mathematics, Dayananda Sagar College of Engineering, Bangalore, Karnataka, India
e-mail: [email protected]
K. Ramesh
Department of Mathematics, FET, Jain (Deemed-to-be-University), Bangalore, Karnataka, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_39
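To make the k-out-of-n idea concrete, the short sketch below computes the reliability of a single subsystem in which at least k of its n identical parallel components (each with reliability q) must work; the numbers used are purely illustrative.

```python
# Reliability of one k-out-of-n:G subsystem with identical components.
from math import comb

def k_out_of_n_reliability(k, n, q):
    """Probability that at least k of n components (each of reliability q) work."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k, n + 1))

# Illustrative 3-out-of-5 example with components of reliability 0.9.
print(round(k_out_of_n_reliability(3, 5, 0.9), 4))   # ~0.9914
```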
2 Literature Survey

Different kinds of research continue on problems of optimal redundancy allocation. Barlow and Heidtmann [1] were the first to propose reliability computations for a system having k-out-of-n subsystems; together with Jain and Gopal [2], Risse [3], and Sarje and Prasad [4], many others have generalized the k-out-of-n system and applied it to different cases and problems. Chiang and Niu [5] introduced consecutive k-out-of-n systems, followed by Hwang [6] and Derman et al. [7]. Shanthikumar [8] and Ge and Wang [9] proposed optimum algorithms for system reliability. Yamamoto and Miyakawa [10] developed a reliability model with exact reliability, and Koutras et al. [11] provided bounds for two-dimensional consecutive rectangular models. Pham came out with an optimal design process for redundant k-out-of-n subsystems in order to minimize the total cost and to find the optimal number of units for a system consisting of only one k-out-of-n subsystem. Pham [12] also demonstrates how to determine the most economical number of components of a k-out-of-n subsystem, in this case the optimal values of k (for fixed n) and n (for fixed k) that minimize the mean total cost of the k-out-of-n subsystems. Suich and Paterson [13] also derived several cost models for single k-out-of-n subsystems and presented a numerical solution for the k and n that minimize the total cost of a k-out-of-n system. Pham and Malon [14] consider the problem of achieving the optimal system size n and threshold k for a k-out-of-n subsystem with competing failure modes, and describe the problem of determining the optimal number of redundant units in k-out-of-n subsystems with common cause failures (CCFs). Chiang and Chiang studied a method to choose an optimal k in a k-out-of-n system with no closed-form solution. Huang et al. [15] propose different states of the system where k varies in a multi-state k-out-of-n system; the research of Huang et al. [15] and Tian et al. [16] allows the required number of components to be dynamic in each state for performance assessment of general multi-state k-out-of-n systems. Zuo and Tian [17] expand the model of Huang et al. [15] from the generalized multi-state k-out-of-n: G system to the generalized multi-state k-out-of-n: F system. The research of Chen and Yang [18] addresses the reliability of two-stage weighted k-out-of-n systems with shared components. Huang et al. [15] derived practical applications that modify the required number of components in each of multiple states, whereas Coit and Smith [19] considered a subsystem having a non-identical mixture of binary-state components, where the required number of components changes for each subsystem, which could be compatible with possible application instances. Suresh Babu et al. [8] worked on the multiple-constraints problem, first using the Lagrangian multipliers method, which provides a real-valued solution that may be practically impossible to implement when finding the reliability of the system. Sridhar et al. [20] have made an attempt to maximize the reliability of a k-out-of-n system using a redundant integrated reliability model with multiple constraints; a Lagrangian approach to solving the reliability system has been considered along with dynamic programming.
Lakshminarayana and Vijaya Kumar [21] establish results for the given mathematical function using the Lagrangian multiplier method, but these results are real values of the number of components required, and rounding the real values to the next integer results in major variation in all the constraints, whereas dynamic programming results in very little variation in all the constraints; a failure modes, effects and criticality analysis has been made to find and correct the causes and also to complement the results obtained through dynamic programming. Watada and Melo [22] proposed an EDA and RAP to maximize the availability of a system while reducing the cost, volume, or weight. Kokila Ramesh and Radha Gupta [23] found the reliability of the system by computing the step reliabilities and the component reliabilities, forming an integrated reliability model. By using the Lagrangian method and a random search technique, Hu et al. [24] analyzed the reliability of a non-repairable phased mission system (PMS) with structured phases and requirements. Rykov et al. [25] support preventive maintenance and condition monitoring of a subsea pipeline by an unmanned underwater vehicle; failure of components leads to increasing the load on the others, and this reduces the residual lifetime. Singh and Kumar [3] explained optimization of reliability using hardware redundancy with a parallel BB-BC-based approach.
In the present paper, the main motive is to select the components and redundancy levels that maximize system reliability under system-level constraints. With cost and weight as constraints, 3 out of 5 subsystems have been optimized in the present study to maximize the reliability of a mechanical system. Component reliabilities in all phases are considered to be unknown in this model. The Evolutionary and Generalized Reduced Gradient (GRG nonlinear) solvers in Excel have been applied here to find the reliability of each component and each phase and to maximize the system reliability. A comparative study has been done using this methodology.
3 Mathematical Model I

The reliability model, with the objective of maximizing the system reliability subject to the cost and weight constraints, is:

Maximize $Q_s = \prod_{j=1}^{n} Q_j$, where $Q_j = \sum_{k=2}^{x_j} \binom{x_j}{k} q_j^k (1-q_j)^{x_j-k}$

Subject to: $\sum_{j=1}^{n} c_j x_j \le C_0$, $\sum_{j=1}^{n} w_j x_j \le W_0$, $x_j \ge 0 \ \forall j$     (1)

where
$Q_s$ — system reliability
$Q_j$ — phase reliability
$q_j$ — component reliability
$x_j$ — number of components in phase j, assumed to be constant
$c_j$ — cost coefficient of each component in stage j
$w_j$ — weight coefficient of each component in stage j
$C_0$ — maximum allowable system cost
$W_0$ — maximum allowable system weight.
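A direct evaluation of Eq. (1) is sketched below for a three-phase system: it computes each phase reliability with the binomial sum of Eq. (1) (note that, as in Eq. (1), the sum starts at k = 2), multiplies the phases to obtain $Q_s$, and checks the two linear constraints. The particular $q_j$, $c_j$, $w_j$ values are purely illustrative placeholders chosen to be of the same order as the model II results reported later (Table 3), not data from the paper.

```python
# Evaluating model (1): system reliability and the two linear constraints.
# All numbers below are purely illustrative placeholders.
from math import comb

def phase_reliability(q, x, k_min=2):
    # Binomial sum of Eq. (1); the lower limit k_min = 2 follows the paper.
    return sum(comb(x, k) * q**k * (1 - q)**(x - k) for k in range(k_min, x + 1))

q = [0.79, 0.63, 0.65]      # component reliabilities per phase (illustrative)
x = [5, 5, 5]               # components per phase
c = [15.7, 18.7, 9.5]       # cost per component in each phase (illustrative)
w = [15.7, 37.4, 26.8]      # weight per component in each phase (illustrative)
C0, W0 = 250.0, 400.0       # allowable system cost and weight

Qs = 1.0
for qj, xj in zip(q, x):
    Qs *= phase_reliability(qj, xj)

total_cost = sum(cj * xj for cj, xj in zip(c, x))
total_weight = sum(wj * xj for wj, xj in zip(w, x))
print(f"Qs = {Qs:.3f}, cost = {total_cost:.1f} <= {C0}: {total_cost <= C0}, "
      f"weight = {total_weight:.1f} <= {W0}: {total_weight <= W0}")
```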
4 Problem Formulation

The decision variables in reliability optimization problems are the total number of redundancies, the real-valued component reliabilities, or a combination of both. In this study, integrated reliability models are constructed under the given mathematical function, with j phases that are statistically independent in series and $x_j$ statistically independent components in each phase.

Case (i): The cost coefficient of each component in the jth stage is derived from the association between reliability and cost, $q_j = (c_j/g_j)^{1/h_j}$, i.e., $c_j = g_j q_j^{h_j}$, used as the cost constraint. Using the same kind of relation between reliability and weight, $w_j = u_j q_j^{v_j}$ is used as the weight constraint. Since $x_j$ is linear in the cost constraint, substituting the above relations into the constraints of mathematical model I gives

$\sum_{j=1}^{n} g_j q_j^{h_j} x_j - C_0 \le 0$  and  $\sum_{j=1}^{n} u_j q_j^{v_j} x_j - W_0 \le 0$.

Hence, model I becomes

$Q_s = \prod_{j=1}^{n} \sum_{k=2}^{x_j} \binom{x_j}{k} q_j^k (1-q_j)^{x_j-k}$

Subject to  $\sum_{j=1}^{n} g_j q_j^{h_j} x_j - C_0 \le 0$,  $\sum_{j=1}^{n} u_j q_j^{v_j} x_j - W_0 \le 0$     (2)
with the non-negativity restriction $q_j \ge 0$.

Case (ii): The cost coefficient of each component in the jth phase is derived from the association between reliability and cost, $c_j = g_j e^{h_j/(1-q_j)}$, used as the cost constraint. Using the same kind of relation between reliability and weight, $w_j = u_j e^{v_j/(1-q_j)}$ is used as the weight constraint. Substituting the above relations into the constraints of mathematical model I gives

$\sum_{j=1}^{n} g_j e^{h_j/(1-q_j)} x_j - C_0 \le 0$  and  $\sum_{j=1}^{n} u_j e^{v_j/(1-q_j)} x_j - W_0 \le 0$.

Hence, model I becomes

$Q_s = \prod_{j=1}^{n} \sum_{k=2}^{x_j} \binom{x_j}{k} q_j^k (1-q_j)^{x_j-k}$

Subject to  $\sum_{j=1}^{n} g_j e^{h_j/(1-q_j)} x_j - C_0 \le 0$,  $\sum_{j=1}^{n} u_j e^{v_j/(1-q_j)} x_j - W_0 \le 0$     (3)

with the non-negativity restriction $q_j \ge 0$.
5 Methodology

The Evolutionary method in the Excel Solver mimics a genetic algorithm. This method tries to find optimal or near-optimal solutions. The Evolutionary method deals with a population of solutions and a huge search space, hence it is slow but effective and comparable with the GRG nonlinear method. The solver reports the solution of the given problem in two different kinds of reports, viz. the answer and population reports. In the answer report, the values of the objective function, decision variables, constraints, and binding details of the problem are given. In the population report, the mean value and the standard deviation of the variables and constraints are given.
The Generalized Reduced Gradient (GRG nonlinear) method is most effective for nonlinear optimization problems in which the objective function and the constraints are smooth nonlinear functions. The GRG method is fast compared to the Evolutionary method. The solver generates three different kinds of reports while finding a solution to a given problem: the sensitivity, answer, and limits reports. The answer report exhibits the values of the objective function, the decision variables, and the binding details of the constraints. The sensitivity report shows the constraints' reduced gradients and the Lagrange multiplier variables. The limits report shows the upper and lower bounds of the variables.
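For readers without Excel, a similar search can be sketched with a gradient-based NLP solver such as SLSQP in SciPy, which plays a role comparable to (though not identical with) the GRG nonlinear solver used here. The sketch below maximizes $Q_s$ of reliability model II with the Table 1 constants, $C_0 = 250$, $W_0 = 400$, and $x_j = 5$; it is an illustrative analogue, not the exact Excel Solver setup of this paper.

```python
# SLSQP analogue of the GRG nonlinear run for reliability model II (Eq. (2)).
# Constants g, h, u, v are those of Table 1; C0, W0 and x_j = 5 as in the paper.
import numpy as np
from math import comb
from scipy.optimize import minimize

g, h = np.array([25, 75, 55]), np.array([2, 3, 4])
u, v = np.array([25, 150, 155]), np.array([2, 3, 4])
C0, W0, x = 250.0, 400.0, 5

def phase_reliability(q, x, k_min=2):
    return sum(comb(x, k) * q**k * (1 - q)**(x - k) for k in range(k_min, x + 1))

def neg_system_reliability(q):
    return -np.prod([phase_reliability(qj, x) for qj in q])

constraints = [
    {"type": "ineq", "fun": lambda q: C0 - np.sum(g * q**h * x)},   # cost
    {"type": "ineq", "fun": lambda q: W0 - np.sum(u * q**v * x)},   # weight
]
res = minimize(neg_system_reliability, x0=np.full(3, 0.6),
               method="SLSQP", bounds=[(0.01, 0.99)] * 3,
               constraints=constraints)
print("q =", np.round(res.x, 3), " Qs =", round(-res.fun, 3))
```

Swapping in the exponential relations of model III gives the other case; note that with the Table 2 constants the cost at the lower bound $q_j = 0.5$ already exceeds $C_0 = 5000$ (cf. Table 5), which is why the later sections report percentage variations in cost and weight.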
Table 1 Cost and weight constants for polynomial function
Stage (j) | g_j (cost) | h_j (cost) | u_j (weight) | v_j (weight)
1 | 25 | 2 | 25 | 2
2 | 75 | 3 | 150 | 3
3 | 55 | 4 | 155 | 4
Table 2 Cost and weight constants for exponential function
Stage (j) | g_j (cost) | h_j (cost) | u_j (weight) | v_j (weight)
1 | 100 | 0.7 | 100 | 0.9
2 | 120 | 0.8 | 150 | 0.8
3 | 94 | 0.5 | 110 | 0.7
Case (i): For solving reliability model II using the Evolutionary and GRG nonlinear methods in the Excel Solver for the polynomial function $c_j = g_j q_j^{h_j}$, the data in Table 1 have been taken from [25] to find the system reliability.
Case (ii): For solving reliability model III using the Evolutionary and GRG nonlinear methods in the Excel Solver for the exponential function $c_j = g_j e^{h_j/(1-q_j)}$, the data in Table 2 have been taken from [25] to find the system reliability.
6 Numerical Results

The present work maximizes the system reliability of a mechanical system with 3-out-of-5 subsystems as the objective function. The mathematical relation between the constraints (cost and weight) and the component reliability is used to maximize the reliability of the problem. The solution for the system reliability, component reliabilities, and stage reliabilities for models II and III is obtained by using the Evolutionary and GRG nonlinear solver methods in the Excel Solver.
Reliability Model II: For the given polynomial function $c_j = g_j q_j^{h_j}$, a maximum system cost of Rs. 250 and weight of 400 kg is considered, as per the reference given in [25], to determine the optimum component reliabilities $q_j$, phase reliabilities $Q_j$, and the system reliability $Q_s$.
Table 3 System reliability for reliability model II using Evolutionary method
Stage (j) | q_j | Q_j | x_j | c_j | w_j | x_j c_j | x_j w_j
1 | 0.79 | 0.99 | 5 | 15.67 | 15.67 | 78.35 | 78.35
2 | 0.63 | 0.93 | 5 | 18.74 | 37.47 | 93.68 | 187.36
3 | 0.65 | 0.87 | 5 | 9.53 | 26.86 | 47.66 | 134.30
Total cost and total weight: 219.68 | 400.01
System reliability (Q_s): 0.87
Table 4 System reliability for reliability model II using GRG nonlinear method
Stage (j) | q_j | Q_j | x_j | c_j | w_j | x_j c_j | x_j w_j
1 | 0.79 | 0.99 | 5 | 15.59 | 15.59 | 77.93 | 77.93
2 | 0.63 | 0.93 | 5 | 18.92 | 37.85 | 94.62 | 189.24
3 | 0.64 | 0.87 | 5 | 9.43 | 26.56 | 47.13 | 132.81
Total cost and total weight: 219.68 | 399.98
System reliability (Q_s): 0.87
7 Results Obtained Using Evolutionary and GRG Nonlinear Method

Tables 3 and 4 give the system reliability together with the cost and weight of each component and their reliabilities for reliability model II, using the Evolutionary and GRG nonlinear methods, respectively, in Microsoft Excel.
Reliability Model III: For the given exponential function $c_j = g_j e^{h_j/(1-q_j)}$, a maximum system cost of Rs. 5000 and weight of 7500 kg is considered, as per the reference given in [25], to determine the optimum component reliabilities $q_j$, stage reliabilities $Q_j$, and the system reliability $Q_s$.
8 Results Obtained Using Evolutionary and GRG Nonlinear Method

Tables 5 and 6 give the system reliability together with the cost and weight of each component and their reliabilities for reliability model III, using the Evolutionary and GRG nonlinear methods in the interval (0.5, 0.95), respectively, in the Excel Solver. Similarly, Tables 7 and 8 give the system reliability for reliability model III in the interval (0.6, 0.95).
Table 5 System reliability for reliability model III using Evolutionary method in (0.5, 0.95)
Stage (j) | q_j | Q_j | x_j | c_j | w_j | x_j c_j | x_j w_j
1 | 0.5 | 0.81 | 5 | 405.52 | 604.96 | 2027.60 | 3024.82
2 | 0.5 | 0.66 | 5 | 594.36 | 742.95 | 2971.82 | 3714.77
3 | 0.5 | 0.54 | 5 | 255.52 | 446.07 | 1277.59 | 2230.36
Total cost and total weight: 6277.01 | 8969.96
System reliability (Q_s): 0.536377
Percentage variation in cost and weight (%): 25.54 | 19.59
Table 6 System reliability for reliability model III using GRG nonlinear method in (0.5, 0.95)
Stage (j) | q_j | Q_j | x_j | c_j | w_j | x_j c_j | x_j w_j
1 | 0.5 | 0.81 | 5 | 405.52 | 604.96 | 2027.60 | 3024.82
2 | 0.5 | 0.66 | 5 | 594.36 | 742.95 | 2971.82 | 3714.77
3 | 0.5 | 0.54 | 5 | 255.52 | 446.07 | 1277.59 | 2230.36
Total cost and total weight: 6277.01 | 8969.96
System reliability (Q_s): 0.536377
Percentage variation in cost and weight (%): 25.54 | 19.59
Table 7 System reliability for reliability model III using Evolutionary method in (0.6, 0.95)
Stage (j) | q_j | Q_j | x_j | c_j | w_j | x_j c_j | x_j w_j
1 | 0.6 | 0.91 | 5 | 575.46 | 948.77 | 2877.30 | 4743.87
2 | 0.6 | 0.83 | 5 | 886.69 | 1108.36 | 4433.43 | 5541.79
3 | 0.6 | 0.76 | 5 | 328.09 | 633.01 | 1640.46 | 3165.03
Total cost and total weight: 8957.71 | 13,458.84
System reliability (Q_s): 0.761136
Percentage variation in cost and weight (%): 79.15 | 79.45
Table 8 System reliability for reliability model III using GRG nonlinear method in (0.6, 0.95)

Stage (j) | q_j | Q_j  | x_j | c_j    | w_j     | x_j c_j | x_j w_j
1         | 0.6 | 0.91 | 5   | 575.46 | 948.77  | 2877.30 | 4743.87
2         | 0.6 | 0.83 | 5   | 886.69 | 1108.36 | 4433.43 | 5541.79
3         | 0.6 | 0.76 | 5   | 328.09 | 633.01  | 1640.46 | 3165.03
Total cost and total weight: 8951.20, 13,450.69
System reliability (Q_s): 0.760948
Percentage variation in cost and weight (%): 79.02, 79.34
9 Discussion It is observed from Tables 3 and 4 that the system reliability obtained using the Evolutionary and GRG nonlinear methods in Excel Solver is 0.87 for reliability model II. This system reliability is comparable with the results obtained using the Lagrangean method and the random search technique (0.81 and 0.82, respectively), as discussed in [19]. The total cost and total weight are nearly the same as, or less than, the given values. It is also observed from Tables 5 and 6 that the system reliability obtained using the Evolutionary and GRG nonlinear methods for reliability model III is approximately 0.54, with a percentage variation of cost and weight of 25.54% and 19.59%, respectively. Similarly, it is seen from the results in Tables 7 and 8 that the system reliability from the Evolutionary and GRG nonlinear methods for reliability model III is approximately 0.76, with a percentage variation of cost and weight of 79.15% and 79.45% for the Evolutionary method and 79.02% and 79.34% for the GRG nonlinear method, respectively.
10 Conclusions Reliability models II and III are considered to determine the system reliability, stage reliabilities and component reliabilities of mechanical systems with k-out-of-n subsystems using mathematical cost functions. These reliabilities have been calculated using the Evolutionary method and the GRG nonlinear method in Excel Solver. Both techniques provide efficient solutions for complex problems. The GRG nonlinear method is the more conventional of the two and solves optimization problems with nonlinear functions, whereas the evolutionary algorithm works with a population of solutions and returns a workable solution even when many candidate solutions are poor. An additional advantage of the evolutionary algorithm is the multiple solutions obtained during the search for the globally best solution. The evolutionary algorithm was applied with a population size of 300 and a precision of 0.0001, and the results obtained are workable, though not of the best precision. It can be observed from Tables 5, 6, 7 and 8 that two different solutions are obtained for reliability model III using the Evolutionary and GRG nonlinear methods. One can therefore trade off the reliability of the system against the cost and weight constraints: in reliability model III, when the system reliability is 0.54 the variation in cost and weight is small, whereas when the system reliability is 0.76 the variation is larger. Hence, the cost and weight restrictions must be relaxed if the system reliability is to improve. It is concluded that both techniques used to find the system reliability are competent enough to be used for any mechanical system with k-out-of-n subsystems.
References
1. Barlow RE, Heidtmann KD (1984) Computing k-out-of-n system reliability. IEEE Trans Reliab 33:322–323
2. Sarje AK, Prasad EV (1989) An efficient non-recursive algorithm for computing the reliability of k-out-of-n systems. IEEE Trans Reliab R-38(2):234–235
3. Singh A, Kumar S (2015) Reliability optimization using hardware redundancy: a parallel BB-BC based approach. Int J Electr Commun Eng (IJECE), ISSN (P) 2278–9901
4. Rykov V, Ivanova N, Kochetkova I (2022) Reliability analysis of a load-sharing k-out-of-n system due to its components failure. Mathematics 2457. https://doi.org/10.3390/math10142457
5. Chiang DT, Niu SC (1981) Reliability of a consecutive-k-out-of-n: F system. IEEE Trans Reliab 30:87–89
6. Hwang FK (1982) Fast solutions for consecutive-k-out-of-n: F system. IEEE Trans Reliab R-31(5):447–448
7. Derman D, Lieberman G, Ross S (1982) On the consecutive-k-out-of-n: F system. IEEE Trans Reliab 31:57–63
8. Shantikumar JG (1982) Recursive algorithm to evaluate the reliability of consecutive k-out-of-n: F system networks. IEEE Trans Reliab 31:84–87
9. Ge G, Wang L (1990) Exact reliability formula for consecutive-k-out-of-n: F systems with homogeneous Markov dependence. IEEE Trans Reliab 39(5)
10. Yamamoto H, Miyakawa M (1995) Reliability of a circular connected-(r, s)-out-of-(m, n): F lattice system. J Oper Res 39(3)
11. Koutras MV (1997) Consecutive k, r-out-of-n: DFM systems. Microelectr Reliab 37:597–603
12. Pham H (1996) On the optimal design of k-out-of-n: G subsystems. IEEE Trans Reliab 45(2):254–260
13. Suich RC, Patterson RL (1991) k-out-of-n: G systems; some cost considerations. IEEE Trans Reliab 40:259–264
14. Pham H, Malon DM (1994) Optimal design of systems with competing failure modes. IEEE Trans Reliab 43(2):251–254
15. Huang J, Zuo MJ, Wu YH (2000) Generalized multi-state k-out-of-n: G system. IEEE Trans Reliab 1:105–111
16. Suresh Babu SV, Maheswar D, Ranganath G, Vijaya Kumar Y, Sankaraiah G (2012) Redundancy allocation for series parallel systems with multiple constraints and sensitivity analysis. IOSR J Eng 2(3):424–428
17. Zuo M, Tian Z (2006) Performance evaluation of generalized multi-state k-out-of-n systems. IEEE Trans Reliab 55(2)
18. Chen Y, Yang Q (2005) Reliability of two-stage weighted k-out-of-n systems with components in common. IEEE Trans Reliab 54(3):431–440
19. Coit DW, Smith AE (1996) Reliability optimization of series-parallel systems using a genetic algorithm. IEEE Trans Reliab 45(2):254–260
20. Sridhar A, Pavan Kumar S, Raghunatha Reddy Y, Sankaraiah G, Umashankar C (2013) The k-out-of-n redundant IRM optimization with multiple constraints. IJRSAT 1(2)
21. Lakshminarayana KS, Vijay Kumar Y (2013) Reliability optimization of integrated reliability model using dynamic programming and failure modes effects and criticality analysis. J Acad Indus Res 1(10):622–625
22. Watada J, Melo H (2014) An estimation of distribution algorithm approach to redundancy allocation problem for a high security system. IEEJ 3(4):358–367
23. Ramesh K, Gupta R (2020) Reliability of a mechanical system with k-out-of-n subsystems. Int J Mech Prod Eng Res Develop 10:7179–7188
24. Jain SP, Gopal K (1985) Reliability of k-to-l-out-of-n systems. Reliab Eng 12:175–179
25. Rykov V, Kochueva I, Farkhadov M (2021) Preventive maintenance of a k-out-of-n system with applications in subsea pipeline monitoring. J Mar Sci Eng 9:85. https://doi.org/10.3390/jmse9010085
Disaster Analysis Through Tweets Anshul Sharma, Khushal Thakur, Divneet Singh Kapoor, Kiran Jot Singh, Tarun Saroch, and Raj Kumar
Abstract Social media plays a huge part in disseminating information about disasters by allowing individuals to share information and request help. During a disaster, social media provides a wealth of information, including the nature of the disaster, the emotions of affected people and relief efforts. Information propagated over social media can save many lives by alerting others so that they can take precautionary action. Many agencies are attempting to automatically analyze tweets and recognize disasters and emergencies. This kind of work can benefit the large number of people connected to the Web, who can be alerted in the event of an emergency or disaster. Twitter data is unstructured data; therefore, natural language processing (NLP) must be performed on the Twitter data to classify tweets into the classes "Related to Disaster" and "Not related to Disaster." The paper makes predictions on a test set created from the original dataset and carries out accuracy testing of the classifier model developed. This paper uses the Naive Bayes classification mechanism for building the classifier model and for making predictions. Keywords Twitter analysis · Natural language processing · Classifier model · Naive Bayes classification
1 Introduction Nearly 2.9 billion people were affected by natural disasters between 2000 and 2012, causing damages exceeding $1.7 trillion. Damages in 2011 set a record high, reaching a whopping $371 billion, while in 2012, for the third consecutive year, damages crossed $100 billion [1].
A. Sharma · K. Thakur (B) · D. S. Kapoor · K. J. Singh Electronics and Communication Engineering Department, Chandigarh University, Mohali, Punjab 140413, India e-mail: [email protected] T. Saroch · R. Kumar Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab 140413, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_40
Research on the use of Internet-based media during disaster events is growing; more specifically, interest in microblogging during emergencies is on the rise. Early investigations show that essential, up-to-the-minute and on-the-spot updates can be found in microblog messages about an unfolding crisis, spurring interest in automated processing and analysis tools. Social media are interactive technologies that facilitate the creation and sharing of information, ideas, interests and other forms of expression through virtual communities and networking [2]. Nowadays there are many social media platforms, such as Facebook, Instagram, Twitter, YouTube and Blogger; this work focuses only on Twitter. Twitter is a popular microblogging social media service that allows users to share short messages called tweets, which are essentially 160 characters or less [3]. In the era of social media, microblogs are frequently used to share information and track public opinion during any human-affecting event, such as sports matches, elections and natural disasters, and many relief organizations monitor this kind of data routinely to detect disasters. Such work may be useful to the huge number of users on the Web, who can be alerted in the event of an emergency or disaster. However, it is impossible for people to manually check this mass of data and recognize disasters in real time. For this reason, many research works have been proposed to represent words in machine-understandable form and apply AI techniques to the word representations to identify the sentiment of a text. The work by Saroj and Pal [4] principally concerns two natural hazards, the Oklahoma grassfires and the Red River flood. Data for the analysis were acquired through the Twitter search API: keywords such as #redriver were used to retrieve Red River flood tweets, whereas #Oklahoma, #okfire and #grassfire were used to retrieve Oklahoma grassfire tweets. Goswami and Raychaudhuri [5] accomplished similar work on the identification of disaster tweets using natural language processing, and their end result reached an accuracy of 71.5%. Vieweg carried out research mainly on two disasters, i.e., grassfires and the Red River flood, whereas Shriya and Debaditya did the same research but were able to achieve an accuracy of 71.5%, or 0.715. Furthermore, advances in technologies like the Internet of Things, wireless sensor networks and computer vision can be used to develop newer multi-domain solutions [6–11]. Unlike Vieweg's research, the work presented in this paper is designed for all types of disaster, and it additionally has the principal objective of increasing accuracy while considering the other performance metrics. The present paper addresses natural language processing of disaster tweets, where the main objective is to identify whether a specific tweet is about a real disaster or not. The paper is organized as follows: Section 2 introduces the background of the dataset. Section 3 introduces the proposed system. Section 4 presents the outcome of the present work. Section 5 concludes and discusses the future scope of the problem statement.
Table 1 Training dataset headers

  | Id | Keyword | Location | Text | Target
0 | 0  | NaN | NaN | Our deeds are the reason of this #earthquake May ALLAH forgive us all | 1
1 | 2  | NaN | NaN | Forest fire near La Ronge Sask. Canada | 1
2 | 3  | NaN | NaN | All residents asked to "shelter in place" are being notified by officers. No other evacuation or shelter in place orders are expected | 1
3 | 9  | NaN | NaN | 13,000 people receive #wildfires evacuation orders in California | 1
4 | 11 | NaN | NaN | Just got sent this photo from Ruby #Alaska as smoke from #wildfires pours into a school | 1
2 Background of Datasets Natural language processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret and use human languages. NLP allows computers to communicate with people using a human language and also provides computers with the ability to read text, hear speech and interpret it. For this project, NLP for sentiment analysis, or text understanding, is used [12]. The datasets used for this task are taken from Kaggle [13], with more than 10,000 tweets extracted from Twitter through the Twitter API. The training dataset contains four main elements, i.e., ID, location, text and target, as shown in Table 1. Location is the feature giving the location of the tweet. The text feature contains the actual text of the tweet obtained from Twitter. The target is the feature of the dataset about which a deeper understanding is to be gained; a supervised machine learning algorithm uses historical data to learn patterns [14] and uncover relationships between the other features of the dataset and the target. The test dataset has three elements, i.e., ID, location and text, as shown in Table 2. The ID feature contains the primary identifier of the tweet; however, it is not required for this work and is later dropped. Location contains the location of the actual tweet. Text contains the tweet itself, the same as in the training dataset.
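As a concrete illustration of the dataset layout described above, the short sketch below loads and inspects the data with pandas; the file names train.csv and test.csv are assumptions based on the standard Kaggle release rather than names given in the paper.

```python
# Minimal sketch: inspect the Kaggle "disaster tweets" data described above.
# File names (train.csv / test.csv) are assumed from the standard Kaggle release.
import pandas as pd

train = pd.read_csv("train.csv")   # columns: id, keyword, location, text, target
test = pd.read_csv("test.csv")     # columns: id, keyword, location, text

print(train.shape, test.shape)
print(train[["id", "keyword", "location", "text", "target"]].head())
print(train["target"].value_counts(normalize=True))  # share of disaster vs non-disaster tweets
```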
3 Proposed Framework In this section, we discuss in detail the various processes of our system methodology. The system is composed of three different modules: 1. Extraction of data, 2. Sorting of data, and 3. Analysis of data.
Table 2 Test dataset headers

  | Id | Keyword | Location | Text
0 | 0  | NaN | NaN | Just happened a terrible car crash
1 | 2  | NaN | NaN | Heard about #earthquake is different cities, stay safe everyone
2 | 3  | NaN | NaN | There is a forest fire at spot pond; geese are fleeing across the street, I cannot save them all
3 | 9  | NaN | NaN | Apocalypse lighting. #Spokane #wildfires
4 | 11 | NaN | NaN | Typhoon Soudelor kills 28 in China and Taiwan
a. Data extraction Data extraction is mainly done with the help of the Twitter API, which provides data for the project in CSV format. This makes it easier to fetch and modify the data according to the problem statement.
b. Sorting of data There is a great deal of superfluous or unwanted text, so this stage requires pre-processing. Hashtags, numbers, stop words (for example, a, the, an, there, and so forth), symbols, and similar tokens need to be removed; duplicate tweets may also be present because of retweets or copies in the data. These can be removed using commands available in pandas. The stored tweets are then examined and their disaster category determined [15]. This is done by checking each tweet against a set of predefined weighted keywords. Tweets which do not belong to any category are immediately discarded.
c. Analysis of data Analysis of the data is the most significant aspect of this problem statement. A special dictionary of words is used to search for matching words in the tweets; based on this searching and matching, tweets are classified as genuine or fake disaster tweets. Model tuning allows the models to be adjusted so that they produce the most reliable results and give highly relevant insights into the data, enabling the best decisions to be made. The best model is then selected as the one with the highest accuracy, by comparing various models such as Naive Bayes, extra-trees classifier, random forest, and so on, and by evaluating different processing steps such as normalization, feature selection and outlier removal.
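A minimal sketch of the cleaning described in (b) is shown below; the particular regular expressions and the NLTK stopword list are illustrative choices, not the authors' exact code.

```python
# Illustrative tweet-cleaning step: strip URLs, hashtags/mentions, punctuation, numbers and
# stopwords, then drop duplicate tweets (e.g., retweets), as outlined in the text above.
import re
import string
import pandas as pd
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP = set(stopwords.words("english"))

def clean_tweet(text: str) -> str:
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)            # URLs
    text = re.sub(r"[@#]\w+", " ", text)                      # hashtags and mentions
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)                          # numbers
    return " ".join(t for t in text.split() if t not in STOP)

train = pd.read_csv("train.csv")                              # assumed Kaggle file name
train["clean_text"] = train["text"].astype(str).apply(clean_tweet)
train = train.drop_duplicates(subset="clean_text")            # remove retweets / copies
```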
3.1 Pseudo-code
Step 1: Start
Step 2: Installing PyCaret [16] on the machine.
Step 3: Install/import other necessary packages (mentioned below):
• NLTK—All, stopwords
• Pandas, Numpy, OS
• Warnings [17]—simplefilter/ignore
• MatPlotlib [18]—matplotlib/pyplot
Step 4: Import Train Dataset using Pandas.
Step 5: Clearing tweet text (tweets from train dataset) and dropping unnecessary columns.
Step 5.1: Removing punctuation
Step 5.2: Removing stopwords
Step 6: Building Machine Learning model.
Step 6.1: Importing classification module.
Step 6.2: Setting up the environment.
Step 6.3: Building different classifier models (Naive Bayes, Extra tree model).
Step 6.4: Compare models.
Step 7: Evaluate model.
Step 7.1: Building confusion matrix for Naive Bayes classifier.
Step 7.2: Building confusion matrix for Extra Tree classifier.
Step 7.3: Calculating different performance metrics.
Step 8: Import the Test dataset using Pandas.
Step 9: Again clearing tweet text and dropping unnecessary columns.
Step 9.1: Removing punctuation
Step 9.2: Removing stopwords
Step 10: Creating a prediction model.
Step 11: Saving the model.
Step 12: Load the model.
Step 13: Analyze the model by plotting different charts.
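The paper implements the steps above with PyCaret. The sketch below walks through the same workflow (build the features, train and compare a Naive Bayes and an extra-trees classifier, evaluate, save, load and predict) using scikit-learn instead; this substitution is made here only so that each step is explicit, and file/column names again follow the standard Kaggle release.

```python
# Scikit-learn stand-in for Steps 4-12 of the pseudo-code above (the paper itself uses PyCaret).
import pandas as pd
import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import classification_report

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

vec = CountVectorizer(stop_words="english", min_df=2)      # bag-of-words features
X = vec.fit_transform(train["text"])
y = train["target"]
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Steps 6.3/6.4: build and compare two of the classifiers mentioned in the paper.
for name, model in [("naive_bayes", MultinomialNB()),
                    ("extra_trees", ExtraTreesClassifier(n_estimators=200, random_state=42))]:
    acc = cross_val_score(model, X_tr, y_tr, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {acc:.4f}")

best = MultinomialNB().fit(X_tr, y_tr)                     # Naive Bayes was selected in the paper
print(classification_report(y_val, best.predict(X_val)))   # Step 7: evaluate

joblib.dump((vec, best), "nb_disaster_model.joblib")       # Step 11: save the model
vec, best = joblib.load("nb_disaster_model.joblib")        # Step 12: load the model
test["pred"] = best.predict(vec.transform(test["text"]))   # Step 10: predict on unseen tweets
```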
4 Algorithm Used for Classification 4.1 Naive Bayes Classifier Naive Bayes classifiers are a family of classification algorithms based on Bayes' theorem. They are among the simplest Bayesian network models, yet, combined with kernel density estimation, they can achieve high accuracy levels. Naive Bayes is not a single algorithm but a group of algorithms that share a common principle, namely that each pair of features being classified
is independent of the others. Besides this classifier, another classifier, known as the extra-trees classifier, was also considered. Although both achieve the same accuracy, other factors such as the time taken and the recall were not comparable to those of the Naive Bayes classifier, which is why the extra-trees classifier was not selected.
4.2 Results Using different classifier models from the PyCaret package, the best model suited for this project was found to be the Naive Bayes classifier, with an SD and mean accuracy of 0.0248 and 0.7347, respectively, as shown in Table 3. This was the highest among all the other classifiers; for example, the extra-trees classifier and the K-nearest neighbor classifier have mean accuracies of 0.7347 and 0.7118, respectively, and the others are as shown below. Different plots, such as the AUC-ROC curve, the decision boundary plot and the precision-recall curve, were then plotted for the Naive Bayes classifier. The AUC-ROC curve depicts the rate of correctness of the classification model used in the project, i.e., the Naive Bayes classification model; Fig. 1 shows that this rate for both true and fake disaster tweets is 0.79. As shown in Fig. 2, at the initial point (0, 0) the threshold is set to 1.0, so that there is no discrimination between true and fake disaster tweets, and at the final point (1, 1) the threshold is set to 0.0, so that there is classification between true and fake disaster tweets. The more the curve bends toward (1, 1), the higher the precision, so the aim is to push the curve toward (1, 1). For evaluating the model, the confusion matrix is developed as shown in Fig. 3.

From the confusion matrix, Accuracy = (TP + TN) / (TP + TN + FP + FN) = (620 + 1037) / (620 + 1037 + 347 + 287) = 0.7232

From the confusion matrix, Precision = TP / (TP + FP) = 620 / (620 + 347) = 0.6411

From the confusion matrix, Recall = TP / (TP + FN) = 620 / (620 + 280) = 0.6888
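These figures can be checked directly from the confusion-matrix counts quoted above, for example with the short script below.

```python
# Re-computing the reported metrics from the confusion-matrix counts quoted above.
TP, TN, FP, FN = 620, 1037, 347, 287     # counts as reported with Fig. 3

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# Note: accuracy (0.7232) and precision (0.6411) match the text; the text's recall of 0.6888
# corresponds to a denominator of 620 + 280, whereas FN = 287 gives roughly 0.6836.
```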
Extra tree classifier
Random forest classifier
Decision tree classifier
K neighbors classifier
Gradient boosting classifier
AdaBoost classifier
Quadratic discriminant classifier
Logistic regression
Dummy classifier
Ridge classifier
SVM—linear kernel
Linear discriminant analysis
et
rf
dt
knn
gbc
ada
qda
lr
dummy
ridge
svm
lda
Accuracy
0.2249
0.4731
0.5646
0.5676
0.6232
0.6288
0.6675
0.6731
0.6915
0.7118
0.7315
0.7347
0.7347
GaussianNB(priors=None, var_smoothing=1e-09)
Naive Bayes
nb
Model
Table 3 Best fit model evaluation AUC
0.232
0
0
0.5
0.6182
0.5748
0.7059
0.7343
0.7379
0.6994
0.7777
0.7666
0.7937
Recall
0.2175
0.6866
0.069
0
0.2494
0.1758
0.296
0.3295
0.6029
0.6081
0.5252
0.5625
0.694
Prec
0.2001
0.302
0.2404
0
0.4491
0.8509
0.8291
0.7987
0.6567
0.6898
0.7836
0.7627
0.6935
F1
0.2079
0.4194
0.0976
0
0.2937
0.287
0.4328
0.4641
0.628
0.6452
0.6281
0.6464
0.6933
Kappa
MCC
0.0479
0.0481
0.0166 − 0.028
0.012
0
0.1755
0.2592
0.3387
0.3426
0.3669
0.4075
0.4523
0.4558
0.4601
− 0.003
0
0.162
0.1649
0.2666
0.284
0.3654
0.4044
0.4306
0.442
0.4596
96.072
7.192
29.943
0.083
2.509
41.663
7.223
28.016
4.585
2.164
15.428
22.47
0.473
TT (s)
Fig. 1 ROC curves for Gaussian NB
Fig. 2 Precision recall curve for Gaussian NB
Now, F1 Score = 2 × (Precision × Recall) / (Precision + Recall) = 2 × (0.6411 × 0.6888) / (0.6411 + 0.6888) = 0.6640
The values obtained from the confusion matrix are summarized in Table 4.
Fig. 3 a Confusion matrix for Gaussian NB b confusion matrix with values
As can be seen from Table 4, all of these values are very close to the results obtained while building and comparing the models. After creation and tuning on the training dataset, the trained model was used for making predictions: PyCaret builds a pipeline of all the steps, passes the unseen data into the pipeline and returns the results. Figure 4 shows the classification of the dataset. As per Fig. 5, among the many calamities, fire received the most tweets; other catastrophes (whether man-made or natural) include disasters like bombings, accidents, flames, etc. Although word clouds could have been used to express the same thing, this graph worked better for this task.
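A frequency chart of the kind shown in Fig. 5 can be produced from the tweets predicted as disasters, for example as sketched below; the DataFrame and column names follow the earlier illustrative sketches, not the authors' code.

```python
# Sketch of a frequency chart like Fig. 5, built from the tweets predicted as disasters.
# Reuses the `test` DataFrame with the `pred` column produced in the previous sketch.
from collections import Counter
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

disaster_text = " ".join(test.loc[test["pred"] == 1, "text"].str.lower())
words = [w for w in disaster_text.split() if w.isalpha() and w not in ENGLISH_STOP_WORDS]
top_words, counts = zip(*Counter(words).most_common(15))

plt.barh(top_words, counts)
plt.xlabel("Frequency")
plt.title("Most commonly used words in predicted disaster tweets")
plt.tight_layout()
plt.show()
```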
Table 4 Performance of Gaussian NB

Parameter | In value | In percentage (%)
Accuracy  | 0.7232   | 72.32
Precision | 0.6411   | 64.11
Recall    | 0.6888   | 68.88
F1-score  | 0.6640   | 66.40
Fig. 4 Classification of tweets into disaster/non-disaster
Fig. 5 Most commonly used words in disaster tweets
5 Conclusion This work is based on natural language processing (NLP). It classifies whether a specific tweet describes a real disaster or not, which can be valuable to the countless users on the Web who can be alerted in the event of an emergency or disaster. The work successfully increased the accuracy to 0.7373 and finally operates with a precision of 0.7303. It predicted 10,875 tweets, of which 56.1% are non-disaster tweets and the remaining 43.9% are disaster tweets (analyzed in this task). The most common form of disaster found in this analysis is fire. The primary target of this work was to increase the accuracy with respect to the other performance metrics, and there may be an opportunity to further improve the precision while working on other parameters as well. With the increase in global warming, there is a growing possibility of disasters in particular regions, so this work can help determine whether or not a tweet is a genuine disaster tweet. The work does not deal with any kind of location or geo-mapping; in future, this can be developed using diverse AI and machine learning packages.
References 1. Rayer Q et al. (2021) Water ınsecurity and climate risk: ınvestment ımpact of floods and droughts. Palgrave Stud Sustain Bus Assoc with Futur Earth 157–188. https://doi.org/10.1007/ 978-3-030-77650-3_6 2. Miño-Puigcercós R et al. (2019) Virtual communities as safe spaces created by young feminists: ıdentity, mobility and sense of belonging. Identities, Youth Belong 123–140. https://doi.org/ 10.1007/978-3-319-96113-2_8 3. Greco F, Polli A (2020) Emotional text mining: customer profiling in brand management. Int J Inf Manage 51:101934. https://doi.org/10.1016/J.IJINFOMGT.2019.04.007 4. Saroj A, Pal S (2020) Use of social media in crisis management: a survey. Int J Disaster Risk Reduct 48:101584. https://doi.org/10.1016/J.IJDRR.2020.101584 5. Goswami S, Raychaudhuri D (2020) Identification of disaster-related tweets using natural language processing. In: International conference on recent trends in artificial ıntelligence, Iot, smart cities & applications (ICAISC-2020), SSRN Electron J https://doi.org/10.2139/SSRN. 3610676 6. Jawhar Q et al (2020) Recent advances in handling big data for wireless sensor networks. IEEE Poten 39(6):22–27. https://doi.org/10.1109/MPOT.2019.2959086 7. Jawhar Q, Thakur K (2020) An improved algorithm for data gathering in large-scale wireless sensor networks. Lect Notes Electr Eng 605:141–151. https://doi.org/10.1007/978-3-03030577-2_12 8. Sachdeva P, Singh KJ (2016) Automatic segmentation and area calculation of optic disc in ophthalmic images. In: 2015 2nd International Conference Recent Advance Engineering Computer Science RAECS 2015. https://doi.org/10.1109/RAECS.2015.7453356 9. Sharma A et al. (2022) Exploration of IoT nodes communication using LoRaWAN in forest environment. Comput Mater Contin 71(2):6240–6256. https://doi.org/10.32604/CMC.2022. 024639 10. Sharma A, Agrawal S (2012) Performance of error filters on shares in halftone visual cryptography via error diffusion. Int J Comput Appl 45:23–30
11. Singh K et al. (2014) Image retrieval for medical imaging using combined feature fuzzy approach. In: 2014 International conference devices, circuits communication ICDCCom 2014—proceedings. https://doi.org/10.1109/ICDCCOM.2014.7024725 12. Chakravarthi BR et al. (2022) DravidianCodeMix: sentiment analysis and offensive language identification dataset for dravidian languages in code-mixed text. Lang Resour Eval 1–42. https://doi.org/10.1007/S10579-022-09583-7/TABLES/15 13. QaziUmair et al (2020) GeoCoV19. SIGSPATIAL Spec 12(1):6–15. https://doi.org/10.1145/ 3404820.3404823 14. Priyanka EB et al (2022) Digital twin for oil pipeline risk estimation using prognostic and machine learning techniques. J Ind Inf Integr 26:100272. https://doi.org/10.1016/J.JII.2021. 100272 15. Vera-Burgos CM, Griffin Padgett DR (2020) Using twitter for crisis communications in a natural disaster: hurricane harvey. Heliyon 6(9):e04804. https://doi.org/10.1016/J.HELIYON. 2020.E04804 16. GitHub—pycaret/pycaret: an open-source, low-code machine learning library in Python. https://github.com/pycaret/pycaret. Last accessed 05 April 2022 17. Warnings—Warning control—Python 3.10.4 documentation. https://docs.python.org/3/lib rary/warnings.html. Last accessed 05 April 2022 18. Matplotlib documentation—Matplotlib 3.5.2 documentation. https://matplotlib.org/stable/. Last accessed 05 April 2022
Design of an Aqua Drone for Automated Trash Collection from Swimming Pools Using a Deep Learning Framework Kiran Mungekar, Bijith Marakarkandy, Sandeep Kelkar, and Prashant Gupta
Abstract Water conservation is of prime importance to sustain life. An ample amount of water is used in swimming pools. Material contamination of water in a swimming pool occurs from various sources, viz., leaves of trees and plants from the surrounding area, plastic bottles, wearables for protection of eyes, ears, and hair left by people who swim in the pools. Removal of this trash is a challenging task. The existing trash collector boats being huge in size are primarily designed for cleaning rivers and seas; moreover, manual intervention is needed for its operation. These boats are not suitable to be used in swimming pools. This paper presents an automatic robotic trash boat for floating trash detection, collection, and finally accumulation of the floating trash using a conveyor machine. The system is portable, user-friendly, environmentally pleasant, and facilitates remote control operation. The system uses a deep learning model based on a modified YOLO framework. The prototype has superior performance compared to existing systems with respect to accuracy and time. Performance indicators, viz., accuracy, error rate, precision, recall F1 score and the quadratic weighted kappa were used to evaluate the prototype. The final trained network obtained 92% accuracy, 0.0167 error rate, 0.23 logloss, 92% precision, 92% recall, 92% F1 score, and 0.9665 kappa quadratic weighted value. The system would help in cleaning swimming pools thus preventing contamination of water due to this water replenishment cycle can be reduced. Keywords Aqua drone · Trash collection · CNN
K. Mungekar ThinkGestalt.Tech, Mumbai, India B. Marakarkandy (B) · S. Kelkar · P. Gupta Prin. L. N. Welingkar Institute of Management Development and Research (WeSchool), Mumbai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_41
1 Introduction 1.1 Background and Objectives Accessories used by swimmers’, leaves from plants and trees, cans of beverages, plastic bottles, meal packaging containers, straws, and Styrofoam cups are some of the floating solids usually found in swimming pools particularly after a poolside party. Majority of the trash collector boats used are targeted at large water bodies, and commercially available boats require excessive amount of labor for operation of the system. The enormous size of the currently existing trash collection boats makes it challenging for utilizing the same in tiny areas, such as a swimming pool. The existing aquatic cleaning or trash-collecting boats does not have trash recognizing capacity. This becomes a major challenge as operating these machines require human intervention. The designed system has a reduced size, with automated detection, collection, and conveyor machine facilitates collection of floating solid waste. The boat can be controlled remotely making it very convenient to use. Based on the above facts, the following objectives are defined (1) To design and develop the aqua drone to collect the waste or litter present in a swimming pool. (2) To evaluate and optimize the performance of YOLO algorithms in real time used in the system. (3) To assess the automation running when cleansing the waste or litter which obliquely reduce the amount of manpower and time spent in manually cleaning the swimming pool.
2 Literature Review Considerable amount of research has been done in the field of application of various deep learning techniques for detecting the presence of trash or garbage. Survey of various research papers are done in this section. Researchers have examined a variety of region-based CNN detection methods used in computer vision, viz. R-CNN, Fast R-CNN, Faster R-CNN, HyperNet, SDP-CRC, G-CNN, and SSD. This section includes literature survey of various technical papers along with the advantages and limitations of each system mentioned above [1]. The deep learning model is commonly used in the computer vision and image processing, especially general and domain-specific object detection. To collect features from the input image or video, classification, and localization, most object detectors use deep learning algorithms as their back-end. Object detection is a computer vision and image processing technique that involves identifying instances of similar objects of a certain class in images or videos, such as a ball or a plastic bottle. Object detection aims to decide if an image contains any types of objects from certain categories and, if so, to return the position and classification of each object
instance. For many real-world challenges, classification seems to be the crucial link for arriving at decisions. Classification is used to categorize objects into different classes based on their attributes. While image classification is giving image a class mark, object localization involves creation of a bounding box around objects in an image. Detection of objects combines the two processes and is more difficult than individual process. Both the problems are referred to as object detection. Regionbased convolutional network is the key to solve object localization and recognition problems [2]. Abdullah et al. [3] designed a boat system whose movement and speed must match the process of gathering floating solids for it to function successfully. However, it was found that if the radio connection is weak, there may be occasional connection loss. The quality of the image must be enhanced for efficient tracking. Their system primarily provides wireless remote control using radio waves but in certain situations, there may be connection losses necessitating better image quality. The problem can be resolved employing net connection open CV and convolutional neural network, and it can be made autonomous. System operates effectively depending on the speed with which floating solids are collected. The gaps identified in their system are connection loss in certain scenarios and quality of captured image. Thiagarajan and Satheesh Kumar [4] developed a method for classification of cups in trash using machine learning classification. They used positive and negative instances with a fully connected neural network to easily identify number of cups (litter) in a beach setting. A sliding window classifier is used on a Gaussian blurred image, and then non-maximum suppression is applied. Their system has an ability to easily identify the number of litters using fully connected neural network. The system however considers only one class for classification and faces a bounding box overlapping problem. Saadou-Yaye and Aráuz [5] the compensation mechanism provided the ability to complete tasks in lower instances, but additionally gave the users enough control to perform finer movements. The compensation mechanism does have its boundaries, and the system always continues to transport within the direction it was moving before the packet losses took place. To overcome this limitation, computer vision, object detection, and object tracking used for determining the most likely direction to move in case of losses. Fukui et al. [6] offered a new method for tracking objects. The particle filter is the main foundation of the tracker. The target item is monitored using the object detector. Similar items of varied sizes that are present close to the target object can be detected by the detector. The steps for determining the likelihood and assessing the circumstances, in which the target object occurs is based on the existence of similar objects. The strategy allows for robust tracking by using statistical data. Experimental results demonstrate that the system can track the objects which are steady. The problem is not about tracking stable objects, but the moving or floating solids over surface water live video streaming from the camera. Jogi et al. [7] proposed a system targeted on modeling, designing, and controlling of a boat operated by a pedal, for making it light in weight and portable. Although this system was able to acquire the rubbish from the lake with human intervention.
To function, the system needed excessive manpower and this problem can be solved by using the solar panel instead of pedal operation. The prototype boat consumes no fuels such as gasoline or diesel. They made an IoT prototype which was portable, lightweight and small in size whereby collection of trash from narrow spaces like drainage systems became possible. The prototype boat is beneficial for both small and large lakes where there is a lot of trash. The prototype gathers the trash and cleans the lake at the same time. Chen et al. [8] suggested a technique called region proposals for performing selective search to extract 2000 regions in the image obviating the need to select a large number of regions. These 2000 regions are generated using the selective search algorithm, in which the input image first performs the extract region proposals and uses the Region of Interests (ROI) layer to process the features of the candidate regions, then warped regions pass to compute CNN features, and at the end, it gives the classify region. The literature covers a diverse set of small objects in the daily life and used R-CNN algorithm for detecting them. In contrast to the classification algorithms, object detection algorithms attempt to locate the object within the image by attempting to construct a bounding box around it. After the study, detailed experimental validation and analysis of a small object dataset R-CNN algorithm achieves similar performance improvement over the conventional approach for small object detection as it did for big object detection. Chen and Elangovan [9] opined that visual examination technology is the most important resource of product quality. The industry uses quality inspection (QI) to assess whether a material is qualified or not. The robotic process automation (RPA) is needed for the vast repetitive existence of QI. To reduce inspection costs and improve process performance of the visual examination, a fast, reliable, and precise QI process is required. The performance out from neural network is integrated for a robotic arm to select and drop damaged items from a production line in a faster R-CNN-based auto-sorting system. The Faster R-CNN algorithm simplifies the R-CNN process by introducing a Region of Interests (ROI) pooling layer that produces the possibilities of the feature map from the input image. Even though, the Faster R-CNN saves time and improves efficiency. R-CNN has a faster version called Fast R-CNN which does a ConvNet forward pass for each area without sharing computation unlike R-CNN which takes a longer time for classifying data using SVM. Faster R-CNN extracts the features from a whole input image, then the features pass to the ROI pooling layer, which uses the features as the classification data and bounding box regressions as the input for fully connected layers. The features are extracted from the entire image and forwarded to CNN at the same time for classification and localization. Although the model is substantially faster to train and classify, still needs a range input image of materials to be formulated. This study would need to be continued for various materials such as metallic, concrete, plastic, and wood in order to show the impact of materials on the preparation and detection stages using a Faster R-CNN and a YOLO neural network. Cao et al. [10] observed that the ability to detect objects in digital frames or photographs is a vital challenge for many artificially intelligent system (AIS) applications. 
Accuracy and computation time are two key aspects of such object detection
tasks, primarily for applications that involve real-time processing. In dynamic environments, the shuttlecock always travels quickly and must be detected precisely in real time so that the robot can prepare and perform its subsequent movements. Using the YOLO model, the robot tracks the shuttlecock and matches its trajectory using a visual subsystem, then travels to the specific location, and finally, hits the shuttlecock with its hitting system. One downside of the following discussion is that most of the data was gathered outside in natural light. However, in indoor settings with numerous light sources, the shuttlecock in the images can shadow, affecting detection efficiency. Another drawback is that the binocular camera is used for 2D shuttlecock detection, as well as the 3D shuttlecock direction may include errors that influence the robot’s behavior. Bai et al. [11] used deep neural network for garbage recognition and navigation to collect garbage on ground. The recognition accuracy reached 95% and had a cleaning efficiency more than traditional methods. Melinte et al. [12] enhanced the performance of convolutional neural network (CNN) for object detection in municipal waste by fine-tuning single-shot detectors and regional proposal networks on TrashNet database. They were able to attain an accuracy of 95.76% and a precision of 97.63%. Yang and Thung [13] used a model based on support vector machines with scaleinvariant feature transform and compared the performance of the same with CNN for classifying garbage into classes for the purpose of recycling. They used 400–500 images for each class. Their findings indicate that SVM performed better than CNN. Wang et al. [14] used the ResNet network algorithm as convolutional layers on the Fast R-CNN object detection framework and found that it improves the accuracy of object detection and location. Extant literature review indicates that various CNN-based object detection architectures like R-CNN, Fast R-CNN, Faster R-CNN, SSD, and YOLO are analyzed by researchers. The comparison of various algorithms is discussed in terms of various parameters. The use of CNN-based object detection as well as its benefits and disadvantages is thoroughly explored. Among the best performing CNN-based object detection models, YOLO is remarkable for its simplicity, computational efficiency, speed, and accuracy.
3 Methodology The system is designed using two modes, namely automation mode and remote control mode. The first mode includes the automation of the aqua drone, which involves the steps of data collection, image annotation, create training and testing dataset, model train and test, calculate the distance of an object, take moving decision, and collect the object for the floating trash collection. With the deep CNN approach, the YOLO architectures are used for the object detection task. The second mode includes the online remote control operation, which includes the client–server architecture.
3.1 Hardware Modeling and Data Processing The hardware modeling and data processing activities are performed simultaneously, since they do not rely on each other; this makes work on them faster, more efficient and time-saving, which is particularly important in the design of complex structures, beginning with the hardware modeling process. Such hardware specification and construction methods help in increasing design efficiency and productivity. Design productivity here covers the structure and behavior of electronic circuits, the connection of digital logic circuits and ICs, communication between the hardware components, command handling, large data-processing requirements, time delay or CPU idle time, and so on. It also helps in understanding numerous circuit simulation problems such as power usage in amperes and milliamperes, motor speed and torque, circuit losses, and many other cases. For this task, the following components were finalized for the proposed system: a Raspberry Pi as a mini-computer and processing unit, a camera for obtaining visual inputs from the environment, a 20-A motor driver for more power and torque, Johnson 500 rpm DC motors for good speed, a 1400 kV brushless motor kit to generate air pressure, and a battery to provide power to all the hardware components. The hardware used for the system is shown in Fig. 1. The study of data processing was carried out simultaneously, using computer algorithms to perform image processing on digital images. It includes the application of a much broader variety of algorithms to the input data, improving image details by enhancing certain essential image features so that computer vision models can work with this enhanced data. Initially, the Gaussian blur method is used to denoise and smooth the image and reduce image noise. Following that, data augmentation and image annotation methods are applied to the data.
3.2 Model Development, Evaluation and Optimization One of the deep learning models You Only Look Once (YOLO) is used extensively for object detection, classification, and semantic segmentation. YOLO models are developed for accurate and fast object detection at 30 frames per second. The model uses a solitary neural network to process the entire image. The network breaks down the image into its constituent parts, calculates probability, and predicts the bounding boxes for dissimilar regions. The bounding boxes are weighted based on the predicted probabilities [15]. The bounding box parameters are as follows: pc—the probability of the class classification, bx and by—the center point of the box, bw—the width of the box
Fig. 1 Hardware model of the system
along the x-axis, bh—the height of the box along the y-axis, and c—the predicted class. A 53-layer neural network training architecture called Darknet is used in YOLOv3 an addition of 53 more layers on top of the model makes it a 106-layer convolutional architecture improving the detection process. YOLO has 53 convolutional layers, followed by a batch normalization layer and a leaky ReLU. Several feature maps are produced while preprocessing images using convolutional layer. The feature maps are down sampled using a convolutional layer with stride 2 to avoid pooling. Improvements are made in low-level feature loss typically associated with pooling. The prototype operates in two modes automatic and manual. In automatic mode, it uses the YOLO, and in manual mode, client–server architecture is used [16]. Figures 2, 3, 4, and 5 show the views of the prototype.
3.3 Serial Communication and Object Tracking To connect and communicate to another device, Raspberry Pi sends signals through its pin. To provide the direction to the aqua drone, Raspberry Pi needs to connect and communicate with the motor driver and via the motor driver, Raspberry Pi can control the motors.
Fig. 2 Side view of the prototype
Fig. 3 Side top view of the prototype
Figure 6 shows the flow of serial communication and object tracking. The center point of the bounding box plays an important role in determining in which direction the aqua drone needs to move to bring the object back to the center of the camera's field of view. In addition, a set of threshold values helps to control the speed and rotation of the motors. The process is repeated until the floating trash is collected in the trash net.
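A minimal sketch of this steering rule is given below; the frame width, dead-band threshold and returned commands are placeholders, not the values used in the prototype's firmware.

```python
# Illustrative steering rule for Sect. 3.3: turn toward the detected object until its
# bounding-box centre falls inside a dead-band around the camera's optical centre.
# FRAME_WIDTH, DEAD_BAND and the command strings are assumed placeholder values.

FRAME_WIDTH = 640          # camera frame width in pixels (assumed)
DEAD_BAND = 40             # pixels of tolerance around the centre (assumed)

def steering_command(box):
    """box = (x, y, w, h) of the detected floating object, in pixel coordinates."""
    x, y, w, h = box
    box_centre_x = x + w / 2
    error = box_centre_x - FRAME_WIDTH / 2
    if error < -DEAD_BAND:
        return "LEFT"      # object left of centre -> rotate left
    if error > DEAD_BAND:
        return "RIGHT"     # object right of centre -> rotate right
    return "FORWARD"       # aligned -> drive ahead and collect

# Example: a detection centred near x = 500 px on a 640 px frame steers the drone right.
print(steering_command((460, 200, 80, 60)))   # -> RIGHT
```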
Fig. 4 Top view of the prototype
Fig. 5 Three-dimensional side view of the prototype
Fig. 6 Flowchart showing serial communication and object tracking
4 Results Dataset Description The dataset used in this method is freely accessible on the Internet as well as in a realtime environment. The dataset has high-resolution images collected under a variety of imaging conditions and different lighting conditions. The data is collected by scraping the web publicly available data on Google, taking photographs programmatically, and data augmentation. The dataset contains 1000 high-resolution color images divided into 2 classes that correspond to the plastic bottle and sports ball, as given in Table 1. The training was done on 800 images for 5000 epochs. The test set consists of the remaining 200 images. The usefulness of the system was analyzed in order to ascertain its accuracy, speed in frames per second (FPS), ability to gather floating solids from water, and the total amount of waste collected. Class Distribution The dataset is divided into three sections: train, validate, and test. The dataset with 1000 images was split into three subsets, 80% of the dataset was used for the purpose of training, 20% of the dataset is used for validation and 10% randomly selected
Table 1 Class distribution of dataset

Class | Name        | Total no. of images | Train | Validate | Test
0     | Bottle      | 500                 | 400   | 100      | 100
1     | Sports ball | 500                 | 400   | 100      | 100
images were used for the purpose of testing. The distribution of image classes in the original dataset is given in Table 1. Model Training The configuration file for YOLO is created to set up the YOLO model to train on the customized dataset. All the setup files are stored on Google Drive and linked to Google Colab. After the setup is done, the training begins. We train the model on Google Colab in a 12 GB RAM and GPU environment. Before training starts, Google Drive is synchronized with Google Colab to automatically back up the trained weights. The training runs for 5000 iterations, which takes 2–3 h. Model Evaluation After the training is finished, we obtain three main files, namely 'yolov3_training_final.weights', 'yolov3_testing.cfg', and 'classes.txt'. These three files, also called the weights file, configuration file and classes file, are essential to run the YOLO model after training. The files are downloaded from Google Drive, stored in a single folder, and the object detection code is then run on the Raspberry Pi to evaluate the performance of the custom-trained model coded in Python. This code performs two functions: it detects objects in the input image and draws a rectangle around them, and it gives the movement direction to the Raspberry Pi. Remote Control Result The remote control mode is implemented with Python Flask. Flask is a Python web framework used for developing web applications; it is easy to use and needs little boilerplate code to build a web application. The Flask-based web application is directly connected to the server, so the commands sent by the user go to the server. The web application controls the automatic mode as well as the manual remote control mode. Performance Metrics To measure model performance, model evaluation metrics are needed. The algorithm's performance is evaluated on a variety of parameters: accuracy, error rate, confusion matrix, precision, recall, F1 score, and Cohen's Kappa score are a few parameters to consider. From the confusion matrix, which compares actual values with the predicted values, the following terms are identified: True Positives (TP), predicted positive and actually positive; False Positives (FP), predicted positive and actually negative; True Negatives (TN), predicted negative and actually negative; and False Negatives (FN), predicted negative and actually positive.
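As an illustration, the three files named above can be loaded for inference with OpenCV's DNN module as sketched below; the input size and thresholds are common YOLOv3 defaults rather than values reported in the paper, and the input image name is a placeholder.

```python
# Loading the custom-trained YOLOv3 files named above with OpenCV's DNN module (a sketch,
# not the authors' exact Raspberry Pi script). Thresholds and input size are common defaults.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3_testing.cfg", "yolov3_training_final.weights")
classes = open("classes.txt").read().splitlines()
out_layers = net.getUnconnectedOutLayersNames()

frame = cv2.imread("pool_frame.jpg")                        # placeholder input image
h, w = frame.shape[:2]
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

boxes, confidences, class_ids = [], [], []
for output in net.forward(out_layers):
    for det in output:                                      # det = [cx, cy, bw, bh, obj, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        conf = float(scores[class_id])
        if conf > 0.5:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)
            class_ids.append(class_id)

keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)       # non-maximum suppression
for i in np.array(keep).flatten():
    print(classes[class_ids[i]], confidences[i], boxes[i])  # class, confidence, bounding box
```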
The description of the parameters is given below:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Error Rate = 1 − Accuracy
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 Score: F1 = 2 × (Precision × Recall) / (Precision + Recall)
Cohen's Kappa: K = (P_observed − P_chance) / (1 − P_chance)
The proposed method's results are compared with the current state-of-the-art work. The tests are carried out on Google Colab with the help of GPUs. The TensorFlow deep learning back-end library is used with the Keras deep learning package. Figure 7 displays the performance metrics of the system for multiple criteria, which include accuracy, error rate, confusion matrix, precision, recall, and F1 score. The quadratic weighted kappa was used as the efficiency metric. The final trained network obtained 99% accuracy, 0.0167 error rate, 0.23 logloss, 99% precision, 99% recall, 99% F1 score, and a 0.9665 quadratic weighted kappa value.
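For reference, all of the listed metrics, including the quadratic weighted kappa, are available directly in scikit-learn; the label arrays below are placeholders for the prototype's ground truth and predictions, not the paper's test data.

```python
# Computing the evaluation metrics listed above with scikit-learn; y_true / y_pred / y_prob
# are placeholder arrays standing in for the test labels and the detector's outputs.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, log_loss, confusion_matrix)

y_true = [0, 1, 0, 1, 1, 0, 1, 0]                    # 0 = bottle, 1 = sports ball (example labels)
y_pred = [0, 1, 0, 1, 0, 0, 1, 0]
y_prob = [0.1, 0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.2]    # predicted probability of class 1

print("accuracy :", accuracy_score(y_true, y_pred))
print("error    :", 1 - accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("kappa(qw):", cohen_kappa_score(y_true, y_pred, weights="quadratic"))
print("log loss :", log_loss(y_true, y_prob))
print(confusion_matrix(y_true, y_pred))
```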
Fig. 7 Performance metrics
5 Conclusion The experimental results show that the models are effective at harvesting floating trash from swimming pools. The trained models were successfully installed on a web application-based platform so that the drone can be operated remotely over the Internet. The prototype can be converted into a product and used for cleaning swimming pools. To control the speed of the DC motors according to the dynamic environment, a gyroscopic acceleration module can be included in a further prototype. Further classes of floating solid objects can be added to the dataset, and a similar or better performance may be expected from the YOLO deep learning algorithms. The camera angle is currently limited; this can be increased by using a wide-angle camera lens or a high-resolution camera. The use of the system would reduce the housekeeping expenses for maintenance of the pools and will also help in the conservation of water by promptly removing trash from the pools.
References 1. Kader ASA et al (2015) Design of rubbish collecting system for inland waterways. J Trans Syst Eng 2(2):1–13 2. Gao X, Gu J, Li J (2005) IEEE Transactions on consumer electronics. De-interlacing Algorithms Based Motion Compensation 51(2):589–599 3. Abdullah SHYS, Mohd Azizudin MAA, Endut A (2019) Design and prototype development of portable trash collector boat for small stream application. Int J Innov Technol Explor Eng 8(10):350–356 4. Thiagarajan S, Satheesh Kumar G (2019) Machine learning for beach litter detection. In: Machine intelligence and signal analysis, Springer, Singapore, pp 259–266 5. Saadou-Yaye A, Aráuz J (2015) µAutonomy: intelligent command of movable objects. Proc Comput Sci 61:500–506 6. Fukui S et al. (2015) Object tracking with improved detector of objects similar to target. Proc Comput Sci 60:740–749 7. Jogi NG et al (2016) Efficient Lake garbage collector by using pedal operated boat. Int J Mod Trends Eng Res 2(4):327–340 8. Chen C et al. (2016) R-CNN for small object detection. In: Asian conference on computer vision, Springer, Cham 9. Chen P, Elangovan V (2020) Object sorting using faster r-cnn. arXiv preprint arXiv:2012.14840 10. Cao Z et al. (2021) Detecting the shuttlecock for a badminton robot: a YOLO based approach. Exp Syst Appl 164:113833 11. Bai J et al. (2018) Deep learning-based robot for automatically picking up garbage on the grass. IEEE Trans Consum Electr 64(3):382–389 12. Melinte DO, Travediu AM, Dumitriu DN (2020) Deep convolutional neural networks object detector for real-time waste identification. Appl Sci 10(20):7301 13. Yang M, Thung G (2016) Classification of trash for recyclability status. CS229 Proj Rep. 2016(1):3 14. Wang Y, Zhang X (2018) Autonomous garbage detection for intelligent urban management. In: MATEC web of conferences, vol 232, EDP Sciences
15. Yahya MA, Abdul-Rahman S, Mutalib S (2020) Object detection for autonomous vehicle with LiDAR using deep learning. In: 2020 IEEE 10th International conference on system engineering and technology (ICSET), IEEE 16. Ranjan C et al. (2019) Design and demonstration of manual operated pedalo water boat for garbage collection from lake. In: AIP conference proceedings, vol 2200. no. 1. AIP Publishing LLC
Design of a 3-DOF Robotic Arm and Implementation of D-H Forward Kinematics Denis Manolescu and Emanuele Lindo Secco
Abstract Robotic prototypes are gradually gaining more academic and research traction, while robots are becoming a more common encounter within the human social-trusted cycles. Generally, in the fields of Robotics—besides the mandatory high standards of safety—engineers are keen to build their concepts with an intuitive, natural-like motion and characteristics. This project will go through the process of building a novel 3-degrees of freedom robotic arm. The study will also aim at introducing and demonstrating the importance of a proper kinematics: an analysis on how geometry computations are used to improve the control and motion of a robot is performed. Finally, the pros and cons and the constraints of the proposed system are discussed together with a set of approaches and solutions, providing an overall approach toward the design and implementation of closed hardware and software robotics solution. Keywords Robotic arm · Forward kinematics · Denavit-Hartenberg
1 Introduction Robotic arms, suitably named because they generally resemble the human arm and mimic their functionality, were mainly introduced into the heavy industries in the 1960s. The automotive industry was the first to adopt the technology; shortly after, other assembly-line factories followed [1]. Even from the early stages, having a machine that can continuously do repetitive tasks with speed and precision, lift loads greater than a person’s capacity and endure harsh environmental conditions—robots seemed to be the logical future. They became attractive by offering efficiency and cost-effective advantages through automation [2]. D. Manolescu · E. L. Secco (B) Robotics Lab, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK e-mail: [email protected] D. Manolescu e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_42
Fig. 1 Robotic joint types. Image source medium, 2022 [9]
Throughout time, robotic arms were an evolutionary direction that built and guided the Robotics industry and determined its existence today. Thanks to greater sensory awareness and novel programming algorithms—like artificial intelligence, deep learning or machine learning—robotic arms can now interact and fulfill side-by-side tasks with humans. Typically, from an engineering perspective, a robotic arm consists of a steady base that anchors a multi-segmented structure and forms the robot body. The connections between segments are called joints, and individually, they determine an axis of motion, also known as a degree of freedom. The last link of the arm is the end-effector, which is a specialized device designed to interact with the environment. The end-effector can take many shapes and forms, like a gripper, vacuum head, magnets, soft-manipulation grabber, welding tip, drilling or cutting tool, painting spray or camera module [3, 4]. In essence, most joints are built using actuators that are capable of initiating movements depending on their control signal feedback. In tandem with the linear or rotational motion of an actuator, joints can mostly be classified as prismatic or sliding, revolute or rotary, spherical or socket and helical or screw-based (Fig. 1). A higher number of joints enables more freedom of movement, and the majority of the robotic arms available have four to six joints [5–7]. Coordinating the entire movement of a robotic arm while determining its position, orientation and velocity is a fundamental task in robotics. Kinematics is the main mathematical tool capable of accurately describing the relationship among the joint coordinates, the end-effector and their spatial layout [8]. There are two ways to use kinematics equations in the context of a robotic arm: 1. Forward kinematics—uses the kinematics equations to calculate the position, orientation and velocity of the end-effector while knowing the joint angles. In practice, the forward kinematics process uses the kinematic structure of the robot and the spatial configuration of the links to calculate the rotation and displacement of each frame. The result of forward kinematics is a single possible solution chain linked to the entire structure of the robotic arm, no matter its movement. 2. Inverse kinematics—uses kinematics equations to compute a configuration of the joint positions or angles that is necessary to place the end-effector in a given
or desired location and orientation. Inverse kinematics is a more complex mathematical calculus than forward kinematics and usually produces multiple solutions [10, 11]. Once deployed into the robotic arm control algorithm, kinematics calculations can also be used to improve precision motion and prediction or objects collision detection. This study will analyze the designing and building process of a 3-degrees of freedom robotic arm with revolute joints and will go through the process of calculating and applying forward kinematics [12]. The paper is organized as follows: the materials and methods section will talk about the characteristics of the robotic arm built in this project and the main components. The forward kinematics chapter will present the in-depth process of how kinematics works and how forward kinematics is calculated and integrated into the robot system. The algorithm section will complete the forward kinematics chapter by going through the code developed to operate the robotic arm and the integration of forward kinematics. The results section will point out the issues encountered and the solutions adopted. It will also present the final view of the robotic arm and prove its functionality. The last section, namely the conclusion, reports an objective overview of the entire project and reflects on the results achieved.
2 Materials and Methods The 3-degrees-of-freedom robotic arm built in this study has a vertical length of 26 cm and a working envelope of 18.3 cm radius (Fig. 2a). The project uses three servo actuators connected by nine reinforced plastic frames and three reinforced plastic brackets (Fig. 2b). The base is anchored with screws to a dense polyamide board, and the entire structure is kept in place by two gripping clamps.
Fig. 2 Robotic arm 3D design made in Autodesk Fusion 360
Fig. 3 Robotic arm. All components overview (final version)
The arm is controlled by an Arduino Mega 2560 Rev3, which holds on top a servo driver shield connected to a 9.6 V DC power supply (Fig. 3b). The first joint within the base rotates 295° left–right on the Z-axis, forming a displacement of 13 cm from the ground to the next frame. The second joint rotates 190° up–down, constrained only by the position of the holding bracket. Between the first and second frames, there is a displacement of 11.75 cm. The third joint is identical to the previous one, while the distance between the second and third frames is 6.55 cm. There is also an additional 5 cm distance from the servo rotation axis to the end-effector tip—i.e., the pencil (Fig. 3, panels a and b).
2.1 The Arduino Mega and Arduino Uno Microcontrollers The central computing power of the robot is generated by an Arduino Mega Rev3 board which is based on the 8-bit ATMega 2560 microcontroller produced by Microchip. The board operates on 5 V and has a clock speed of 16 MHz while offering a convenient 54 digital connection pins and 16 analog pins (Fig. 4c). Although the microcontroller is not used for complicated computations, it holds enough processing power to perform decently real-time kinematics calculations. Additionally, in the early stage of development, an Arduino Uno had to be used in parallel with Mega, and its purpose was to capture the UART data packet coming from the servo motor via the shield drive. More about this issue is in the challenges section below.
2.2 Dynamixel Servo Shield The servo shield used in the study is created by Dynamixel for their own servo motor series, and it is built to fit on top of Arduino Mega or Uno (Fig. 4c). The shield
Fig. 4 Dynamixel servo driver shield layout, AX-12A Dynamixel servo motor, Arduino Mega and servo shield attached
operates on 5–24 V, and one of its best features is to allow the export of the data packets coming from the servo motors via its UART pins [13]. An inconvenient flaw of the shield is the manual switch between the communication with the Arduino serial port and the servo motor control channel (Fig. 4a—UART SW). When uploading commands to the microcontroller, the UART needs to be set to Upload mode. For the servo motors to start reacting to the commands uploaded, the UART needs to be turned to Dynamixel mode. This manual switch becomes a real issue in the repetitive debugging stage of the servo motors and the robotic arm movements’ tests. Software-wise, the producer and the community offer a wide range of support for testing and further development through C ++ libraries—all fully compatible with the Arduino environment.
2.3 Dynamixel Servo Motors The entire robotic arm design has been built around three Dynamixel AX-12A servo motors. These are advanced, high-performance robotics actuators designed to be modular and daisy chain connected (Fig. 4b). Underneath the reinforced plastic housing, the actuator is composed of a 9–12 V DC motor with 1.5 N/m stall torque, a 254:1 spur gearbox, a built-in microcontroller with feedback, a driver and a half-duplex serial interface running at up to 1 Mbps through a UART/TTL serial link. This network feature enhances the servo capabilities and offers precise control and programmability of torque, speed, position, temperature and even voltage [14]. A very interesting feature of these intelligent servo motors is their capacity to be switched from a Joint mode—with a rotation capacity of 300°, to a Wheel mode— giving them the ability to endlessly rotate without constraints. Another helpful feature is the daisy chain connection—meaning the servos only need to be connected to each other in a serial chain via a 3P cable with a Molex connector. In the daisy chain setup, only a single servo needs to be directly plugged into the Arduino shield. This setup is possible because of the asynchronous communication between devices and the use of unique IDs for each motor in the commands data packet.
3 Forward Kinematics Calculations Forward kinematics is the mathematical process that uses the joint angles, in this case, θ 1 , θ 2 , θ 3 (theta values), to determine the exact position of the end-effector (Fig. 5a). The study will go through two different ways to apply forward kinematics, namely: 1. Conventional way—using the rotation matrix and the displacement matrix to calculate the homogeneous transformation matrix.
Fig. 5 Kinematics diagram for rotation matrices and calculus. Displacement layout between joint frames
2. And the Denavit-Hartenberg method. Both of these methods use the same rotation matrix calculus.
3.1 Rotation Matrices

Considering that the robotic arm has three revolute joints, the rotation matrix formula for the entire system is:

$$R^0_3 = R^0_1 \cdot R^1_2 \cdot R^2_3$$

The first stage consists of calculating the projection matrices of each next frame onto the previous frame, with the joints in their initial state or position. This process will help determine the rotation of each joint in a frame space relationship.

1. Projection matrix of Frame 1 to Frame 0:

$$P^0_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}$$

2. The projection matrix of Frame 2 to Frame 1 and the projection matrix of Frame 3 to Frame 2 are the same and equal to the identity matrix, because the orientation of their axes is identical:

$$P^1_2 = P^2_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Once the projections are defined, the rotation matrices are calculated. Each rotation of θ is a Z-axis rotation, multiplied by the corresponding projection matrix:

$$R^0_1 = \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 & 0 \\ \sin\theta_1 & \cos\theta_1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} \cos\theta_1 & 0 & \sin\theta_1 \\ \sin\theta_1 & 0 & -\cos\theta_1 \\ 0 & 1 & 0 \end{bmatrix}$$

$$R^1_2 = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 & 0 \\ \sin\theta_2 & \cos\theta_2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot I = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 & 0 \\ \sin\theta_2 & \cos\theta_2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$R^2_3 = \begin{bmatrix} \cos\theta_3 & -\sin\theta_3 & 0 \\ \sin\theta_3 & \cos\theta_3 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot I = \begin{bmatrix} \cos\theta_3 & -\sin\theta_3 & 0 \\ \sin\theta_3 & \cos\theta_3 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Technically, there is no need to calculate the $R^0_3$ matrix to determine the homogeneous transformation matrices. But the Python algorithm developed will compute it anyway; more about that is in the code section below.
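As an illustration of the calculus above, the following minimal NumPy sketch (an illustrative re-implementation, not the authors' published code) builds the three rotation matrices and chains them into $R^0_3$ for the test angles used later in the paper (θ1 = 75°, θ2 = 147°, θ3 = 147°):

```python
import numpy as np

def rot_z(theta):
    """Basic Z-axis rotation matrix for a revolute joint angle (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Projection of Frame 1 onto Frame 0 (axis re-orientation from Sect. 3.1);
# Frames 2->1 and 3->2 project as the identity matrix.
P_1_0 = np.array([[1.0, 0.0,  0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0,  0.0]])

t1, t2, t3 = np.deg2rad([75.0, 147.0, 147.0])   # test joint angles from the paper

R_1_0 = rot_z(t1) @ P_1_0        # R^0_1
R_2_1 = rot_z(t2)                # R^1_2 (identity projection)
R_3_2 = rot_z(t3)                # R^2_3 (identity projection)

R_3_0 = R_1_0 @ R_2_1 @ R_3_2    # orientation of Frame 3 expressed in Frame 0
print(np.round(R_3_0, 4))
```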
3.2 Displacement Matrix

Displacement, in the context of the robotic arm, refers to the x, y and z positions of frame n in frame m. The general formula is:

$$d^m_n = \begin{bmatrix} x^m_n \\ y^m_n \\ z^m_n \end{bmatrix}$$

Using Fig. 5b, the displacement matrices are determined in the context of θ1, θ2, θ3, as below:

$$d^0_1 = \begin{bmatrix} a_2\cos\theta_1 \\ a_2\sin\theta_1 \\ a_1 \end{bmatrix}, \qquad d^1_2 = \begin{bmatrix} a_3\cos\theta_2 \\ a_3\sin\theta_2 \\ 0 \end{bmatrix}, \qquad d^2_3 = \begin{bmatrix} a_4\cos\theta_3 \\ a_4\sin\theta_3 \\ 0 \end{bmatrix}$$
3.3 Homogeneous Transformation Matrix

In Robotics, the homogeneous transformation matrix (HTM) is a tool that combines the rotation matrix with the displacement matrix to generate the position and orientation of the end-effector. The HTM of the robot structure is calculated the same way as the rotation matrix, by multiplying the HTMs of all frames. The HTM formula for the entire robotic arm is:

$$H^0_3 = H^0_1 \cdot H^1_2 \cdot H^2_3, \qquad H^m_n = \begin{bmatrix} R^m_n & d^m_n \\ 0\;0\;0 & 1 \end{bmatrix}$$

where $R^m_n$ is the 3 × 3 rotation matrix and $d^m_n$ is the 3 × 1 displacement matrix.

In this study, the final $H^0_3$ of the robotic system is computed by a Python algorithm. For testing purposes, the defined joint angles are θ1 = 75°, θ2 = 147°, θ3 = 147°.
The end-effector tip, according to all the calculations so far, is placed at the coordinate (17.93, 5.48, 0.05), in the first quadrant. Additional explanations of the result are discussed in the results section below.
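A minimal sketch of the conventional HTM chain is given below for illustration. It is not the published Python algorithm, and the link lengths a1–a4 are placeholder values (in cm) loosely inspired by Sect. 2, so the printed coordinate will match the (17.93, 5.48, 0.05) result above only if the prototype's exact link values are substituted:

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def homogeneous(R, d):
    """Pack a 3x3 rotation R and a 3x1 displacement d into a 4x4 HTM."""
    H = np.eye(4)
    H[:3, :3] = R
    H[:3, 3] = d
    return H

# Placeholder link lengths (cm); substitute the prototype's real values.
a1, a2, a3, a4 = 13.0, 0.0, 11.75, 11.55
t1, t2, t3 = np.deg2rad([75.0, 147.0, 147.0])

P_1_0 = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)

H_1_0 = homogeneous(rot_z(t1) @ P_1_0, [a2 * np.cos(t1), a2 * np.sin(t1), a1])
H_2_1 = homogeneous(rot_z(t2),         [a3 * np.cos(t2), a3 * np.sin(t2), 0.0])
H_3_2 = homogeneous(rot_z(t3),         [a4 * np.cos(t3), a4 * np.sin(t3), 0.0])

H_3_0 = H_1_0 @ H_2_1 @ H_3_2
print("end-effector tip (x, y, z):", np.round(H_3_0[:3, 3], 2))
```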
3.4 The Denavit-Hartenberg Method The second method studied in applying forward kinematics to the robotic arm is the Denavit-Hartenberg (D-H) method. For this technique to work, four rules must be followed in the frame assignment stage [15]: 1. In the case of a revolute joint, the Z-axis is considered the rotation axis; 2. The X-axis must be perpendicular to the Z-axis of the previous frame; 3. The X-axis must intersect the Z-axis of the previous frame; this rule does not apply in the context of Frame 0; 4. The Y-axis must be determined from the X- and Z-axes by following the right-hand rule. Once these rules are in place as shown in Fig. 5a, the next step is to determine the D-H parameters table (Table 1). The final step of the D-H method is to use the parameters above to calculate the homogeneous transformation matrices, using the general formula given after Table 1.
Table 1 D-H parameters of the designed robotic arm (ϑ: joint angle; α: rotation around the X-axis to get to the previous Z-axis; r: distance along the X-axis from the center of the previous frame to the center of the present frame; d: distance along the Z-axis from the center of the previous frame to the center of the present frame)

Joint number | ϑ  | α  | r  | d
1            | ϑ1 | 90 | a2 | a1 − a5
2            | ϑ2 | 0  | a3 | 0
3            | ϑ3 | 0  | a4 | 0
The general D-H homogeneous transformation matrix is:

$$H^{n-1}_n = \begin{bmatrix} \cos\theta_n & -\sin\theta_n\cos\alpha_n & \sin\theta_n\sin\alpha_n & r_n\cos\theta_n \\ \sin\theta_n & \cos\theta_n\cos\alpha_n & -\cos\theta_n\sin\alpha_n & r_n\sin\theta_n \\ 0 & \sin\alpha_n & \cos\alpha_n & d_n \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

where the upper-left 3 × 3 block describes the rotation and the last column describes the displacement.
Each D-H homogeneous transformation matrix gets calculated in Python. The code is shown and explained below.
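For illustration, the D-H chain can also be sketched in a few lines of Python; this is an illustrative re-implementation rather than the authors' published code, and the numeric link lengths are again placeholders:

```python
import numpy as np

def dh_matrix(theta, alpha, r, d):
    """Standard D-H homogeneous transformation for one joint (angles in radians)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, r * ct],
                     [st,  ct * ca, -ct * sa, r * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

# D-H parameters as in Table 1; a1..a5 are placeholder link values (cm).
a1, a2, a3, a4, a5 = 13.0, 0.0, 11.75, 11.55, 0.0
t1, t2, t3 = np.deg2rad([75.0, 147.0, 147.0])

H_1_0 = dh_matrix(t1, np.deg2rad(90.0), a2, a1 - a5)
H_2_1 = dh_matrix(t2, 0.0,              a3, 0.0)
H_3_2 = dh_matrix(t3, 0.0,              a4, 0.0)

H_3_0 = H_1_0 @ H_2_1 @ H_3_2
print("D-H end-effector tip (x, y, z):", np.round(H_3_0[:3, 3], 2))
```

With the same placeholder link values, this D-H chain reproduces the tip position of the conventional HTM sketch above, which mirrors the comparison made in the results section.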
4 Algorithm—C++ and Python 4.1 The Arduino IDE and C++ libraries The setup to operate the joint movement of the robotic arm is defined in Fig. 6: at first, the servo shield C++ library is included in the program, by calling the DynamixelShield.h header file provided by the developer. Lines 3–11 in the code represent an automation process, where an if statement makes the shield recognize the type of Arduino that will run the system [16]. The servo motors' ID variables are declared in Lines 14–16, while the firmware used to run the servo motors is defined in Line 18. This PROTOCOL_VERSION is a default variable specific to the series of actuators used. Line 20 represents the instance of the library class DynamixelShield that gives access to all the motor functionalities, as they are made available by the producer. Line 23 also creates general access to the global variables used in the control of the servos, like ID, torque, temperature, position and many more. From Line 25 onwards, the code sets up the connection between the Arduino and the servo motors via the driving shield. The default baud rate to transmit the commands to the servos is 1 Mbps, set by the command dxl.begin(1000000). In the main loop, in the code lines 40–42, the speed of each of the three servo motors is set to a value of 70 (Fig. 7). The function used, setGoalVelocity(), can take
Fig. 6 Arduino sketch—setups
raw data, rpm or percentage values; in this case, it is set to a raw value, and it can go up to 450–500. The initial reading of the servo position is done using the getPresentPosition(), which can also take the raw or degrees values as a variable. At the same time, the data received is valuable feedback that can be used to drive the motor in the desired position with high accuracy. The setGoalPosition() function rotates the specific actuator at the requested angle. This function is very powerful and useful in the kinematics and the motion of the robotic arm.
Fig. 7 Arduino sketch to run the robotic arm. Main loop code
4.2 PyCharm 2022 and Python The robotic arm project used Python programming language and PyCharm IDE to build two different algorithms: 1. One to execute the conventional forward kinematics calculus with rotation matrices, displacement matrices and the homogeneous transformation matrices (Fig. 9). 2. The other one applies the D-H method with the rotation matrices, parameters table and D-H homogeneous transformation matrices (Fig. 8). The results are compared and discussed in the results section below. Both algorithms are running on the same theta values representing the joint angles, θ 1 = 75°, θ 2 = 147° and θ 3 = 147°. Both, Arduino sketch and Python forward kinematics code are made available online at github.com/deemano. Essential aspects to consider in the forward kinematics calculus are the transformation from degree into radians and the positioning of the joint axis, which starts from the initial position of the actuator (i.e., 0°). The 3D axis of the joints can be shifted conveniently to fit any desired layout without influencing the results.
Fig. 8 Python algorithm. D-H method in forward kinematics calculus
Fig. 9 Python algorithm for conventional forward kinematics calculus
5 Results The only challenge this research faced was making the UART debugging process work. Typically, capturing all the servo motors feedback requires a dedicated device sold by the producer, which acts as a medium and can capture the half-duplex serial data stream and decode it. But in this case, using a second Arduino proved to be a viable solution in capturing the data stream. Solving this issue was important in changing the default ID assigned to each servo motor or other personalized parameters like protocols, which can be unknown. In the final stage, the movement of the robotic arm is smooth and steady, without any issues. This outstanding performance of the servo actuators allowed the study to focus the entire efforts on implementing forward kinematics.
Although the conventional way to calculate forward kinematics seemed more straightforward and modulated, both methods were generating identical results in determining the tip of the robotic arm end-effector. In the end, and consistently with the literature, the D-H method proved to be a simpler way to emulate the robotic arm structure and motion and faster to calculate.
6 Discussion and Conclusion This work has been focused on creating a complete view of the process of building an operational robotic arm. Additionally, the research elaborated a comprehension of how forward kinematics algorithms can be achieved and integrated into the robot system. The 3D design of the robot was a laborious procedure with significant creative efforts but of great contribution to understanding the kinematics relationships within the robot structure. The electronics part of the project has proven to be a relatively simple stage, primarily because of the intuitive and accessible documentation and other resources related to the hardware used. Forward kinematics, on the other hand, has been the most challenging part, demanding significant amounts of research and raw pen-and-paper calculations. In its essence, when building your own robotic arm, kinematics is a personal aspect of the structural motion of the robot that can only work if it is understood correctly. The industrial development of high-performance smart actuators is becoming accessible and more affordable. Once algorithm concepts like deep learning and AI get to be integrated into these devices by default, the Robotics field will be able to fully evolve and expand into all the other industries and aspects of human life [2, 13]. Acknowledgements This work was presented in coursework form in fulfillment of the requirements for the BEng in Robotics Engineering for the student Denis Manolescu from the Robotics Laboratory, School of Mathematics, Computer Science and Engineering, Liverpool Hope University.
References 1. Moran ME (2007) Evolution of robotic arms. (Online) Available at: https://www.ncbi.nlm.nih. gov/pmc/articles/PMC4247431/#:~:text=Unimate%20introduced%20the%20first%20industr ial,a%20robotic%20arm%20in%201993. Accessed 2022 2. Smith E (2018) Going through the motions. (Online) Available at: https://tedium.co/2018/04/ 19/robotic-arm-history-unimate-versatran/. Accessed 2022 3. Ernest L, Hall UOC (2015) Introduction to robotics-end effectors. (Online) Available at: https://www.researchgate.net/publication/282976515_Robotics_1_Lecture_7_End_ Effectors. Accessed 2022 4. Procter S, Secco EL (2022) Design of a biomimetic BLDC driven robotic arm for teleoperation & biomedical applications. J Hum Earth Future. ISSN: 2785-2997
5. Chand R, Chand RP, Kumar SA (2022) Switch controllers of an n-link revolute manipulator with a prismatic end-effector for landmark nav. J Comput Sci 8:e885 6. Chand RP, Kumar SA et al (2021) Lyapunov-based controllers of an n-link prismatic robot arm. IEEE Asia-Pacific Conf CS Data Eng (CSDE) 2021:1–5 7. Intel.inc (2016) Industrial robotic arms: changing how work gets done. (Online) Available at: https://www.intel.com/content/www/us/en/robotics/robotic-arm.html. Accessed 2022 8. Illinois UO (2015) Robot Kinematics. (Online) Available at: http://motion.cs.illinois.edu/Rob oticSystems/Kinematics.html. Accessed 2022 9. Medium (2022) Available at: www.medium.com. Accessed 2022 10. Kumar V, P. E. U. o. P. S. o. E. a. A. S. (2016) Introduction to robot geometry and kinematics. (Online) Available at: https://www.seas.upenn.edu/~meam520/notes02/IntroRobotKi nematics5.pdf. Accessed 2022 11. Manolescu VD, Secco EL () Design of an assistive low-cost 6 d.o.f. robotic arm with gripper. In: 7th International congress on information and communication technology (ICICT 2022), (Lecture notes in networks and systems). ISSN: 2367-3370 12. Inc I (2021) Moore’s Law and Intel Innovation. (Online) Available at: https://www.intel.co.uk/ content/www/uk/en/history/museum-gordon-moore-law.html. Accessed 2022 13. Robotics (2021) Dynamixel Shield. (Online) Available at: https://emanual.robotis.com/docs/ en/parts/interface/dynamixel_shield. Accessed 2022 14. Adafruit (2020) DYNAMIXEL Motor-AX-12A. (Online) Available at: https://www.adafruit. com/product/4768. Accessed 2022 15. Angela Sodemann AA (2020) How to assign Denavit-Hartenberg frames to robotic arms. (Online) Available at: https://automaticaddison.com/how-to-assign-denavit-hartenbergframes-to-robotic-arms/. Accessed 2022 16. Association RI (2015) The first industrial robot. (Online) Available at: https://www.automate. org/a3-content/joseph-engelberger-unimate. Accessed 2022
Impact of Electric Vehicle Charging Station in Distribution System Using V2G Technology Golla Naresh Kumar, Suresh Kumar Sudabattula, Abhijit Maji, Chowtakuri Jagath Vardhan Reddy, and Bandi Kanti Chaitanya
Abstract The electrification of the transport sector is playing a vital role due to the depletion of fossil fuels and the emission of carbon gases into the atmosphere, which leads to an increase in global warming. As a result, the usage of Electric Vehicles (EVs) in the transportation sector is going to increase day by day. Usage of EVs might reduce pollution, but it will increase the load on the grid, which further increases the losses in the power grid system. With a bidirectional converter, the EV can act as both a load and a source in a dual mode of operation using Vehicle-to-Grid (V2G) technology, acting as Distributed Generation (DG) during peak load time and as a load during off-peak time. In order to reduce the losses on the grid, modelling of EV charging behaviour is key to estimating the charging needs and beyond. In the proposed model, an IEEE 33-bus system integrated with four electric vehicle charging stations (EVCS) at the 10th, 14th, 17th and 30th buses was considered to analyse the performance of the system. The load demand of the distribution system is also estimated for a 24-h duration based on the load setting for the 33-bus system. The simulation results of the V2G model show the reduction of losses in the distribution system using the V2G method. Keywords Electrification · Global warming · Electric vehicle charging station · Vehicle to grid · Distributed Generation
1 Introduction The automotive sector is rising to the top of the list of global industries in terms of both economic importance and R&D spending. More technology components are being added to automobiles in an effort to increase the safety of both passengers and pedestrians. One of the technologies is electric vehicles (EV), which are quickly G. N. Kumar (B) · A. Maji · C. J. V. Reddy · B. K. Chaitanya B.V Raju Institute of Technology, Tuljaraopet, Telangana 502313, India e-mail: [email protected] G. N. Kumar · S. K. Sudabattula Lovely Professional University, Phagwara, Punjab 144411, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_43
gaining popularity among businesses. Additionally, there are more cars on the roads, which makes it possible for us to travel quickly and pleasantly. But as a result, air pollution levels in cities have dramatically increased. Investing in an EV is one of the best ways to reduce pollution. Additionally, by using EVs, we can protect the environment for future generations [1]. Fossil fuel-dependent cars have come under fire for their impact on the environment, including global warming and other pollution-related issues. As non-renewable energy sources, oil and gas are the first to be mentioned. It is anticipated that these two sources would run out within the next 100 years. Energy reserves should be utilised wisely to guarantee that present and future demands of the people are satisfied. Electric cars have been a big step in making this transformation, despite the obstacles experienced by electric car manufacturers. Eventually, they will displace gasoline-powered vehicles and open the door to a drastic transition to cleaner forms of energy, because they are a new technology and have managed to thrive quite considerably despite factors like unfair competition and their primitive stage of technological advancement [2]. EVs are the future of our world and not just of transportation. These automobiles are connected to a low-voltage charging station. These automobiles do not produce any pollutants. EVs will undoubtedly be essential in achieving this goal, at least to some extent. Internal Combustion Engine (ICE) or fossil fuel-burning automobiles have produced a lot of carbon dioxide in the atmosphere [3]. The possible impacts of electric vehicle charging station loads on the voltage profile of distribution networks have been investigated by a large number of industry professionals. It has been observed, for a variety of EV penetration scenarios, that the loads from EV charging stations have an influence on a low-voltage distribution network in Europe [4]. According to the findings of the research, the network is capable of withstanding an EV penetration rate of 1–2%. On the other hand, it was discovered that the voltage profile of the node in which a number of EVCS had been installed degraded to some degree. Additionally, it was discovered that the high loads caused by EV charging stations were a contributing factor in the voltage profile degeneration of the system's weak buses. Since many nations are known to consume enormous amounts of electricity, the total charging load for EVs was approaching its maximum levels. However, by incorporating the DG, where the EV can operate as a load and source depending on load times, this can be minimised. Here, V2G is one of the ways to serve as the distribution system's DG. According to this, the EV will supply the system with electricity by discharging at peak hours, based on the vehicle's State of Charge (SoC), the load, and the demand. When applying smart charging analysis, smart scheduling of EV charging and discharging has become crucial. Charging the car during off-peak hours and discharging it during peak hours are crucial components of the smart charging installation [5]. The V2G approach is essential in this situation. Smart charging methods, which used the V2G methodology and were based on the SoC of the car, load, and total generation, were used to reduce losses.
2 Literature Review The fast adoption of EVs in daily life suggests that various approaches to lowering the impact of the EV load on the distribution system will be pursued. A growing body of research on electric vehicles with grid integration focuses on modelling the cost of integration under various scenarios, but few studies look at the current promising practices that can be based entirely on policy tools currently in use. Chukwu et al., proposed a model to evaluate how V2G would have an effect on the power factors and the power losses in an electrical distribution system that included V2G devices. The potential significance of this study depends on the usage of V2G to achieve the reduction of power loss and probable control of power factor. Higher power factors decrease the amount of energy wasted via feeder lines, improving the electrical power system’s operational energy efficiency and dependability [6]. Habib et al. present the findings of this research, on a vehicle that has the capability to utilise a V2G application is able to provide a variety of benefits to the grid. Off-peak pricing, delayed pricing, coordinated and uncoordinated pricing, and intelligent scheduling in a distribution network are all extensively treated in this article (DN). According to the findings of our research, the economic advantages of a V2G technology are strongly dependent on the pricing techniques and vehicle aggregation tactics that are used [7]. Malya et al., provides a special approach of parametrising the driving style of electric vehicles and the V2G energy transfer, anticipating the expenses of battery deterioration. A profit model is used to determine the profit produced by car owners who sell their batteries. Profit is calculated based on the owner’s tendency to purchase and sell electricity from the grid in relation to the cost of energy [8]. Woo young Choi et al., has investigated the topic “Reviews on Grid-Connected Inverter, Utility-Scale Battery Energy Storage System, and Vehicle-to-Grid Application—Challenges and Opportunities” in relation to V2G systems. This study reviews three developing grid-connected distributed energy resource technologies: utilityscale battery energy storage systems (BESSs), grid-connected inverters (GCIs), and V2G applications. The topologies and functions are the main topics of the GCI overview. There are several utility-scale battery energy storage systems (BESSs) introduced. It is suggested that a utility-scale BESS has potential grid assistance capabilities [9]. Hydro-Quebec’s research institute signed a contract to build and supply an advanced bidirectional charging station for a test project on vehicle-to-home (V2H) and vehicle-to-vehicle and V2G power exchange [10]. Yasin et al., recommended a survey on communication between smart metres and the grid. The review paper aims to clearly and concisely explain what a smart grid is and what kinds of communication techniques are employed. To make them easier to understand, each component of a smart grid is described logically, and communication techniques are discussed in relation to their enhancements, benefits, and deficiency of features. In terms of integrating a smart grid, the evolving generation, transmission, distribution, and client appliances are surveyed. The communication technologies are introduced as a classification of wireline and wireless, where the
salient characteristics are also listed. According to the cyber and physical structures of a smart grid, the security needs for hardware and software are described [11]. Samaresh et al., provided a survey on smart grid applications for cloud computing. In their study, the authors addressed how distributed architecture would enable future smart grids to have cost-effective power management that is dependable, efficient, and secure. They offer a thorough overview of various cloud computing applications for the smart grid design in three separate areas: energy management, information management, and security, in order to concentrate on these requirements. These topics examine the value of cloud computing applications and provide guidance on potential future directions for the growth of the smart grid [12]. Himanshu Khurana et al., presented a paper on smart grid security issues. With their findings, they presented a wide overview of challenges relating to smart grid safety. At this early level, they have devised solutions that will be advantageous in the future [13]. Wang et al., performed a poll on cyber security in a smart grid. They did an indepth investigation of the cyber security issues surrounding the Smart Grid. With respect to Smart Grid security, they focussed on identifying and resolving flaws in the system’s communications infrastructure as well as developing attack defences. Understanding security threats and solutions to them, as well as highlighting possible research areas for Smart Grid security in the future [14]. Farhad Khosrojerdi et al., has studied about V2G system under the topic “Integration of Electric Vehicles into a Smart Power Grid: A Technical Review”. One of the best ideas is to electrify transportation systems so that people may simply lessen their reliance on non-renewable resources and use them when demand is at an all-time high [15]. Zhou et al., when employed as a mobile storage device, the EV may be used to participate in the power grid’s load adjustment and to serve as a platform for the coordination of renewable energy sources. The central control, autonomous control, and battery management systems for V2G are thoroughly examined, as are the main challenges such as battery fatigue, bidirectional charging, and charging stations. In addition, the article discusses the economic advantages of both the power grid and EV owners, as well as research opportunities [16]. Samir et al., suggested a new type of technology for charging the EV with the programme mechanism called as v2g. They have created the business plan for putting the EV into practice as well as the appropriate algorithms needed to create the validated v2g services in the article that has been published. Additionally, they discussed in this paper the adjustments needed for v2g technology services [17]. Wu et al., explores the importance of the EV in the present day and how it is going to change the future of transportation. And in this, he proposed a model to charge the electrical vehicle that is v2g (vehicle to grid technology) and he even discussed the DSR algorithm and implementing a control unit for energy flow management with different distribution networks. The DSR algorithm which has been proposed by him concentrates on the non-dominated sorting genetic algorithm. This article also describes the physical testing platforms for V2G mechanism [18].
Benjamin k et al., have stated in their proposals about the different concepts of V2G and provide different solutions for intelligent charging approaches. It thoroughly examines the software, tools needed, and suggestions for improvement for the loading system. The minimising of energy loss, procedures to manage voltage, and costs associated with power generating are also covered in this essay. In this research, the significance of the various EV charging techniques is also briefly discussed, along with how this would affect the production of all EVs [18]. Baghmare et al., presented is the role of V2G mechanism in the EV adoption in the world. It claims that the evolution of the EV is the next major thing in the world and that V2G has a significant role to play in the EV industry’s manufacturing. According to their assessment, the EV has a global opportunity worth USD 22 billion, and this paper provides statistics on the many opportunities in the EV business. It claims that the evolution of the EV is the next major thing in the world and that V2G has a significant role to play in the EV industry’s manufacturing. According to their assessment, the EV has a global opportunity worth USD 22 billion, and this paper provides statistics on the many opportunities in the EV business [19]. Can et al., discussed various problems and limitations. This paper also says about how it affects the economy of a nation, it also says about the challenges which will be faced when implementing this technology at almost every EV charging station and says how it will affect the environment by drastically decreasing the release of pollutants which are released during the generation of electricity by using nonrenewable resources [20]. This work demonstrates a technique for reducing losses by the strategic placement and size of many DGs and EVCSs that can switch between generating direct current (DC) and alternating current (AC). This research provides recommendations for the optimal size and placement of renewable and non-renewable distributed generation units, as well as electric vehicle charging stations. In addition to lowering power losses, this strategy raises network voltages [21]. The Simultaneous Particle Swarm Optimization (PSO) method is used in the implementation for IEEE 15, 33, 69, and 85 bus architectures. According to the findings, by using optimum planning and operation of both DGs and EVs, the suggested optimisation approach boosts the system’s efficiency and performance.
3 Problem Formulation The main objective of the proposed work is to reduce the losses in the distribution system by using the V2G method based on the intermittent load demand. The power equations for the distribution system are given below for the two modes of operation: • Grid to vehicle (G2V) • Vehicle to grid (V2G).
The following power equation (1) gives the operation of G2V:

$$P_G = \sum_{i=1}^{24} \left( P_{BL} + P_{EV} + P_L \right) \tag{1}$$

Here, $P_G$ is the total power generation, $P_{BL}$ is the base load, $P_{EV}$ is the EV load, and $P_L$ is the losses. The following power equation (2) gives the operation of V2G:

$$P_G + \sum_{i=1}^{24} P_{EVDG} = \sum_{i=1}^{24} \left( P_{BL} + P_L \right) \tag{2}$$

where $P_{EVDG}$ is the power generated by the EV acting as DG.

The EV power charging and discharging limits are:

$$P_{ch,n,t} \le P^{max}_{ch,n} \tag{3}$$

$$P_{disch,n,t} \le P^{max}_{disch,n} \tag{4}$$

The SoC limits of an EV are:

$$SoC_{min} \le SoC_n \le SoC_{max} \tag{5}$$
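As an illustration of how Eqs. (1)–(5) can be evaluated over a 24-h horizon, a minimal Python sketch is given below; the hourly profiles, the 5% loss estimate, the station limit and the SoC band are assumed values for demonstration only, not the data of the IEEE 33-bus study:

```python
import numpy as np

HOURS = np.arange(24)

# Illustrative hourly profiles in MW (assumed, not the paper's 33-bus data).
p_bl = 3.0 + 0.7 * np.sin((HOURS - 12) * np.pi / 12)   # base load, peaks in the evening
p_ev = np.where(HOURS < 6, 0.4, 0.0)                   # EV charging load (G2V, off-peak)
p_loss = 0.05 * (p_bl + p_ev)                          # simplistic loss estimate
p_evdg = np.where(HOURS >= 18, 0.5, 0.0)               # EV injecting power as DG (V2G, peak)

# Eq. (1): generation required when the EV only charges (G2V).
p_g_g2v = np.sum(p_bl + p_ev + p_loss)
# Eq. (2): generation required when the EV also supports the grid as DG (V2G).
p_g_v2g = np.sum(p_bl + p_loss) - np.sum(p_evdg)

# Eqs. (3)-(5): clamp a charging request and the SoC to assumed limits.
p_ch_max, soc_min, soc_max = 0.56, 0.2, 0.9            # e.g. a 560 kW station, assumed SoC band
p_ch = min(0.60, p_ch_max)                             # a 0.60 MW request is capped by Eq. (3)
soc = float(np.clip(0.95, soc_min, soc_max))           # SoC kept inside the Eq. (5) bounds

print(f"P_G (G2V only): {p_g_g2v:.2f} MWh over 24 h")
print(f"P_G (with V2G support): {p_g_v2g:.2f} MWh over 24 h")
print(f"allowed charging power: {p_ch:.2f} MW, operating SoC: {soc:.2f}")
```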
4 Methodology A standard IEEE 33-bus system together with four electric vehicle charging stations (at buses 10, 14, 17, and 30) is taken into account in the suggested technique to assess the system's overall performance. Based on the load setting, the analysis is performed for a 24-h time frame. The block design for the suggested model, in which the EV serves as both load and DG with dual operation through the bidirectional converter based on stochastic load demand, is shown in the accompanying Fig. 1. The configuration of the 33-bus radial system is: number of lines 32, slack bus number 1, base voltage 12.66 kV, base power 100 MVA, total real power 3.715 MW, and reactive power 2.295 Mvar. The integration of EVCS into the distribution system is shown in Fig. 2.
Fig. 1 Block diagram for proposed model
Fig. 2 IEEE 33-bus system integrated with EVCS
5 Results and Discussions There is a considerable impact on the distribution system as a result of EVs’ quick entry into the system. Here, the EV serves as both a load and a source depending on the load demand, using the V2G approach. The operational values of the EV serving as the DG are provided in Table 1 and illustrated in Figs. 3, 4, 5 and 6. Using EV integration as DG in the 33-bus system based on changes in load demand depending on the SoC in the EV’s availability, it acts as a G2V during off-load hours Table 1 Operating range of EV acting as V2G Bus number
Operating range in kW
Power generation as V2G in kW
10
240–528
480
14
150–360
360
17
220–484
440
30
250–560
560
Fig. 3 Comparison of loss with EV acting as DG at 10th bus with time in hours
Fig. 4 Comparison of loss with EV acting as DG at 10 and 14th bus with time in hours
and discharges (V2G) at peak load times. The simulation results demonstrate how distribution system losses vary under many potential scenarios. The variation of losses over a 24-h horizon due to the injection of the EV functioning as DG in the 33-bus system is displayed in Table 2. The comparison of losses with the EV functioning as DG for the four buses 10, 14, 17, and 30 in the 33-bus system is shown in Fig. 7. It is noted that the total losses were minimised to 6.29 MW by the EVs functioning as DG at the 10, 14, 17, and 30th buses.
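Using only the 24-h totals reported in Table 2, the relative loss reduction can be verified with a short Python sketch (a post-processing check on the published totals, not part of the original simulation):

```python
# 24-h total losses (MW) taken from the last row of Table 2.
loss_with_ev_no_dg = 7.38276
loss_dg_cases = {
    "DG at bus 10":              7.10980358,
    "DG at buses 10, 14":        6.8182444,
    "DG at buses 10, 14, 17":    6.5757428,
    "DG at buses 10, 14, 17, 30": 6.2903434,
}

for case, loss in loss_dg_cases.items():
    reduction = 100.0 * (loss_with_ev_no_dg - loss) / loss_with_ev_no_dg
    print(f"{case}: total loss {loss:.4f} MW, reduction {reduction:.1f}% vs EV load only")
```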
Fig. 5 Comparison of loss with EV acting as DG at 10, 14, and 17th bus with time in hours
Fig. 6 Comparison of loss with EV acting as DG at 10, 14, 17, and 30th bus with time in hour
6 Conclusion and Future Scope The grid and distribution systems will be greatly impacted by the widespread use of EVs. Consequently, the losses mounted. The DGs are integrated into the system to lower losses. There is a clever technique to charge and discharge the EV here depending on the SoC of the car, load, and demand, which were simplified due to the high capital cost of integrating renewable DGs to the system. With smart charging, electric vehicles may be used sustainably and the load on the grid can be decreased. For a full day, smart charging will be dependent on off-peak and peak usage. Based on the load demand, the simulation results reveal that the losses are compared with the potential EV cases acting as DG using V2G technique and that it is shown that the losses are decreased when EV is acting as DG at all four buses. Therefore, one
Table 2 Loss comparison of EV acting as DG using V2G method at various buses

Time in hrs | Loss without EV (MW) | Loss with EV (MW) | Loss with EV at 10th bus as DG (MW) | Loss with EV at 10, 14th bus as DG (MW) | Loss with EV at 10, 14, 17th bus as DG (MW) | Loss with EV at 10, 14, 17, and 30th bus as DG (MW)
1 | 0.13398 | 0.24684 | 0.237864 | 0.2280278 | 0.2198746 | 0.210375
2 | 0.12586 | 0.23188 | 0.223278 | 0.2140776 | 0.2065228 | 0.197472
3 | 0.1218 | 0.2244 | 0.216172 | 0.2072334 | 0.199903 | 0.1911888
4 | 0.11774 | 0.21692 | 0.2089538 | 0.2003892 | 0.1932084 | 0.1847934
5 | 0.11774 | 0.21692 | 0.2089538 | 0.2003892 | 0.1932084 | 0.1847934
6 | 0.1218 | 0.2244 | 0.216172 | 0.2072334 | 0.199903 | 0.1911888
7 | 0.15225 | 0.2805 | 0.270028 | 0.2588828 | 0.2498694 | 0.238986
8 | 0.17661 | 0.32538 | 0.3134494 | 0.3005838 | 0.2898126 | 0.2772462
9 | 0.19082 | 0.35156 | 0.338657 | 0.3247068 | 0.3131128 | 0.299574
10 | 0.19285 | 0.3553 | 0.34221 | 0.3282224 | 0.3164788 | 0.3028278
11 | 0.19082 | 0.35156 | 0.338657 | 0.324819 | 0.3131128 | 0.299574
12 | 0.18879 | 0.34782 | 0.33473 | 0.320892 | 0.3098216 | 0.2964324
13 | 0.18676 | 0.34408 | 0.331364 | 0.317526 | 0.306493 | 0.2931786
14 | 0.19285 | 0.3553 | 0.34221 | 0.3281476 | 0.316404 | 0.3026782
15 | 0.18879 | 0.34782 | 0.33473 | 0.320892 | 0.3098216 | 0.2963576
16 | 0.18473 | 0.34034 | 0.327624 | 0.31416 | 0.30294 | 0.2898126
17 | 0.19488 | 0.35904 | 0.345576 | 0.331364 | 0.3198074 | 0.305932
18 | 0.20097 | 0.37026 | 0.3567212 | 0.342023 | 0.3297932 | 0.315469
19 | 0.203 | 0.374 | 0.360162 | 0.3453516 | 0.3331592 | 0.3187228
20 | 0.19285 | 0.3553 | 0.34221 | 0.327998 | 0.3164788 | 0.302753
21 | 0.1827 | 0.3366 | 0.324258 | 0.3119534 | 0.2997984 | 0.2867832
22 | 0.17255 | 0.3179 | 0.30625738 | 0.29359 | 0.2831554 | 0.2708508
23 | 0.14819 | 0.27302 | 0.262922 | 0.2521134 | 0.2431748 | 0.2325906
24 | 0.12789 | 0.23562 | 0.226644 | 0.217668 | 0.2098888 | 0.2007632
Total | 4.00722 | 7.38276 | 7.10980358 | 6.8182444 | 6.5757428 | 6.2903434
of the promising ways to promote intelligent/smart ways to charge EVs, decrease losses, and thereby reduce losses on the grid is through the function of EV as a DG. The planned analysis is expanded to take into account battery degradation in V2G operation from both a technical and an economic standpoint.
Fig. 7 Comparison of loss with EV acting as DG at 10, 14, 17, and 30th bus with time in hours
References 1. European Commission. Transport in Figures—Statistical Pocketbook (2011). Available online: https://ec.europa.eu/transport/facts-fundings/statistics/pocketbook-2011_en/. Accessed on 21 February 2021 2. www.futurelearn.com/info/blog/electric-vehicles-future-transport 3. thisiswhyiride.com/what-is-the-future-scope-of-electric-vehicle-inindia/ 4. Geske M, Komarnicki P, Stötzer M, Styczynski ZA (2010) Modeling and simulation of electric car penetration in the distribution power system—case study. Modern Electric Power Syst 2010:1–6 5. Guille C, Gross G (2008) Design of a conceptual framework for the V2G Implementation. In: 2008 IEEE Energy 2030 conference, pp 1–3. https://doi.org/10.1109/ENERGY.2008.4781057 6. Chukwu UC (2019) The impact of V2G on the distribution system: power factors and power loss issues. Southeast Con 2019:1–4. https://doi.org/10.1109/SoutheastCon42311.2019.9020481 7. Habib S, Kamran M, Rashid U (2015). Impact analysis of vehicle-to-grid technology and charging strategies of electric vehicles on distribution networks—a review. J Power Sour 277. https://doi.org/10.1016/j.jpowsour.2014.12.020 8. Malya P, Fiorini L, Rouhani M, Aiello M (2021) Electric vehicles as distribution grid batteries: a reality check. Energy Inf 4:29. https://doi.org/10.1186/s42162-021-00159-3 9. Choi W et al (2017) Reviews on grid-connected inverter, utility-scaled battery energy storage system, and vehicle-to-grid application—challenges and opportunities. IEEE Transp Electr Conf Expo (ITEC) 2017:203–210. https://doi.org/10.1109/ITEC.2017.7993272 10. www.newswire.ca/news-releases/hydro-quebec-launches-experimental-project-on-plug-invehicles-and-the-power-grid-v2g-v2h-gridbot-canada-selected-to-build-an-advanced-bidire ctional-charging-station-510514651.Copyright © 2022 CNW Group Ltd. All Rights Reserved. A Cision company 11. Kabalci Y (2016) A survey on smart metering and smart grid communication. Renew Sustain Energy Rev 57:302–318. https://doi.org/10.1016/j.rser.2015.12.114 12. Bera S, Misra S, Rodrigues JJPC (2015) Cloud computing applications for smart grid: a survey. IEEE Trans Parallel Distrib Syst 26(5):1474–1494 13. Khurana H, Delgado-Gomes V, Martins JF, Lima C, Borza PN (2015) Smart grid security issues. In: 9th International conference on compatibility and power electronics (CPE)
14. Wang W, Lu Z (2013) Cyber security in the smart grid: survey and challenges, computer networks. Int J Comput Telecommun Netw 57(5):1344–1371 15. Khosrojerdi F, Taheri S, Taheri H, Pouresmaeil E (2016) Integration of electric vehicles into a smart power grid: a technical review. In: 2016 IEEE electrical power and energy conference (EPEC), IEEE, pp 1–6 16. Zhou Y (2015) Vehicle to grid technology: a review. https://doi.org/10.1109/ChiCC.2015.726 1068 17. Shariff SM, Iqbal D, Saad Alam M, Ahmad F (2019) Published under license by IOP Publishing Ltd IOP Conference Series: Materials Science and Engineering, Volume 561, First International Conference on Materials Science and Manufacturing Technology 12–13 April 2019, Hotel Aloft, Coimbatore, Tamil Nadu, India. 18. Wu B, Zhou J, Ji X, Yin Y, Shen X (2020) An ameliorated teaching–learning-based optimization algorithm based study of image segmentation for multilevel thresholding using Kapur’s entropy and Otsu’s between class variance. Inf Sci 533:72–107. ISSN 0020-0255. https://doi.org/10. 1016/j.ins.2020.05.033 19. Lance Noel, Gerardo Zarazua de Rubens, Johannes Kester, Benjamin K. Sovacool ,4 January 2018 • © 2018 The Author(s). Published by IOP Publishing Ltd. Environmental Research Letters, Volume 13, 2018 Environ. Res. Lett. 13 013001. 20. Baghmare KK, Daigavane PM () Published under licence by IOP Publishing Ltd Journal of Physics: Conference Series, Volume 2089, 1st International Conference on Applied Mathematics, Modeling and Simulation in Engineering (AMSE) 2021 15–16 September 2021, India (Virtual)Citation K K Baghmare and P M Daigavane 2021 J. Phys.: Conf. Ser. 2089 012011 21. Yiyun T, Can L, Lin C, Lin L (2011) Research on vehicle-to-grid technology. Int Conf Comput Distrib Control Intell Environ Monit 2011:1013–1016. https://doi.org/10.1109/CDCIEM.201 1.194
Ontology-Based Querying from Heterogeneous Sensor Data for Heart Failure Diksha Hooda and Rinkle Rani
Abstract Heart Failure is considered to be one of the most detrimental and prevalent heart conditions, claiming the lives of many elderly as well as young people. The traditional approaches to recognizing the heart failure condition from sensor-generated patient clinical data are purely data-driven. In this paper, an ontology-based information system that integrates sensor data as the patient's clinical parameters and detects the heart failure condition and abnormality is proposed. The proposed system deploys SWRL rule-based semantic reasoning to extract information from the developed OWL Knowledge Base. The developed ontology is aligned with the vocabulary of an existing state-of-the-art medical ontology, namely SNOMED. The proposed system is instantiated and validated with a real-world dataset containing clinical parameter data with respect to patients. The proposed model shows comparable performance, with an accuracy of 88%, when instantiated with a real-world dataset and compared with the existing traditional approaches. Keywords Heart failure · Sensor data fusion · Ontology · SPARQL query translation · Knowledge engineering · Information systems
1 Introduction In modern history, new illnesses are being recognized and diagnosed in patients all over the world. With each passing day, many more people are becoming patients and getting diagnosed with illnesses. Cardiovascular illness is one of the most commonly occurring causes of mortality in developed nations, as discussed by Priyadarshini et al. [1]. Across the world, the number of deaths attributed to various heart-related illnesses has surged by approximately one third
D. Hooda (B) · R. Rani CSED, Thapar Institute of Engineering and Technology, Patiala, India e-mail: [email protected] R. Rani e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Kumar et al. (eds.), Third Congress on Intelligent Systems, Lecture Notes in Networks and Systems 608, https://doi.org/10.1007/978-981-19-9225-4_44
in recent years. The causes of these diseases can vary to a wide extent. Ignorance and negligence are common causes of the increased adverse effects of this disease. It is crucial to perform a timely diagnosis of any heart-related disease. Diagnosis is the result of the decision-driven process carried out by physicians to come to a conclusion about the patient's diseases on the basis of their signs, symptoms and the results of the clinical tests conducted. However, this task of manual diagnosis can be delayed due to the unavailability of clinicians. Hence, in order to automate this error-prone process for accurate diagnosis, medical decision support information systems need to be developed. Such systems are not easy to bring to life because of the complexity of the insightful medical knowledge required. Developing a health-related decision system can result in more accurate and effective therapy for the patient. It has been known for a while now that ontologies can be considered a good methodology for encoding domain knowledge in decision systems, as presented by Enrico et al., Vikram et al., Baader et al., de Clercq et al. and Noy et al. [2–6]. Ontology is a methodology that promotes sharing of knowledge in a machine-interpretable and processable format. It is defined as 'a formal explicit description of concepts in a domain of discourse (classes)'. Relationships denote various features of classes or concepts, and constraints on the classes. Ontologies combined with a set of instances constitute an Ontological Knowledge Base, as presented by Noy and McGuinness [7]. Additionally, the usage of ontology reasoners has proven to be beneficial, in the sense that it promotes shareability of the knowledge, as depicted by Tu and Musen [8]. Semantic web technology (OWL, RDF and SPARQL) is now known to be the de-facto method for developing applications built upon ontologies as the knowledge representation technique for intelligent analysis. OWL, with the rule-driven SWRL framework and inference engine, contributes to a full-fledged framework for intelligent analysis and the development of an information system. In this paper, OWL is used for the construction of an ontological Knowledge Base for a heart failure detection system. A heart failure ontological Knowledge Base is designed and developed as a major module of the system to perform automated diagnosis and management of patient-related clinical data.
1.1 Our Contributions • An ontology-based Knowledge Base that integrates sensor data as patient’s clinical parameters is developed. • The novel ontological system detects heart failure condition and abnormality based upon SWRL driven rule framework. • The proposed system constitutes a query module which converts a natural language query into a SPARQL query. • The developed ontology is aligned with existing benchmarked ontology; SNOMED-CT to promote interoperability and information reuse.
2 Related Works Hadzic et al. [9] presented an ontology-driven framework for the biomedical domain. The system supports human disease research and resolves medical issues. However, the proposed system fails to improve the system's usability. Hervas et al. [10] proposed a Clinical Decision System that estimates the risk and facts of heart diseases. It is based on a conceptualized model that gives recommendations to users/patients. The reasoning process is based on OWL and rules written using SWRL for risk detection. The work presented by Eccher et al. [11] provides an architecture that models a health care task and represents domain medical knowledge for recognition of the heart failure condition. Additionally, Hristoskova et al. [12] proposed an ontological system to support real-time monitoring of patients with heart failure. Another work, proposed by Lasierra et al. [13], is based on an architecture for remote patient monitoring. It is a two-layered architecture: the first is a conceptual layer and the second a communication layer to retrieve and exchange information represented as an ontology. Chiarugi et al. [14] presented an approach for heart failure diagnosis and management with respect to patients, driven by the domain knowledge acquired. The various small ontologies which are part of the system are developed to achieve faster reasoning as well as easier maintenance. Jara et al. [15] developed an ontology-based information framework that detects myocardial diseases. It can be used for remote monitoring of chronic-condition patient cases and exchange of health-related data for larger analysis. Existing works in this domain lack semantic enrichment in the health monitoring domain. Traditional methodologies revolve around data-driven approaches, which require external intervention by a human. In order to deal with faulty decisions in the case of such data-driven systems, semantic-based knowledge-driven approaches need to be implemented. The proposed work is an ontology-based approach that integrates the patient's sensor data for the detection of the heart failure condition and any abnormality. It constitutes SWRL-based reasoning to extract crucial patient-related information from the developed OWL Knowledge Base. Additionally, a query translator is implemented to perform translation of a natural language query into a SPARQL query, which does not exist in the existing systems. The sectioning of the paper is done as follows: Sect. 1 describes the introduction. Then Sect. 2 showcases the survey and related works. Further, Sect. 3 comprises the proposed approach and methodology, Sect. 4 contains the results and discussion, followed by Sect. 5 containing the conclusion.
3 Proposed Approach
In the proposed approach, an information system for heart failure is developed that represents the domain knowledge in the form of an ontological Knowledge Base. The developed ontology bridges the gap between sensor observational data and the core
concepts of heart failure. The developed approach consists of three components: (1) Patient-Related Data Transformation, (2) Ontological Knowledge Base Development and (3) Query Translation. First, the patient profiling data is semantically annotated as per the conceptual schema of the observational aspect of the ontology. Second, the domain ontology is constructed based on the knowledge acquisition task. Finally, the query processing phase takes the user query from the end user and fetches the results based on the corresponding SPARQL translation. Figure 1 shows the architectural diagram of the proposed system. The detailed description of the components is as follows:
1. Patient-Related Data Transformation: To perform patient-related data analysis tasks, the sensor data is semantically annotated according to the designed observational ontology. Patient-related data from health records is extracted and represented in the ontological KB. This formalized addition of procedural knowledge to map patient clinical data is of significant importance: the semantic mapping of sensor-driven clinical data ensures transparency and enables further processing for reproducibility of results. As an outcome of this first component of the information system, a formal schema conceptually defines patient profiles in which the data of the various patients is represented. This operational part of the ontology provides a solution for the management and fusion of sensor-generated patient clinical data. The patient-related data is inserted into the ontology as instances of the concept "Patient". The data may be categorical, e.g. "Patient_chest_pain", or numerical, e.g. the patient's serum cholesterol value; a minimal sketch of this annotation step is given below.
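The following sketch illustrates how a single patient record could be mapped onto such a schema as RDF triples. It is only an illustrative example under stated assumptions, not the authors' implementation: the namespace http://example.org/hf#, the individual name patient_001 and the property names hasSymp and hasSerumCholesterol are hypothetical placeholders, and rdflib is used here as one possible library.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

HF = Namespace("http://example.org/hf#")   # hypothetical namespace for the heart failure ontology
g = Graph()
g.bind("hf", HF)

# Annotate one illustrative patient record as an instance of the "Patient" concept
patient = HF["patient_001"]
g.add((patient, RDF.type, HF.Patient))
g.add((patient, HF.hasSymp, Literal("Patient_chest_pain")))                   # categorical value
g.add((patient, HF.hasSerumCholesterol, Literal(233, datatype=XSD.integer)))  # numerical value

g.serialize(destination="patient_records.rdf", format="xml")

In the actual system, the same mapping would be driven by the observational part of the developed ontology rather than by hard-coded property names.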
Fig. 1 Architecture of proposed system
2. Ontological Knowledge Base Development: The knowledge acquisition corresponding to heart failure signs and symptoms and patient physiological parameters, and its representation, is crucial for the implementation of this component. Gathering the domain knowledge is of utmost importance since it determines the accuracy and correctness of the outcomes of the proposed information system; the reasoning process built upon this knowledge representation infers the diagnosis results for the patients. The fundamental step in implementing this component is obtaining consensual domain information such as a detailed description of the heart failure condition, its signs and symptoms, clinical parameter values and the treatments corresponding to the diagnosis. Once the domain knowledge is gathered, the conceptual schema that represents this knowledge as ontology triples is designed. Figure 2 shows a graphical representation of the logical schema of the developed ontology. To build the Ontological Knowledge Base for heart failure, the following building process is adopted (a code sketch of the class and relationship definition steps follows the list):
• Determining the scope and domain.
• Conceptualizing the most crucial terms in the ontology.
• Defining the taxonomy of the ontological terms and concepts.
• Defining relationships between the classes/concepts.
• Defining the various facets of slots.
• Creating the instances of the defined concepts.
• Applying rule-driven ontological reasoning to infer new information.
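As an illustration of the taxonomy and relationship definition steps, the sketch below builds a small fragment of such a Knowledge Base with owlready2. The IRI, the classes Sensor, ClinicalParameter and Treatment, and the property names are assumptions chosen to mirror the schema of Fig. 2; the authors' actual ontology is considerably richer than this fragment.

from owlready2 import get_ontology, Thing, ObjectProperty, DataProperty

onto = get_ontology("http://example.org/heart_failure.owl")   # hypothetical IRI

with onto:
    # Taxonomy: a few of the core concepts
    class Patient(Thing): pass
    class Sensor(Thing): pass
    class ClinicalParameter(Thing): pass
    class Treatment(Thing): pass

    # Relationships between concepts
    class hasPara(ObjectProperty):        # patient -> observed clinical parameter
        domain = [Patient]
        range = [ClinicalParameter]
    class hasTreatment(ObjectProperty):   # patient -> suggested treatment
        domain = [Patient]
        range = [Treatment]

    # Slots holding literal values
    class hasSymp(DataProperty):          # symptom labels, e.g. "nausea"
        domain = [Patient]
        range = [str]
    class hasHFevent(DataProperty):       # inferred heart failure status
        domain = [Patient]
        range = [str]

onto.save(file="heart_failure_kb.owl", format="rdfxml")

The last two steps of the list, instance creation and rule-driven reasoning, correspond to the data transformation component described above and to the rule base and reasoning mechanism of Sects. 3.1 and 3.2.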
Fig. 2 Ontological KB
Table 1 Rule base

Rule 1
SWRL rule: Patient(?p) ∧ hasPara(?p, ?par) ∧ Type(?par, ?sen) ∧ Sensor(?sen) ∧ SensorOutput(?sen, literal) ∧ hasStatus(?p, ?sign) ∧ hasSymp(?p, ?Symp) ∧ hasHFevent(?p, ?hf) ∧ hasDatetime(?hf, literal) → hasTreatment(?p, ?t)
Description: Produces the suggested treatment corresponding to treatment of future heart failure

Rule 2
SWRL rule: Patient(?p) ∧ hasPara(?p, SBP) ∧ swrlb:lessThan(SBP, 120) ∧ hasPara(?p, DBP) ∧ swrlb:greaterThan(DBP, 90) ∧ swrlb:lessThan(SBP, 60) ∧ hasPara(?p, FBS) ∧ swrlb:greaterThan(FBS, 6) ∧ hasAge(?p, ?a) ∧ swrlb:greaterThan(?a, 55) ∧ hasBMI(?p, "high") → hasHFevent(?p, "status")
Description: Shows the boundary values of some of the parameters for the event of heart failure to occur

Rule 3
SWRL rule: Patient(?p) ∧ hasSymp(?p, "ChestPain_high") ∧ hasSymp(?p, "shortness_of_breath") ∧ hasSymp(?p, "nausea") ∧ hasSymp(?p, "vomiting") ∧ hasSymp(?p, "high Heartrate") → hasHFevent(?p, "alert")
Description: Shows the alert status of heart failure in case of severe chest pain and shortness of breath and other symptoms

Rule 4
SWRL rule: Patient(?p) ∧ hasHFevent(?p, "alert") → hasSuggestions(?p, "String")
Description: Treatment for alert heart failure condition if there is time
3.1 Developing Rule Base
In this step, the Semantic Web Rule Language (SWRL) framework is used to design and create SWRL-based rules over the relevant relationships for the detection of heart failure from the clinical patient-related data values and the given symptoms. The developed rule base consists of a set of SWRL axioms that are applied to entail new information from the ontological knowledge system. All constructed rules contain atoms of the form (concept, property, instance). Each rule comprises two parts, an antecedent and a consequent, consisting of user-defined and built-in atoms, and is stored as a formalized representation in the domain ontology using the Web Ontology Language (OWL). Table 1 shows a few of the rules that make up the Rule Base of the proposed system.
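To make the rule machinery concrete, the sketch below shows how a threshold-style rule in the spirit of Rule 2 of Table 1 could be attached to an ontology and evaluated with the Pellet reasoner through owlready2. It is a simplified illustration under stated assumptions, not the authors' rule base: the properties hasSBP and hasAge, the class AtRiskPatient (standing in for the hasHFevent(?p, "status") consequent) and the thresholds are hypothetical.

from owlready2 import (get_ontology, Thing, DataProperty, FunctionalProperty,
                       Imp, sync_reasoner_pellet)

onto = get_ontology("http://example.org/heart_failure.owl")   # hypothetical IRI

with onto:
    class Patient(Thing): pass
    class AtRiskPatient(Patient): pass   # stand-in consequent for a detected heart failure status

    class hasSBP(DataProperty, FunctionalProperty):   # systolic blood pressure
        domain = [Patient]
        range = [int]
    class hasAge(DataProperty, FunctionalProperty):
        domain = [Patient]
        range = [int]

    # Antecedent atoms are comma-separated; built-ins are written without the swrlb: prefix
    rule = Imp()
    rule.set_as_rule(
        "Patient(?p), hasSBP(?p, ?sbp), lessThan(?sbp, 120), "
        "hasAge(?p, ?a), greaterThan(?a, 55) -> AtRiskPatient(?p)"
    )

# A patient individual matching the antecedent
p = onto.Patient("patient_001")
p.hasSBP = 110
p.hasAge = 63

# Pellet (bundled with owlready2, requires Java) entails AtRiskPatient for patient_001
sync_reasoner_pellet(infer_property_values=True)
print(p.is_a)

The rules of Table 1 follow the same structural pattern of antecedent atoms and a single consequent; only the atoms, literals and thresholds differ.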
3.2 Reasoning Mechanism
To represent and detect a patient's potential heart failure condition based on the clinical parameters, reasoning is carried out over the developed ontological KB. The description logic reasoner Pellet [16] is deployed for the inference of
results based upon the ontology instantiated with the observation data. The reasoning is carried out on data fetched from the patient records and represented as factual knowledge in the ontology. Activating the Pellet reasoner involves supplying the set of rules developed in the previous step as input. The overall complexity of transforming the patient's data into ontological format and of reasoning over the ontological form depends on the number of parameters involved in the form of triples and variables.
3. Query Translation: This component of the information system comprises three modules, designed to handle user queries in natural language and translate them into SPARQL queries that fetch information from the ontology-driven Knowledge Base. It is based on Natural Language Processing (NLP) techniques. The three modules comprising the query translation process, shown in Fig. 3, are: (1) User Query Processing, (2) Query Mapping and Validation, and (3) Query Builder. The modules are discussed in detail as follows:
3.1. User Query Processing: In this module, NLP techniques are used to process and simplify the query so that it can serve as input for the next module. Query processing includes tokenization followed by stop-word removal and stemming of the input query. First, the user query is divided into a set of tokens; then all stop words such as is, are, out, etc. are removed and the remaining tokens are kept for further processing. Second, stemming is performed, where the tokens are reduced to their root/stem to deal with redundancy. Third, modifiers and numbers are separated from the tokens. Figure 3 depicts the query processing and translation steps, and a minimal preprocessing sketch is given after the figure. The resultant set of tokens is supplied as input to the next module.
3.2. Query Mapping and Validation: In this module, the filtered set of tokens is aligned with the entities of the Ontological Knowledge Base. This is done as follows:
• In the first step, all the entities, properties and concepts are extracted from the ontology as keywords for the mapping task.
Fig. 3 Query processing and translation steps
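The preprocessing sketch below shows one way the User Query Processing module could be realized with NLTK. It is an illustrative assumption rather than the authors' implementation: the function name preprocess_query is hypothetical, NLTK's Porter stemmer and English stop-word list stand in for whichever tools the system actually uses, and the example query and its output are invented.

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)       # tokenizer model (newer NLTK releases may also need "punkt_tab")
nltk.download("stopwords", quiet=True)   # English stop-word list

def preprocess_query(query):
    """Tokenize the query, drop stop words and punctuation, and stem each remaining token."""
    tokens = word_tokenize(query.lower())
    stop_words = set(stopwords.words("english"))
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens if t.isalnum() and t not in stop_words]

print(preprocess_query("Which patients are showing shortness of breath?"))
# e.g. ['patient', 'show', 'short', 'breath']

The resulting tokens would then be matched against the keywords extracted from the ontology in the Query Mapping and Validation module.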
Table 2 SPARQL query template
S. No. | SPARQL template
1 | SELECT DISTINCT ?uri WHERE {