146 39 25MB
English Pages 804 [768] Year 2023
Lecture Notes in Networks and Systems 537
Aboul Ella Hassanien Oscar Castillo Sameer Anand Ajay Jaiswal Editors
International Conference on Innovative Computing and Communications Proceedings of ICICC 2023, Volume 3
Lecture Notes in Networks and Systems Volume 537
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Aboul Ella Hassanien · Oscar Castillo · Sameer Anand · Ajay Jaiswal Editors
International Conference on Innovative Computing and Communications Proceedings of ICICC 2023, Volume 3
Editors Aboul Ella Hassanien IT Department Faculty of Computers and Information Cairo University Giza, Egypt Sameer Anand Department of Computer Science Shaheed Sukhdev College of Business Studies University of Delhi New Delhi, Delhi, India
Oscar Castillo Institute of Technology Tijuana Tijuana, Mexico Ajay Jaiswal Department of Computer Science Shaheed Sukhdev College of Business Studies University of Delhi New Delhi, Delhi, India
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-99-3009-8 ISBN 978-981-99-3010-4 (eBook) https://doi.org/10.1007/978-981-99-3010-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Prof. (Dr.) Aboul Ella Hassanien would like to dedicate this book to his wife Nazaha Hassan. Dr. Sameer Anand would like to dedicate this book to his Dada Prof. D. C. Choudhary, his beloved wife Shivanee, and his son Shashwat. Dr. Ajay Jaiswal would like to dedicate this book to his father Late Prof. U. C. Jaiswal, his mother Brajesh Jaiswal, his beloved wife Anjali, his daughter Prachii, and his son Sakshaum.
ICICC-2023 Steering Committee Members
Patrons Dr. Poonam Verma, Principal, SSCBS, University of Delhi Prof. Dr. Pradip Kumar Jain, Director, National Institute of Technology Patna, India
General Chairs Dr. Prabhat Kumar, National Institute of Technology Patna, India Prof. Oscar Castillo, Tijuana Institute of Technology, Mexico
Honorary Chairs Prof. Dr. Janusz Kacprzyk, FIEEE, Polish Academy of Sciences, Poland Prof. Dr. Vaclav Snasel, Rector, VSB-Technical University of Ostrava, Czech Republic
Conference Chairs Prof. Dr. Aboul Ella Hassanien, Cairo University, Egypt Prof. Dr. Joel J. P. C. Rodrigues, National Institute of Telecommunications (Inatel), Brazil Prof. Dr. R. K. Agrawal, Jawaharlal Nehru University, Delhi
vii
viii
ICICC-2023 Steering Committee Members
Technical Program Chairs Prof. Dr. A. K. Singh, National Institute of Technology, Kurukshetra Prof. Dr. Anil K. Ahlawat, KIET Group of Institutes, Ghaziabad
Editorial Chairs Prof. Dr. Abhishek Swaroop, Bhagwan Parshuram Institute of Technology, Delhi Prof. Dr. Arun Sharma, Indira Gandhi Delhi Technical University for Women, Delhi
Conveners Dr. Ajay Jaiswal, SSCBS, University of Delhi Dr. Sameer Anand, SSCBS, University of Delhi Dr. Deepak Gupta, Maharaja Agrasen Institute of Technology (GGSIPU), New Delhi
Organizing Secretaries Dr. Ashish Khanna, Maharaja Agrasen Institute of Technology (GGSIPU), New Delhi Dr. Gulshan Shrivastava, National Institute of Technology Patna, India
Publication Chair Dr. Vicente García Díaz, University of Oviedo, Spain
Co-convener Mr. Moolchand Sharma, Maharaja Agrasen Institute of Technology, India
ICICC-2023 Steering Committee Members
Organizing Chairs Dr. Kumar Bijoy, SSCBS, University of Delhi Dr. Rishi Ranjan Sahay, SSCBS, University of Delhi Dr. Amrina Kausar, SSCBS, University of Delhi Dr. Abhishek Tandon, SSCBS, University of Delhi
Organizing Team Dr. Gurjeet Kaur, SSCBS, University of Delhi Dr. Abhimanyu Verma, SSCBS, University of Delhi Dr. Onkar Singh, SSCBS, University of Delhi Dr. Kalpna Sagar, KIET Group of Institutes, Ghaziabad Dr. Suresh Chavhan, Vellore Institute of Technology, Vellore, India Dr. Mona Verma, SSCBS, University of Delhi
ix
Preface
We hereby are delighted to announce that Shaheed Sukhdev College of Business Studies, New Delhi, in association with National Institute of Technology Patna, and University of Valladolid Spain has hosted the eagerly awaited and much coveted International Conference on Innovative Computing and Communication (ICICC-2023) in Hybrid Mode. The sixth version of the conference was able to attract a diverse range of engineering practitioners, academicians, scholars, and industry delegates, with the reception of abstracts including more than 3400 authors from different parts of the world. The committee of professionals dedicated toward the conference is striving to achieve a high-quality technical program with tracks on innovative computing, innovative communication network and security, and Internet of Things. All the tracks chosen in the conference are interrelated and are very famous among present-day research community. Therefore, a lot of research is happening in the above-mentioned tracks and their related sub-areas. As the name of the conference starts with the word “innovation”, it has targeted out of box ideas, methodologies, applications, expositions, surveys, and presentations helping to upgrade the current status of research. More than 850 full-length papers have been received, among which the contributions are focused on theoretical, computer simulation-based research, and laboratory-scale experiments. Among these manuscripts, 200 papers have been included in the Springer proceedings after a thorough two-stage review and editing process. All the manuscripts submitted to the ICICC-2023 were peer-reviewed by at least two independent reviewers, who were provided with a detailed review proforma. The comments from the reviewers were communicated to the authors, who incorporated the suggestions in their revised manuscripts. The recommendations from two reviewers were taken into consideration while selecting a manuscript for inclusion in the proceedings. The exhaustiveness of the review process is evident, given the large number of articles received addressing a wide range of research areas. The stringent review process ensured that each published manuscript met the rigorous academic and scientific standards. It is an exalting experience to finally see these elite contributions materialize into three book volumes as ICICC-2023 proceedings by Springer entitled “International Conference on Innovative Computing and Communications”. The articles are organized into three volumes in some broad categories covering subject xi
xii
Preface
matters on machine learning, data mining, big data, networks, soft computing, and cloud computing, although given the diverse areas of research reported it might not have been always possible. ICICC-2023 invited three keynote speakers, who are eminent researchers in the field of computer science and engineering, from different parts of the world. In addition to the plenary sessions on each day of the conference, ten concurrent technical sessions are held every day to assure the oral presentation of around 200 accepted papers. Keynote speakers and session chair(s) for each of the concurrent sessions have been leading researchers from the thematic area of the session. A technical exhibition is held during all the 2 days of the conference, which has put on display the latest technologies, expositions, ideas, and presentations. The research part of the conference was organized in a total of 26 special sessions. These special sessions and international workshops provided the opportunity for researchers conducting research in specific areas to present their results in a more focused environment. An international conference of such magnitude and release of the ICICC-2023 proceedings by Springer has been the remarkable outcome of the untiring efforts of the entire organizing team. The success of an event undoubtedly involves the painstaking efforts of several contributors at different stages, dictated by their devotion and sincerity. Fortunately, since the beginning of its journey, ICICC-2023 has received support and contributions from every corner. We thank them all who have wished the best for ICICC-2023 and contributed by any means toward its success. The edited proceedings volumes by Springer would not have been possible without the perseverance of all the steering, advisory, and technical program committee members. All the contributing authors owe thanks from the organizers of ICICC-2023 for their interest and exceptional articles. We would also like to thank the authors of the papers for adhering to the time schedule and for incorporating the review comments. We wish to extend my heartfelt acknowledgment to the authors, peer-reviewers, committee members, and production staff whose diligent work put shape to the ICICC-2023 proceedings. We especially want to thank our dedicated team of peerreviewers who volunteered for the arduous and tedious step of quality checking and critique on the submitted manuscripts. We wish to thank my faculty colleagues Mr. Moolchand Sharma for extending their enormous assistance during the conference. The time spent by them and the midnight oil burnt is greatly appreciated, for which we will ever remain indebted. The management, faculties, administrative, and support staff of the college have always been extending their services whenever needed, for which we remain thankful to them. Lastly, we would like to thank Springer for accepting our proposal for publishing the ICICC-2023 conference proceedings. Help received from Mr. Aninda Bose, the acquisition senior editor, in the process has been very useful. New Delhi, India
Ajay Jaiswal Sameer Anand Conveners, ICICC-2023
Contents
Joint Identification and Clustering Using Deep Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dimple Sethi, Chandra Prakash, and Sourabh Bharti Comparative Analysis of Deep Learning with Different Optimization Techniques for Type 2 Diabetes Mellitus Detection Using Gene Expression Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Karuna Middha and Apeksha Mittal
1
13
Differential Analysis of MOOC Models for Increasing Retention and Evaluation of the Performance of Proposed Model . . . . . . . . . . . . . . . . Harsh Vardhan Pant and Manoj Chandra Lohani
29
Deep Convolutional Neural Networks Network with Transfer Learning for Image-Based Malware Analysis . . . . . . . . . . . . . . . . . . . . . . . . . V. S. Jeyalakshmi, N. Krishnan, and J. Jayapriya
39
Analysis of Network Failure Detection Using Machine Learning in 5G Core Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anjali Rajak and Rakesh Tripathi
53
MVR Delay: Establishing Self-organizing Virtual Backhaul for Trusty, Reliable, and Timely Emergency Message Dissemination in VANET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mahabaleshwar Kabbur and M. Vinayaka Murthy Machine Learning Algorithms for Prediction of Mobile Phone Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jinsi Jose, Vinesh Raj, Sweana Vakkayil Seaban, and Deepa V. Jose Customized CNN for Traffic Sign Recognition Using Keras Pre-Trained Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vaibhav Malpani, Sanyam Shukla, Manasi Gyanchandani, and Saurabh Shrivastava
63
81
91
xiii
xiv
Contents
Underwater Image Enhancement and Restoration Using Cycle GAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chereddy Spandana, Ippatapu Venkata Srisurya, A. R. Priyadharshini, S. Krithika, S. Aasha Nandhini, R. Prasanna Kumar, and G. Bharathi Mohan
99
Implementation of Machine Learning Techniques in Breast Cancer Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 Mitanshi Rastogi, Meenu Vijarania, and Neha Goel Performance and Analysis of Propagation Delay in the Bitcoin Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Shahanawaj Ahamad, Suryansh Bhaskar Talukdar, Rohit Anand, Veera Talukdar, Sanjiv Kumar Jain, and Arpit Namdev Machine Learning Analysis on Predicting Credit Card Forgery . . . . . . . . 137 S. Janani, M. Sivarathinabala, Rohit Anand, Shahanawaj Ahamad, M. Ahmer Usmani, and S. Mahabub Basha GanCOV: A Deep Learning and GAN-Based Algorithm to Detect COVID-19 Through Lung X-Ray Scans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Apratim Shrivastav, Lakshmi Sai Srikar Vadlamani, and Rajni Jindal Text-to-Image Synthesis using BERT Embeddings and Multi-Stage GAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 Poonam Rani, Devender Kumar, Nupur Sudhakar, Deepak Prakash, and Shubham Application of Convoluted Brainwaves for Efficient Identification of Eating Disorder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 Shipra Swati and Mukesh Kumar A New Adaptive Digital Signal Processing Algorithm . . . . . . . . . . . . . . . . . 177 Shiv Ram Meena and Chandra Shekhar Rai Building Domain-Specific Sentiment Lexicon Using Random Walk-Based Model on Common-Sense Semantic Network . . . . . . . . . . . . . 193 Minni Jain, Rajni Jindal, and Amita Jain An Optimized Path Selection Algorithm for the Minimum Number of Turns in Path Planning Using a Modified A-Star Algorithm . . . . . . . . . 205 Narayan Kumar and Amit Kumar Predicting Corresponding Ratings from Goodreads Book Reviews . . . . . 215 Abhigya Verma, Nandini Baliyan, Pooja Gera, and Shweta Singhal A Precise Smart Parking Model with Applied Wireless Sensor Network for Urban Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 Ishu Kumar, Sejal Sahu, Rebanta Chakraborty, Sushruta Mishra, and Vikas Chaudhary
Contents
xv
An Ensemble Learning Approach for Detection of COVID-19 Using Chest X-Ray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237 Aritra Nandi, Shivam Yadav, Asmita Hobisyashi, Arghyadeep Ghosh, Sushruta Mishra, and Vikas Chaudhary ECG-Based Cardiac Abnormalities Analysis Using Adaptive Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245 Prapti Patra, Vishisht Ved, Sourav Chakraborty, Sushruta Mishra, and Vikas Chaudhary A Novel Dataframe Creation and 1D CNN Model for Subject-Independent Emotion Classification from Raw EEG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 Pooja Manral and K. R. Seeja Generic Recommendation System for Business Process Modeling . . . . . . 267 J L Shreya, Anu Saini, Sunita Kumari, and Astha Jain Parkinson Risks Determination Using SVM Coupled Stacking . . . . . . . . . 283 Supratik Dutta, Sibasish Choudhury, Adrita Chakraborty, Sushruta Mishra, and Vikas Chaudhary Customer Feedback Analysis for Smartphone Reviews Using Machine Learning Techniques from Manufacturer’s Perspective . . . . . . . 293 Anuj Agrawal, Siddharth Dubey, Prasanjeet Singh, Sahil Verma, and Prabhat Kumar Fitness Prediction in High-Endurance Athletes and Sports Players Using Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303 Shashwath Suvarna, C. Sindhu, Sreekant Nair, and Aditya Naidu Kolluru Grade It: A Quantitative Essay Grading System . . . . . . . . . . . . . . . . . . . . . . 317 Roopchand Reddy Vanga, M. S. Bharath, C. Sindhu, G. Vadivu, and Hsiu Chun Hsu Spam Detection Using Naïve Bayes and Trigger-Based Filter . . . . . . . . . . 329 Deepali Virmani, Sonakshi Vij, Abhishek Dwivedi, Ayush Chaurasia, and Vidhi Karnwal Security Enhancer Novel Framework for Network Applications . . . . . . . . 341 Vishal Kumar Image Tagging Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357 Rajeswara Rao Duvvada, Vijaya Kumari Majji, Sai Pavithra Nandyala, and Bhavana Vennam Data Driven Scheme for MEMS Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371 Satyavir Singh
xvi
Contents
Detection and Mitigation of ARP Spoofing Attack . . . . . . . . . . . . . . . . . . . . 383 Swati Jadhav, Arjun Thakur, Shravani Nalbalwar, Shubham Shah, and Sankalp Chordia Stochastic Differential Equation-Based Testing Coverage SRGM by Using ANN Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 Ritu Bibyan, Sameer Anand, Anu G. Aggarwal, and Abhishek Tandon Integrated Quantum Health Care with Predictive Intelligence Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 Tridiv Swain, Sushruta Mishra, Deepak Gupta, and Ahmed Alkhayyat A Smart Data-Driven Prototype for Depression and Stress Tracking in Patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423 Pragya Pranjal, Saahil Mallick, Malvika Madan, Sushruta Mishra, Ahmed Alkhayyat, and Smaraki Bhaktisudha Applied Computational Intelligence for Breast Cancer Detection . . . . . . . 435 Bhavya Dua, Kaushiki Kriti, Sushruta Mishra, Chitra Shashidhar, Marcello Carvalho dos Reis, and Victor Hugo C. de Albuquerque Crop Yield Forecasting with Precise Machine Learning . . . . . . . . . . . . . . . 445 Swayam Verma, Shashwat Sinha, Pratima Chaudhury, Sushruta Mishra, and Ahmed Alkhayyat Recommendation Mechanism to Forge Connections Between Users with Similar Interests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455 Indrakant Dana, Udit Agarwal, Akshat Ajay, Saurabh Rastogi, and Ahmed Alkhayyat Identification of Device Type Using Transformers in Heterogeneous Internet of Things Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471 Himanshu Sharma, Prabhat Kumar, and Kavita Sharma A Novel Approach for Effective Classification of Brain Tumors Using Hybrid Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483 Ananapareddy V. N. Reddy, A. Kavya, B. Rohith, B. Narasimha Rao, and L. Harshada An Application of Multilayer Perceptron for the Prediction of River Water Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499 Rozaida Ghazali, Norfar Ain Mohd Fuzi, Salama A. Mostafa, Umar Farooq Khattak, and Rabei Raad Ali ELM-MFO: A New Nature-Inspired Predictive Model for Financial Contagion Modeling of Indian Currency Market . . . . . . . . . 511 Swaty Dash, Pradip Kumar Sahu, and Debahuti Mishra
Contents
xvii
Plant Disease Detection Using Fine-Tuned ResNet Architecture . . . . . . . . 527 Jalluri Geetha Renuka, Goberu Likhitha, Vamsi Krishna Modala, and Duggi Manikanta Reddy Data Mining Approach in Predicting House Price for Automated Property Appraiser Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543 Naeem Th. Yousir, Shaymaa Mohammed Abdulameer, and Salama A. Mostafa IPSO-SMOTE-AdaBoost: An Optimized Class Imbalance Strategy Using Boosting and PSO Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555 Sarvani Anandarao, Polani Veenadhari, Gudivada Sai Priya, and Ginjupalli Raviteja HMLF_CDD_SSBM: A Hybrid Machine Learning Framework for Cardiovascular Disease Diagnosis Prediction Using the SMOTE Stacking Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571 Satuluri Naganjaneyulu, Gurija Akanksha, Shaik Shaheeda, and Mohammed Sadhak Optimal and Virtual Multiplexer Resource Provisioning in Multiple Cloud Service Provider System . . . . . . . . . . . . . . . . . . . . . . . . . . 587 Phaneendra Kanakamedala, M. Babu Reddy, G. Dinesh Kumar, M. Srinivasa Sesha Sai, and P. Ashok Reddy AI with Deep Learning Model-Based Network Flow Anomaly Cyberattack Detection and Classification Model . . . . . . . . . . . . . . . . . . . . . . 599 Sara A. Althubiti Golden Jackal Optimization with Deep Learning-Based Anomaly Detection in Pedestrian Walkways for Road Traffic Safety . . . . . . . . . . . . . 617 Saleh Al Sulaie Explainable Artificial Intelligence-Enabled Android Malware Detection Model for Cybersecurity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637 Laila Almutairi Observing Different Machine Learning Approaches for Students’ Performance Using Demographic Features . . . . . . . . . . . . . . . . . . . . . . . . . . . 657 Neeraj Kumar Srivastava, Prafull Pandey, Manoj Kumar Mishra, and Vikas Mishra A Workflow Allocation Strategy Using Elitist Teaching–Learning-Based Optimization Algorithm in Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667 Mohammad Imran, Faraz Hasan, Faisal Ahmad, and Mohammad Shahid Churn Prediction Algorithm Optimized and Ameliorated . . . . . . . . . . . . . 677 Vani Nijhawan, Mamta Madan, and Meenu Dave
xviii
Contents
Employee Turnover Prediction Using Machine Learning . . . . . . . . . . . . . . 693 Mukesh Dhetarwal, Azhar Ashraf, Sahil Verma, Kavita, and Babita Rawat Smart Card Security Model Based on Sensitive Information . . . . . . . . . . . 703 Reem M. Abdullah and Sundos A. Hameed Alazawi Brain Tumor Classification from MRI Scans . . . . . . . . . . . . . . . . . . . . . . . . . 713 Aman Bahuguna, Azhar Ashraf, Kavita, Sahil Verma, and Poonam Negi Recognition of Handwritten Digits Using Convolutional Neural Network in Python and Comparison of Performance for Various Hidden Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727 Himansh Gupta, Amanpreet Kaur, Kavita, Sahil Verma, and Poonam Rawat Medical Image Watermarking Using Slantlet Transform and Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741 Eko Hari Rachmawanto, Lahib Nidhal Dawd, Christy Atika Sari, Rabei Raad Ali, Wisam Subhi Al-Dayyeni, and Mohammed Ahmed Jubair Voice Email for the Visually Disabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753 Randeep Thind, K. Divya, Sahil Verma, Kavita, Navneet Kaur, and Vaibhav Uniyal Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 771
Editors and Contributors
About the Editors Prof. Aboul Ella Hassanien is the Founder and Head of the Egyptian Scientific Research Group (SRGE) and a Professor of Information Technology at the Faculty of Computer and Artificial Intelligence, Cairo University. Professor Hassanien is an ex-dean of the faculty of computers and information, Beni Suef University. Professor Hassanien has more than 800 scientific research papers published in prestigious international journals and over 40 books covering such diverse topics as data mining, medical images, intelligent systems, social networks, and smart environment. Prof. Hassanien won several awards, including the Best Researcher of the Youth Award of Astronomy and Geophysics of the National Research Institute, Academy of Scientific Research (Egypt, 1990). He was also granted a scientific excellence award in humanities from the University of Kuwait for the 2004 Award and received the scientific—University Award (Cairo University, 2013). Also, He was honored in Egypt as the best researcher at Cairo University in 2013. He was also received the Islamic Educational, Scientific and Cultural Organization (ISESCO) prize on Technology (2014) and received the State Award for excellence in engineering sciences 2015. He was awarded the medal of Sciences and Arts of the first class by the President of the Arab Republic of Egypt, 2017. Oscar Castillo holds the Doctor in Science degree (Doctor Habilitatus) in Computer Science from the Polish Academy of Sciences (with the Dissertation “Soft Computing and Fractal Theory for Intelligent Manufacturing”). He is a Professor of Computer Science in the Graduate Division, Tijuana Institute of Technology, Tijuana, Mexico. In addition, he is serving as Research Director of Computer Science and head of the research group on Hybrid Fuzzy Intelligent Systems. Currently, he is President of HAFSA (Hispanic American Fuzzy Systems Association) and Past President of IFSA (International Fuzzy Systems Association). Prof. Castillo is also Chair of the Mexican Chapter of the Computational Intelligence Society (IEEE). He also belongs to the
xix
xx
Editors and Contributors
Technical Committee on Fuzzy Systems of IEEE and to the Task Force on “Extensions to Type-1 Fuzzy Systems”. He is also a member of NAFIPS, IFSA and IEEE. He belongs to the Mexican Research System (SNI Level 3). His research interests are in Type-2 Fuzzy Logic, Fuzzy Control, Neuro-Fuzzy and Genetic-Fuzzy hybrid approaches. He has published over 300 journal papers, 10 authored books, 50 edited books, 300 papers in conference proceedings, and more than 300 chapters in edited books, in total more than 998 publications (according to Scopus) with h index of 80 according to Google Scholar. He has been Guest Editor of several successful Special Issues in the past, like in the following journals: Applied Soft Computing, Intelligent Systems, Information Sciences, Soft Computing, Non-Linear Studies, Fuzzy Sets and Systems, JAMRIS and Engineering Letters. He is currently Associate Editor of the Information Sciences Journal, Journal of Engineering Applications on Artificial Intelligence, International Journal of Fuzzy Systems, Journal of Complex and Intelligent Systems, Granular Computing Journal and Intelligent Systems Journal (Wiley). He was Associate Editor of Journal of Applied Soft Computing and IEEE Transactions on Fuzzy Systems. He has been elected IFSA Fellow in 2015 and MICAI Fellow in 2016. Finally, he recently received the Recognition as Highly Cited Researcher in 2017 and 2018 by Clarivate Analytics and Web of Science. Dr. Sameer Anand is currently working as an Assistant professor in the Department of Computer science at Shaheed Sukhdev College of Business Studies, University of Delhi, Delhi. He has received his M.Sc., M.Phil., and Ph.D. (Software Reliability) from Department of Operational Research, University of Delhi. He is a recipient of ‘Best Teacher Award’ (2012) instituted by Directorate of Higher Education, Government of NCT, Delhi. The research interest of Dr. Anand includes Operational Research, Software Reliability and Machine Learning. He has completed an Innovation project from the University of Delhi. He has worked in different capacities in International Conferences. Dr. Anand has published several papers in the reputed journals like IEEE Transactions on Reliability, International journal of Production Research (Taylor & Francis), International Journal of Performability Engineering etc. He is a member of Society for Reliability Engineering, Quality and Operations Management. Dr. Sameer Anand has more than 16 years of teaching experience. Dr. Ajay Jaiswal is currently serving as an Assistant Professor in the Department of Computer Science of Shaheed Sukhdev College of Business Studies, University of Delhi, Delhi. He is co-editor of two books/Journals and co-author of dozens of research publications in International Journals and conference proceedings. His research interest includes pattern recognition, image processing, and machine learning. He has completed an interdisciplinary project titled “Financial InclusionIssues and Challenges: An Empirical Study” as Co-PI. This project was awarded by the University of Delhi. He obtained his masters from the University of Roorkee (now IIT Roorkee) and Ph.D. from Jawaharlal Nehru University, Delhi. He is a recipient of the best teacher award from the Government of NCT of Delhi. He has more than nineteen years of teaching experience.
Editors and Contributors
xxi
Contributors Shaymaa Mohammed Abdulameer College of Information Engineering, AlNahrain University, Baghdad, Iraq Reem M. Abdullah Computer Science Department, Al-Mustansiriyah University, Baghdad, Iraq Udit Agarwal Maharaja Agrasen Institute of Technology, New Delhi, India Anu G. Aggarwal Department of Operational Research, University of Delhi, Delhi, India Anuj Agrawal Computer Science and Engineering Department, National Institute of Technology Patna, Patna, India Shahanawaj Ahamad College of Computer Science and Engineering, University of Hail, Hail City, Saudi Arabia Faisal Ahmad Workday Inc., Pleasanton, USA Akshat Ajay Maharaja Agrasen Institute of Technology, New Delhi, India Gurija Akanksha Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Saleh Al Sulaie Department of Industrial Engineering, College of Engineering in Al-Qunfudah, Umm Al-Qura University, Makkah, Saudi Arabia Wisam Subhi Al-Dayyeni Department of Computer Techniques Engineering, Dijlah University College, Baghdad, Iraq Rabei Raad Ali School of Information Technology, UNITAR International University, Petaling Jaya, Malaysia; Department of Computer Engineering Technology, Northern Technical University, Mosul, Iraq; National University of Science and Technology, Thi-Qar, Iraq Ahmed Alkhayyat College of Technical Engineering, The Islamic University, An Najaf, Iraq Laila Almutairi Department of Computer Engineering, College of Computer and Information Sciences, Majmaah University, Al-Majmaah, Saudi Arabia Sara A. Althubiti Department of Computer Science, College of Computer and Information Sciences, Majmaah University, Al-Majmaah, Saudi Arabia Rohit Anand Department of ECE, G. B. Pant DSEU Okhla-I Campus (Formerly G. B. Pant Engineering College), New Delhi, India Sameer Anand Department of Operational Research, University of Delhi, Delhi, India
xxii
Editors and Contributors
Sarvani Anandarao Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Azhar Ashraf Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, Punjab, India Aman Bahuguna Department of Computer Science and Engineering, Chandigarh University Mohali, Punjab, India Nandini Baliyan Indira Gandhi Delhi Technical University, New Delhi, India S. Mahabub Basha Department of Commerce, IIBS Bangalore Airport Campus, Bengaluru, Karnataka, India Smaraki Bhaktisudha Kalinga Institute of Industrial Technology University, Bhubaneswar, India M. S. Bharath Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, India Sourabh Bharti Nimbus Research Centre, Munster Technological University, Cork, Ireland Ritu Bibyan Department of Operational Research, University of Delhi, Delhi, India Adrita Chakraborty Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India Rebanta Chakraborty Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India Sourav Chakraborty Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India Manoj Chandra Lohani School of Computing, Graphic Era Hill University, Bhimtal Campus, Uttarakhand, India Vikas Chaudhary AI and DS Department, GNIOT, Greater Noida, India Pratima Chaudhury Kalinga Institute Of Industrial Technology, Deemed to be University, Bhubaneswar, India Ayush Chaurasia Vivekananda Institute of Professional Studies, Pitampura, New Delhi, India Sankalp Chordia Department of Computer Engineering, Vishwakarma Institute of Technology Pune, Pune, India Sibasish Choudhury Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India Indrakant Dana Maharaja Agrasen Institute of Technology, New Delhi, India
Editors and Contributors
xxiii
Swaty Dash Department of Information Technology, Veer Surendra Sai University of Technology, Burla, Sambalpur, Odisha, India Meenu Dave JaganNath University, Jaipur, India Lahib Nidhal Dawd Department of Computer Techniques Engineering, Dijlah University College, Baghdad, Iraq Victor Hugo C. de Albuquerque Department of Teleinformatics Engineering, Federal University of Ceará, Fortaleza, Brazil Mukesh Dhetarwal Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, India K. Divya Department of Computer Science and Engineering, Chandigarh University, Gharuan, India Marcello Carvalho dos Reis Federal Institute of Education, Science and Technology of Ceará, Fortaleza, Ceará, Brazil Bhavya Dua Kalinga Institute of Industrial Technology University, Bhubaneswar, India Siddharth Dubey Computer Science and Engineering Department, National Institute of Technology Patna, Patna, India Supratik Dutta Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India Rajeswara Rao Duvvada Department of Computer Science and Engineering, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India Abhishek Dwivedi Vivekananda Institute of Professional Studies, Pitampura, New Delhi, India Norfar Ain Mohd Fuzi Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Batu Pahat, Johor, Malaysia Pooja Gera Indira Gandhi Delhi Technical University, New Delhi, India Rozaida Ghazali Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Batu Pahat, Johor, Malaysia Arghyadeep Ghosh Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneshwar, India Neha Goel VIPS, Delhi, India Deepak Gupta Maharaja Agrasen Institute of Technology, Delhi, India Himansh Gupta Department of CSE, Chandigarh University, Gharuan, India Manasi Gyanchandani MANIT, Bhopal, India
xxiv
Editors and Contributors
Sundos A. Hameed Alazawi Computer Science Department, Al-Mustansiriyah University, Baghdad, Iraq L. Harshada Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Faraz Hasan Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Guntur, India Asmita Hobisyashi Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneshwar, India Hsiu Chun Hsu Department of Information Management, National Chung Cheng University, Minxiong, Taiwan Mohammad Imran Department of Computer Science, Aligarh Muslim University, Aligarh, India Swati Jadhav Department of Computer Engineering, Vishwakarma Institute of Technology Pune, Pune, India Amita Jain Computer Science and Engineering, Netaji Subhas University of Technology, Delhi, India Astha Jain Tata Consultancy Services Limited, Noida, Uttar Pradesh, India Minni Jain Computer Science and Engineering, Delhi Technological University, Delhi, India Sanjiv Kumar Jain Department of EE, Medi-Caps University, Indore, MP, India S. Janani Department of ECE, Periyar Maniammai Institute of Science and Technology, Thanjavur, Tamil Nadu, India J. Jayapriya Department of Computer Science (YPR Campus), Christ University, Bangalore, India V. S. Jeyalakshmi Centre for Information Technology and Engineering, Manonmaniam Sundaranar University, Tirunelveli, India Rajni Jindal Computer Science and Engineering, Delhi Technological University, Delhi, India Deepa V. Jose Department of Computer Science, Christ University, Bangalore, India Jinsi Jose Department of Computer Science, Rajagiri College of Social Sciences, Kalamassery, India Mohammed Ahmed Jubair Department of Computer Technical Engineering, College of Information Technology, Imam Ja’afar Al-Sadiq University, AlMuthanna, Iraq
Editors and Contributors
xxv
Mahabaleshwar Kabbur School of Computer Science and Applications, REVA University, Bengaluru, India Phaneendra Kanakamedala Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Vidhi Karnwal Vivekananda Institute of Professional Studies, Pitampura, New Delhi, India Amanpreet Kaur Department of CSE, Chandigarh University, Gharuan, India Navneet Kaur Department of Computer Science and Engineering, Chandigarh University, Gharuan, India Kavita Uttaranchal University, Dehradun, India A. Kavya Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Umar Farooq Khattak School of Information Technology, UNITAR International University, Petaling Jaya, Malaysia Aditya Naidu Kolluru Department of Computing Technologies, SRM Institute of Science and Technology, Chennai, India N. Krishnan Centre for Information Technology and Engineering, Manonmaniam Sundaranar University, Tirunelveli, India S. Krithika Department of Computer Science and Engineering, Amrita School of Computing, Chennai, India Kaushiki Kriti Kalinga Bhubaneswar, India
Institute
of
Industrial
Technology
University,
Amit Kumar Department of Mechanical Engineering, National Institute of Technology, Patna, India Devender Kumar Department of Information Technology, Netaji Subhas University of Technology, New Delhi, India G. Dinesh Kumar Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India Ishu Kumar Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India Mukesh Kumar Department of CSE, National Institute of Technology Patna, Patna, India Narayan Kumar Department of Mechanical Engineering, Muzaffarpur Institute of Technology, Muzaffarpur, India Prabhat Kumar Computer Science and Engineering Department, National Institute of Technology Patna, Patna, India
xxvi
Editors and Contributors
R. Prasanna Kumar Department of Computer Science and Engineering, Amrita School of Computing, Chennai, India Vishal Kumar Department of CSE, Chandigarh University (Mohali), Chandigarh, India Sunita Kumari Department of Computer Science and Engineering, G. B. Pant DSEU Okhla-1 Campus, DSEU, New Delhi, India Goberu Likhitha Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Malvika Madan Kalinga Bhubaneswar, India
Institute
of
Industrial
Technology
University,
Mamta Madan VIPS, Delhi, India Vijaya Kumari Majji Department of Computer Science and Engineering, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India Saahil Mallick Kalinga Bhubaneswar, India
Institute
of
Industrial
Technology
University,
Vaibhav Malpani MANIT, Bhopal, India Pooja Manral Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India Shiv Ram Meena University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, New Delhi, India Karuna Middha Computer Science, School of Engineering and Science, GD Goenka University, Gurugram, Haryana, India Debahuti Mishra Department of Computer Science & Engineering, Siksha ‘O’ Anusandhan (Deemed to Be) University, Bhubaneswar, Odisha, India Manoj Kumar Mishra Department of Computer Science and Engineering, United College of Engineering and Research, Naini, Prayagraj, India Sushruta Mishra Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India Vikas Mishra Department of Computer Applications, United Institute of Management, Naini, Prayagraj, India Apeksha Mittal Computer Science, School of Engineering and Science, GD Goenka University, Gurugram, Haryana, India Vamsi Krishna Modala Lakireddy Mylavaram, Andhra Pradesh, India
Bali
Reddy
College
of
Engineering,
G. Bharathi Mohan Department of Computer Science and Engineering, Amrita School of Computing, Chennai, India
Editors and Contributors
xxvii
Salama A. Mostafa Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Batu Pahat, Johor, Malaysia Satuluri Naganjaneyulu Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Sreekant Nair Department of Computing Technologies, SRM Institute of Science and Technology, Chennai, India Shravani Nalbalwar Department of Computer Engineering, Vishwakarma Institute of Technology Pune, Pune, India Arpit Namdev Department of IT, University Institute of Technology RGPV, Bhopal, MP, India S. Aasha Nandhini Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattangulathur, India Aritra Nandi Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneshwar, India Sai Pavithra Nandyala Department of Computer Science and Engineering, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India Poonam Negi Uttaranchal University, Dehradun, India Vani Nijhawan VIPS, Delhi, India Prafull Pandey Department of Computer Science and Engineering, United Institute of Technology, Naini, Prayagraj, India Prapti Patra Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India Chandra Prakash Department of Computer Science and Engineering, National Institute of Technology, Delhi, Delhi, India Deepak Prakash Department of Computer Engineering, Netaji Subhas University of Technology, New Delhi, India Pragya Pranjal Kalinga Bhubaneswar, India
Institute
of
Industrial
Technology
University,
Gudivada Sai Priya Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India A. R. Priyadharshini Department of Computer Science and Engineering, Amrita School of Computing, Chennai, India Eko Hari Rachmawanto Informatics Engineering, Computer Science Faculty, Dian Nuswantoro University, Semarang, Indonesia Chandra Shekhar Rai University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, New Delhi, India
xxviii
Editors and Contributors
Vinesh Raj Department of Computer Science, Rajagiri College of Social Sciences, Kalamassery, India Anjali Rajak Department of Information Technology, National Institute of Technology Raipur, Raipur, India Poonam Rani Department of Computer Engineering, Netaji Subhas University of Technology, New Delhi, India B. Narasimha Rao Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Mitanshi Rastogi CSE, K.R. Mangalam University, Gurugram, Haryana, India Saurabh Rastogi Maharaja Agrasen Institute of Technology, New Delhi, India Ginjupalli Raviteja Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Babita Rawat Uttaranchal University, Dehradun, India Poonam Rawat Uttaranchal University, Dehradun, India Ananapareddy V. N. Reddy Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Duggi Manikanta Reddy Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India M. Babu Reddy Department of Computer Science Engineering, Krishna University, Machilipatnam, Krishna, Andhra Pradesh, India P. Ashok Reddy Department of Computer Science and Engineering, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Jalluri Geetha Renuka Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India B. Rohith Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Mohammed Sadhak Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Pradip Kumar Sahu Department of Information Technology, Veer Surendra Sai University of Technology, Burla, Sambalpur, Odisha, India Sejal Sahu Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India M. Srinivasa Sesha Sai Department of Information Technology, KKR and KSR Institute of Technology and Sciences, Vinjanampadu, Guntur, Andhra Pradesh, India
Editors and Contributors
xxix
Anu Saini Department of Computer Science and Engineering, G. B. Pant DSEU Okhla-1 Campus, DSEU, New Delhi, India Christy Atika Sari Informatics Engineering, Computer Science Faculty, Dian Nuswantoro University, Semarang, Indonesia Sweana Vakkayil Seaban Department of Computer Science, Rajagiri College of Social Sciences, Kalamassery, India K. R. Seeja Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India Dimple Sethi Department of Information Technology, Indira Gandhi Delhi Technical University for Women, New Delhi, Delhi, India Shubham Shah Department of Computer Engineering, Vishwakarma Institute of Technology Pune, Pune, India Shaik Shaheeda Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Mohammad Shahid Department of Commerce, Aligarh Muslim University, Aligarh, India Himanshu Sharma Computer Science and Engineering Department, National Institute of Technology Patna, Patna, India Kavita Sharma Computer Science and Engineering Department, Galgotias College of Engineering & Technology, Greater Noida, India Chitra Shashidhar Department of Commerce and Management, Seshadripuram College, Bengaluru, India; Federal Institute of Ceará, Fortaleza, Brazil J L Shreya Department of Computer Science and Engineering, Dr. SPM International Institute of Information Technology, Naya Raipur, India Apratim Shrivastav Delhi Technological University, Delhi, India Saurabh Shrivastava MANIT, Bhopal, India Shubham Department of Information Technology, Netaji Subhas University of Technology, New Delhi, India Sanyam Shukla MANIT, Bhopal, India C. Sindhu Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India Prasanjeet Singh Computer Science and Engineering Department, National Institute of Technology Patna, Patna, India Satyavir Singh Department of Electrical and Electronics Engineering, SRM University, AP, Andhra Pradesh, India
xxx
Editors and Contributors
Shweta Singhal Indira Gandhi Delhi Technical University, New Delhi, India Shashwat Sinha Kalinga Institute Of Industrial Technology, Deemed to be University, Bhubaneswar, India M. Sivarathinabala Department of ECE, Velammal Institute of Technology, Chennai, Tamil Nadu, India Chereddy Spandana Department of Computer Science and Engineering, Amrita School of Computing, Chennai, India Ippatapu Venkata Srisurya Department of Computer Science and Engineering, Amrita School of Computing, Chennai, India Neeraj Kumar Srivastava Department of Computer Science and Engineering, United Institute of Technology, Naini, Prayagraj, India Nupur Sudhakar Department of Computer Engineering, Netaji Subhas University of Technology, New Delhi, India Shashwath Suvarna Department of Computing Technologies, SRM Institute of Science and Technology, Chennai, India Tridiv Swain Kalinga Institute of Industrial Technology, Deemed To Be University, Bhubaneswar, India Shipra Swati Department of CSE, National Institute of Technology Patna, Patna, India Suryansh Bhaskar Talukdar School of Computer Science and Engineering, VIT Bhopal, Bhopal, MP, India Veera Talukdar Kaziranga University, Jorhat, Assam, India Abhishek Tandon Department of Operational Research, University of Delhi, Delhi, India Arjun Thakur Department of Computer Engineering, Vishwakarma Institute of Technology Pune, Pune, India Randeep Thind Department of Computer Science and Engineering, Chandigarh University, Gharuan, India Rakesh Tripathi Department of Information Technology, National Institute of Technology Raipur, Raipur, India Vaibhav Uniyal Uttaranchal University, Dehradun, India M. Ahmer Usmani Department of CSE, Department of Engineering and Technology, Bharati Vidyapeeth Deemed to be University, Navi Mumbai, India G. Vadivu Department of Data Science and Business Systems, SRM Institute of Science and Technology, Kattankulathur, India
Editors and Contributors
xxxi
Lakshmi Sai Srikar Vadlamani Delhi Technological University, Delhi, India Roopchand Reddy Vanga Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, India Harsh Vardhan Pant School of Computing, Graphic Era Hill University, Bhimtal Campus, Uttarakhand, India; Department of Computer Science, Amrapali Group of Institute, Haldwani, India Vishisht Ved Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India Polani Veenadhari Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Bhavana Vennam Department of Computer Science and Engineering, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India Abhigya Verma Indira Gandhi Delhi Technical University, New Delhi, India Sahil Verma Uttaranchal University, Dehradun, India; Computer Science and Engineering Department, National Institute of Technology Patna, Patna, India Swayam Verma Kalinga Institute Of Industrial Technology, Deemed to be University, Bhubaneswar, India Sonakshi Vij Vivekananda Institute of Professional Studies, Pitampura, New Delhi, India Meenu Vijarania Centre of Excellence, CSE, K.R. Mangalam University, Gurugram, Haryana, India M. Vinayaka Murthy School of Computer Science and Applications, REVA University, Bengaluru, India Deepali Virmani Vivekananda Institute of Professional Studies, Pitampura, New Delhi, India Shivam Yadav Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneshwar, India Naeem Th. Yousir College of Information Engineering, Al-Nahrain University, Baghdad, Iraq
Joint Identification and Clustering Using Deep Learning Techniques Dimple Sethi, Chandra Prakash, and Sourabh Bharti
Abstract An effective method for assessing the function of the human limbs is to quantify the angular joint kinematics of the body. Motion-capturing techniques that monitor markers placed on anatomical landmarks are typically used to derive joint angles. The administrative cost, soft tissue artifacts, and intra- and inter-tester variability are disadvantages of this approach. Tracking rigid marker clusters attached to body parts and calibrated concerning anatomical landmarks or known joint angles constitutes an alternate technique. However, a thorough investigation of the accuracy and dependability of using the cluster approach on the body joints has not yet been conducted. We aim to compare three different pose estimation models (Blazepose, Keypoint RCNN, MMPose) by clustering the human joints into four clusters (upper left, upper right, lower left, lower right) using the Expectation Maximization Gaussian Mixture Model. The joint clusters obtained by the cluster models using three pose estimation models were comparable to and correlated to the anatomical model. However, they showed noticeable offsets and differences in sensitivity compared to those from the anatomical model. Overall, the joint clusters generated were accurate, and a cumulative accuracy of 75% was achieved using MMPose, 86% and 67% achieved using BlazePose and Keypoint RCNN, respectively, and they can be substituted for the outputs of anatomical models when calculating kinematic metrics. When evaluating trends in movement behavior, cluster models seem to be an appropriate and even preferable replacement for anatomical models.
D. Sethi (B) Department of Information Technology, Indira Gandhi Delhi Technical University for Women, Kashmere Gate, New Delhi, Delhi 110006, India e-mail: [email protected] C. Prakash Department of Computer Science and Engineering, National Institute of Technology, GT Karnal Road, Delhi, Delhi 110036, India e-mail: [email protected] S. Bharti Nimbus Research Centre, Munster Technological University, Rossa Avenue, Cork T12 P928, Ireland © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_1
1
2
D. Sethi et al.
Keywords Joint clustering · Pose estimation · Gaussian Mixture Model · Body landmarks
1 Introduction The clustering of biomechanical joints is a valuable and promising diagnostic approach for assessing joint diseases and disorders. A full understanding of joint anatomy and biomechanics is required to appropriately identify and treat such disorders [1]. Biomechanics is defined as “the study of the movement of biological things using mechanics” [2], which is characterized as “a branch of physics involved with the characterization of motion and how forces cause motion” [3]. Biomechanics has previously proven to be a valuable technique in orthopedic surgery [4]. The orthopedic sports biomechanics framework [5] established biomechanics’ threefold function in (1) preventing injuries, (2) immediate treatment evaluation, and (3) longterm prognosis evaluation. Labbe et al. [6] proposed expanding this paradigm to include two additional roles: (1) evaluating the effects of an injury on knee joint function and (2) aiding in diagnosis. Upper-body and lower-body kinematics quantified using joint kinematics have been proven to be effective for measuring movement deficits in various groups and evaluating treatment benefits [7]. Impaired upper-body and lower-body mobility can have a substantial influence on one’s ability to perform routine activities. Anatomical marker groups with singleton markers affixed to bony, anatomical landmarks are most typically used in kinematic models to create local segment coordinate systems based on anatomical body axes and to compute intersegmental joint angles [8]. Model correctness is dependent on the accurate assessment of landmark positions by palpation, which takes time and skill and exposes inaccuracies owing to inter and intra-variability [9]. Soft tissue artifacts are inaccuracies introduced by the movement of the soft tissue overlaying bone landmarks [10]. Furthermore, single anatomical markers might be obscured when performing functional activities, and traditional anatomical markers may be hard or impossible to recognize in specific clinical populations, such as prosthesis users [11]. The other kinematic model includes stiff clusters of markers attached to distinct body segments and uses a calibration technique to match the positions of anatomic structures to these marker clusters [12]. Only the clusters must be monitored during functional task trials, as the calibration permits virtual predictions of anatomic structures and joint centers required for computing joint angles. This method may decrease errors related to marker occlusion, but it still needs anatomical structure identification during calibration. We propose to employ the Expectation Maximization Gaussian Mixture Model to deploy three pose estimation models (Blaze-pose, Keypoint RCNN, and MMPose) for clustering human joints into four groups (upper left, upper right, lower left, lower right). The proposed work identifies the human zone using the Detector from the input sequence. It has a high detection accuracy and a fast response time. The joint
Joint Identification and Clustering Using Deep Learning Techniques
3
clusters produced by the suggested cluster models are equivalent to and connected with the anatomical model.
1.1 Contributions • We propose to deploy three pose estimation models (Blaze-pose, Keypoint RCNN, and MMPose) for the identification of human joints into four groups (upper left, upper right, lower left, lower right) under three different day-routine activities. • Estimation of parameters and generation of labeled videos. The labeled video, body landmarks, and parameters are passed as input to the joint clustering module. • Estimation of joint clusters(upper right, upper left, lower right, and lower left) in three different positions, specifically in walking, is done using Expectation Maximization Gaussian Mixture Models.
2 Methods Three marker-less pose estimation models—Blazepose, Keypoint RCNN, and MMPose—were deployed to estimate the body landmarks. Pose estimation is a computer vision approach that assesses a person or an object’s motions by identifying keypoints (skeletal joints of humans or corners of rigid structures). We refer to this approach as “marker-less analysis” since it does not entail physical hardware on the subject’s body. Models for posture estimates can be fed data from either a static picture or a video. Our study focuses on the lateral video input of a walking person to determine human position. Additionally, we analyzed the individual’s standing and sitting stances. The keypoint locations are located in a 2D space. The exemplary results are chosen using three models: BlazePose, Keypoint RCNN, and MMPose. Firstly, the Google posture detection model called BlazePose (Whole Physique) is deployed as it can calculate the (x, y, z) coordinates of 33 skeleton keypoints. Secondly, the MS-COCO (Common Objects in Context) dataset, which comprises annotation types for segmentation, object detection, and image captioning, is employed to train the Keypoint RCNN. Thirdly, MMPose is deployed, and HRNet is the structural foundation of MMPose. A general-purpose convolutional neural network called High-Resolution Net (HRNet) is utilized for tasks including semantic segmentation, object detection, and image categorization. High-resolution representations are sustained throughout the operation. The joint grouping is achieved by using Expectation Maximization clustering using the Gaussian Mixture Model (EM-GMM). Figure 1 shows the detailed flowchart. A detailed discussion of all the approaches is as follows.
4
D. Sethi et al.
Fig. 1 Detailed flowchart for the proposed methodology. The top center shows the input: human walking RGB video (gait video), the left block shows the estimation of body landmark using pose estimation techniques, estimation of parameters and generation of labeled videos, the right block shows the estimation of joint clusters, specifically in walking, using Expectation Maximization Gaussian Mixture Models
2.1 BlazePose BlazePose is a posture estimate model with great fidelity. It features a lightweight architecture and employs a convolutional neural network (CNN). It is designed on top of Mediapipe, a media processing framework. The Blazepose model is made up of two machine learning models: the Detector and the Estimator [13]. The architecture of Detector and Estimator: The human region is identified by the Detector from the input sequence. The Detector’s architecture is built on Single-Shot Detectors (SSD). SSD detects numerous objects by utilizing a single frame image. It has a high detecting accuracy and a rapid speed. The detector generates a bounding box (1, 2254, 12) that encloses the object and a confidence score (1, 224, 224, 3) that represents the probability of the bounding box containing the object. The 12 components of the bounding box are of the type (x, y, w, h, kp1x, kp1y, kp4x, kp4y), with kp1x to kp4y being keypoints. Each of the 2254 pieces requires an anchor, offset and anchor scale. A convolution layer is applied to transform the input image in order to extract attributes from it. The stacking of convolutional layers generates cascade segmentation of the input. Following the convolutional layer, a layer mask
Joint Identification and Clustering Using Deep Learning Techniques
5
known as the pooling layer is deployed. The size of the feature maps are reduced by employing pooling layers. As a result, both the number of parameters to learn and the quantity of processing in the network are lowered. The pooling layer articulates the characteristics existing in a section of the classifier created by a convolution layer. The Estimator accepts a 256 ∗ 256 resolution human area image as input and outputs the keypoints. For training, the Estimator use a heatmap. Colors are employed in heat maps to depict information in two dimensions. The landmarks consist of 165 components representing the 33 critical spots indicated by the BlazePose model as (x, y, z, visibility, presence). The value of z is exactly proportional to the hip orientation of the person. When the value of z is positive, the keypoints are assigned behind the hips; when the value is negative, the keypoints are assigned between the camera and the hips.
2.2 Keypoint RCNN Pytorch’s Keypoint RCNN is a pre-trained pose estimation model. It is a modified version of Mask R-CNN, a cutting-edge technique for segmentation. Keypoint RCNN modifies the present Mask R-CNN model considerably by encoding a keypoint (rather than the whole mask) of the detected item in a single pass. The Keypoint RCNN is being trained using the MS-COCO (Common Objects in Context) dataset, which has numerous annotation classes for Object Detection, Segmentation, and Image Captioning. The original Keypoint RCNN with ResNet uses features acquired from the last convolutional layer of the fourth stage, which we refer to as C4. Our model employs ResNet C4 with FPN [14]. The depth of ResNet and ResNeXt networks is typically 50 or 101 layers. ResNet has developed a “identity shortcut link” that traverses one or more convolutional neural network layers. ResNet’s 34-layer basic network design is based on the VGG19 and offers shortcut or skip connections. The core of residual blocks is formed by skip connections. The difference in outcomes is due to the residual blocks. If the skip connections are not present, input ‘X is multiplied by the layer weights, and a bias term is added. The activation function, f (), is then used, and the result is H (x). H (x) = f (wx + b) or H (x) = f (x)
(1)
Addition of skip connections changes the output to H (x) = f (x) + x
(2)
Our methodology generates RoI suggestions by utilizing the region proposal network. To identify them from their surrounds, these are depicted as dotted line boxes around the items. The RoIs with the greatest confidence scores are finalized by enclosing them in solid line boxes. Mask R-CNN forms a segmentation mask around the topics to be analyzed, in this case, human figures. The keypoints are encoded using
6
D. Sethi et al.
R-CNN to identify and recognize 17 human body keypoints. Confidence scores are calculated for each identified keypoint independently, as well as for the subject as a whole.
2.3 MMPose The backbone of our MMPose model is HRNet V1-W48. In this case, “48” stands for the width (C) of the final three stages of the high-resolution sub-networks [15]. The widths of the other three parallel sub-networks for HRNet W48 are 96, 192, and 384. The main structure is made up of four stages and four parallel convolution streams. They have resolutions of 1/4, 1/8, 1/16, and 1/32. The first stage consists of four residual units, each of which is made up of a 64-width bottleneck. The width of the feature maps is then changed to C using a 3 × 3 convolution. In the second, third, and fourth levels, the modularized blocks are numbered 2, 3, and 4, respectively. Each branch of the modularized block’s multi-resolution parallel convolution has four residual units. There are two 3 × 3 convolutions per unit for each resolution, with batch normalization and the nonlinear activation ReLU arriving after each convolution. The widths (numbers of channels) of the convolutions for the four resolutions are C, 2C, 4C, and 8C, respectively. The high-resolution representations obtained with HRNet are not only properly spatialized but also semantically robust. For this, there are two sources. First, our methodology connects high-to-low resolution convolution streams concurrently rather than sequentially. Second, the bulk of existing fusion techniques likewise blends high-resolution low-level representations with low-level high-resolution representations that have been upsampled. Instead, to enhance both the low- and highresolution representations, we continually mix increasing variants of resolution. Figures 2 and 3 show the stick image generated from the Blazepose MMPose model and Keypoint RCNN.
3 Clustering of Joints The keypoints for the three models were inferred from the network architecture and procedures presented above section. The joint grouping is accomplished through the use of Expectation Maximization clustering with the Gaussian Mixture Model (EM-GMM) [16]. Upper left, Upper right, Lower left, Lower right are the four target joint clusters. Let the parameters affecting the estimation be β. The membership probabilities can be derived as (t) Mb,a =
Mb(t) f (xa ; ρb(t) , ψb(t) ) M (t) f (xa ; ρ (t) , ψ (t) ) + M2(t) f (xa ; ρ2(t) , ψ2(t) )
(3)
Joint Identification and Clustering Using Deep Learning Techniques
7
Fig. 2 a Stick image generated by Blazepose model b and MMPose
Fig. 3 Stick image generated by Keypoint RCNN
The Q function for Estimation step can be formulated as Q(ββ (t) ) = E X R|x
αβ
(t)
[log L(β; x, R)]
(4)
This can be derived as Q(ββ (t) ) =
2 n a=1 b=1
(t) Mb,a [log Mb −
1 log |ψb | 2
d 1 − (xa − δb )T ψb−1 (xa − δb ) − log(2π )] 2 2
(5)
8
D. Sethi et al.
Table 1 Comparative analysis of pose estimation models under different cluster formations and different poses Cluster Blaze Pose Keypoint RCNN MMPOSE POSE Standing
Sitting
Walking
Upper right Upper left Lower right Lower left Upper right Upper left Lower right Lower left Upper right Upper left Lower right Lower left
0.83 0.83 0.6 0.23 1 1 0.6 0.6 0.68 1 0.8 0.6
1 0.67 0.67 0.33 1 1 0.33 0.33 0.33 0.67 0.67 0.67
1 1 0.33 0.33 1 1 0.3 0.33 1 0.67 0.67 0.67
Maximum score achieved is shown in bold
where the value of Q(ββ (t) ) is maximized in the maximization step in which T equals to 1 which is the sum of T1 and T2 , respectively. The next estimates of (δ1(t+1) , ψ1(t+1) ) can be given as δ1(t+1)
n a=1
= n
(t) T1,a xa
a=1
ψ1(t+1)
n =
a=1
(t) T1,a
(t) T1,a (xa − δ1t+1 )(xa − δ1t+1 )T n (t) a=1 T1,a
(6)
(7)
The aforementioned clustering approach is applied to the three discussed pose estimation models, and the outcomes are presented in the results section.
4 Results The clustering results are discussed in this section; the four clusters are generated based on the keypoints detected from the three models under three different human poses (standing, sitting, and walking). Figure 4 show the cluster groups based on the keypoint detection inferred using Blazepose, Keypoint RCNN, and MMPose, respectively.
Joint Identification and Clustering Using Deep Learning Techniques
9
Fig. 4 Pictorial representation of the cluster groups based on the keypoints inferred from the Blazepose model under three different poses, a standing, b sitting, and c walking using the Expectation Maximization Gaussian Mixture Model Table 2 Cumulative accuracy of walking pose frame Model Cumulative accuracy MMPose Blazepose Keypoint RCNN
0.75 0.86 0.67
Visual clustering of the joints indicates that the results were comparable to one other and the Anatomical model. Cross-correlation results (Table 1) showed that most clusters, except lower-left clusters, give good results. Upper-left and Upper-right clusters in the sitting pose have generated good results. Figure 4 shows the pictorial representation of clusters. Table 2 shows the cumulative accuracy of walking pose frames.
10
D. Sethi et al.
4.1 Limitations of the Proposed Work A two-dimensional pose estimation model is deployed in this study for the identification of joints and then clustering is done using EM-GMM. However, threedimensional pose estimation techniques can be deployed to improve accuracy and generate more precise groups. As with three-dimensional data, one can explore more advanced clustering techniques, and the efficiency of clustering can be improved.
5 Conclusion and Future Scope Multiple pose estimation models are discussed in this paper and are deployed to the same dataset. After combining the accuracy results obtained for the 4 clusters (upper right, upper left, lower right, and lower left) in the walking pose frame image of the 3 models, it can be derived that the Blazepose Model was the most efficient in keypoint identification as it has yielded the maximum accuracy. Also, if accuracy, evaluation of common movement patterns, minimizing marker occlusion, and ease of use are priorities, clustering models might be a good substitute for anatomical models for examining body kinematics. This is particularly relevant in light of the possibility that discrepancies between the cluster models and the anatomical model may result from calibration errors or errors involving marker placement and skin mobility. Cluster models are a viable alternative to anatomical models when the objective is to accurately assess movement behavior trends with minimal marker occlusion. In the future, our two-dimensional pose estimation models can also be developed into three-dimensional models by making suitable amends. Posture estimation has wide scope in the near future. It can extensively be used in healthcare departments to closely follow up on the recovery of a patient by observing changes in their locomotive patterns. In sports training, pose estimation can be used to analyze the athletic form of the athlete and make suitable changes in their training regime. This technique can be extensively used to determine various yoga poses. By successfully tracking the location of an individual, this technique can contribute significantly to the gaming and augmented reality industry as well.
References 1. Tanaka E, Koolstra J (2008) Biomechanics of the temporomandibular joint. J Dent Res 87(11):989–991 2. Hogan N (1985) The mechanics of multi-joint posture and movement control. Biol Cybern 52(5):315–331 3. Lubashevsky I (2017) Modeling of human behavior as individual branch of physics and mathematics. In: Physics of the human mind. Springer, pp 1–42 4. Papandrea R, Seitz WH Jr, Shapiro P, Borden B (1995) Biomechanical and clinical evaluation of the epitenon-first technique of flexor tendon repair. J Hand Surg 20(2):261–266
Joint Identification and Clustering Using Deep Learning Techniques
11
5. Major MJ, Stine RL, Heckathorne CW, Fatone S, Gard SA (2014) Comparison of range-ofmotion and variability in upper body movements between transradial prosthesis users and able-bodied controls when executing goal-oriented tasks. J Neuroeng Rehabil 11(1):1–10 6. Alt Murphy M, Häger CK (2015) Kinematic analysis of the upper extremity after stroke—how far have we reached and what have we grasped? Phys Ther Rev 20(3):137–155 7. Nelson-Wong E, Howarth S, Winter DA, Callaghan JP (2009) Application of autocorrelation and cross-correlation analyses in human movement and rehabilitation research. J Orthop Sports Phys Ther 39(4):287–295 8. Slavens, B.A., Harris, G.F.: The biomechanics of upper extremity kinematic and kinetic modeling: applications to rehabilitation engineering. Critical Reviews™in Biomedical Engineering 36(2-3) (2008) 9. Leardini A, Sawacha Z, Paolini G, Ingrosso S, Nativo R, Benedetti MG (2007) A new anatomically based protocol for gait analysis in children. Gait Posture 26(4):560–571 10. Gao B, Zheng NN (2008) Investigation of soft tissue movement during level walking: translations and rotations of skin markers. J Biomech 41(15):3189–3195 11. Sethi D, Bharti S, Prakash C (2022) A comprehensive survey on gait analysis: history, parameters, approaches, pose estimation, and future work. Artif Intell Med 102314 12. Liao R, Yu S, An W, Huang Y (2020) A model-based gait recognition method with body pose and human prior knowledge. Pattern Recogn 98:107069 13. Singh D, Panthri S, Venkateshwari P (2022) Human body parts measurement using human pose estimation. In: 2022 9th international conference on computing for sustainable global development (INDIACom). IEEE, pp 288–292 14. Chen Y, Wang Z, Peng Y, Zhang Z, Yu G, Sun J (2018) Cascaded pyramid network for multiperson pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7103–7112 15. Huang J, Zhu Z, Huang G (2019) Multi-stage HRNet: multiple stage high-resolution network for human pose estimation. arXiv preprint arXiv:1910.05901 16. Yang M-S, Lai C-Y, Lin C-Y (2012) A robust EM clustering algorithm for Gaussian mixture models. Pattern Recogn 45(11):3950–3961
Comparative Analysis of Deep Learning with Different Optimization Techniques for Type 2 Diabetes Mellitus Detection Using Gene Expression Data Karuna Middha and Apeksha Mittal
Abstract Type 2 diabetes mellitus (T2DM) is one of the oldest diseases caused in humans, which is chronic metabolic disorder prevailed all over the world and still increasing. This disease is distinguished as two categories commonly Type 1 and Type 2, found in year 1936. Here, Type 2 was considered as syndrome component and dangerous than Type 1 and identifying T2DM in early stages in necessary. T2DM identification using gene expression analysis that depend on microarray technology is high throughput technique and powerful research methodology. In this work, hybrid deep learning (DL) is considered for comparative analysis with various optimization algorithms for analyzing T2DM. Initially, diabetes data is collected, and this data is transformed using Yeo-Johnson (YJ). Here, feature selection from transformed data is carried out by Jaya-Dingo optimization algorithm (Jaya-DOA) that is integration of Jaya optimizer and DOA. From these features selected, data augmentation is carried out by various optimization algorithms that train hybrid DL which is compared. Here, hybrid DL includes rider neural network (RideNN) and deep residual network (DRN), whereas hybrid DL-enabled optimization algorithms include particle swarm optimization (PSO), competitive swarm optimization (CSO), Jaya, competitive multiverse optimization (CMVO), rider optimization algorithm (ROA), competitive multiverse rider optimization (CMVRO), and Jaya-CMVRO. Here, Jaya-CMVRO is combination of Jaya along CMVO and ROA. Finally, output is detected for best method to analyze T2DM using gene expression data. As a result, it is found that Jaya_CMVRO trained by hybrid DL attained high rates of accuracy of 95.4%, specificity of 94.6%, and sensitivity of 94.7%, correspondingly. Keywords Competitive multiverse optimization · Competitive swarm optimization · Particle swarm optimization · Rider optimization algorithm · Deep residual network K. Middha (B) · A. Mittal Computer Science, School of Engineering and Science, GD Goenka University, Gurugram, Haryana, India e-mail: [email protected] A. Mittal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_2
13
14
K. Middha and A. Mittal
Abbreviations T2DM T1DM DL YJ RideNN DRN DOA Jaya-DOA PSO CSO CMVO ROA MVO CMVRO Jaya_CMVRO CHD GEO DEGs
Type 2 diabetes mellitus Type 1 diabetes mellitus Deep learning Yeo-Johnson Rider neural network Deep residual network Dingo optimization algorithm Jaya-Dingo optimization algorithm Particle swarm optimization Competitive swarm optimization Competitive multiverse optimization Rider optimization algorithm Multiverse optimizer algorithm Competitive multiverse rider optimization Jaya competitive multiverse optimization Coronary heart disease Gene expression omnibus Differentially expressed genes
1 Introduction T2DM is characterized using non-regulation of metabolism process in carbohydrate, protein, and lipid. This further resulted in impaired secretion of insulin, insulin resistance, or combination of both categories. Of these three main categories of diabetes, T2DM is more common that account 90% of all above cases than Type 1 DM (T1DM) and gestational diabetes. For past decades, progression and development of T2DM are evolved rapidly causing progressively secretion of impaired insulin from β-cells of pancreases, usually upon background of pre-existing resistance of insulin in adipose tissue, skeletal muscle, and liver [1]. T2DM results from interaction among behavioral, genetic, and environmental risk factors. People living with T2DM are more vulnerable to many forms of both long- and short-term complications that often lead to premature death. This increased tendency of mortality and morbidity which is noticed in patients with T2DM because of late recognition, commonness of T2DM, its insidious onset, and poor resources. Diabetes mellitus is disorder that is found by delayed development of neuropathic and vascular complications and hyperglycemia. Regardless of its issue, this disease is associated with common hormonal defect, such as insulin deficiency, which cause relative or absolute in context of coexisting resistance of insulin. This insufficient insulin effect plays important role in metabolic derangements connected to diabetes. Furthermore, hyperglycemia plays vital role in disease related problems.
Comparative Analysis of Deep Learning with Different Optimization …
15
Metabolic syndrome “X” is characterized using atherogenic dyslipidemia, insulin resistance, high blood pressure, without or with glucose intolerance, proinflammatory state, prothrombin state, and abdominal obesity. Coronary heart disease (CHD) and T2DM are considered to have metabolic syndrome “X”. Although genetics play important role in higher prevalence of metabolic syndrome “X”, study of T2DM without and with family history for knowing role of genetics in the pathobiology is necessary. World Health Organization (WHO) projected that diabetes is 7th leading disease that cause death. Moreover, developing countries account for diabetic patients of 77.6% overall. T2DM is characterized by hyperglycemia in case of impaired insulin secretion and insulin resistance. This is also a multigene heterogeneous disease that forms the result of interaction of environmental factors and genetic factors. Although many genetic factors play vital role in development and occurrence of T2DM, its exact elaboration mechanism is based on identification of susceptibility genes for T2DM. Presently, many researches on gene for T2DM mainly utilize gene chip technology for detecting and analyzing samples of clinical patient or model animals. In many cases, gene expression omnibus (GEO) database helps in identifying T2DM-related differentially expressed genes (DEGs) among normal samples and T2DM samples. The main contribution of this research is given by • Comparison of optimization-enabled hybrid DL for T2DM detection by gene expression data. Here, DRN and RideNN are utilized DL’s trained by various algorithms. Moreover, PSO, CSO, ROA, CMVO, and Jaya are algorithms, along CMVRO and Jaya-CMVRO are hybridized algorithms used for training purpose. These methods are compared to found the best method for T2DM detection. Here, transformation of data is done using YJ, and further, feature is selected by Jaya-DOA. Finally, data augmentation process is done from selected features by various methods and is compared for better results. Remaining structure involves following sections: Sect. 2 describes literature survey. Section 3 describes overall process of this method along description of various hybrid DL based optimization algorithms, and Sect. 4 explains results and discussions along comparison. At last, conclusion is given in Sect. 5.
2 Literature Survey Rao [2] devised a potent optimization algorithm, named Jaya optimization algorithm, in which the solution found for a given problem should progress in the direction of the best solution and stay away from the worst solution. There are no algorithm-specific control parameters needed for this method; only the standard control parameters are needed. Bairwa et al. [3] developed the dingo optimization algorithm. The main idea is to develop this strategy using dingoes’ cooperative and social nature. The proposed method is based on the exploration, encirclement, and exploitation modes of dingo
16
K. Middha and A. Mittal
hunting. To evaluate the effectiveness of the suggested method, all of the aforementioned prey hunting processes are mathematically represented and implemented in the simulator. Binu et al. [4] developed the rider optimization algorithm which is based on the group of riders racing toward a goal destination. Additionally, the suggested technique serves as the neural network’s training process, resulting in the development of a classifier known as RideNN (NN). Wang et al. [5] developed a stochastic optimization method built on swarms which is the particle swarm optimization (PSO) algorithm. PSO algorithm models social behavior in a variety of animals, including insects, herds, birds, and fish. These swarms follow a cooperative method of locating food, and each member of the swarms continuously modifies the search pattern in response to its own and other members’ learning experiences. It has good robustness and, with a little modification, may be used in a variety of application contexts. It has great distributed capabilities since it is simple to implement parallel processing because the algorithm is essentially the swarm evolutionary algorithm. It has a quick convergence rate to the optimal value. It can be easily combined with other algorithms to enhance performance. Cheng et al. [6] for large-scale optimization, a new competitive swarm optimizer (CSO) is suggested. Though the approach is fundamentally inspired by the particle swarm optimization. The suggested CSO does not include updating the particles based on either the global best position (or neighborhood best positions) or the personal best position of each particle. When it comes to large-scale optimization challenges, CSO has proven to outperform a number of state-of-the-art metaheuristics designed for large-scale optimization. Instead, a pairwise competition mechanism is established, where the losing particle will alter its position by gaining knowledge from the winning particle. Benmessahel et al. [7] devised the purpose of resolving global optimization issues, and a brand-new population-based optimization method dubbed the competitive multiverse optimizer (CMVO) which is developed. Although it uses a different framework, this unique approach is primarily based on the multiverse optimizer algorithm (MVO). The fundamental concept is to introduce a pairwise rivalry mechanism between universes and use a cutting-edge updating technique that forces universes to take lessons from the victor. In contrast to MVO, where all universes learn from the optimal universe, the updating mechanism in CMVO uses a bicompetitive system at each generation, allowing the world that loses the competition to learn from the winning universe. The primary goal of this endeavor is to increase the rate of search space exploration.
3 Hybrid DL-Based Various Optimization Algorithms Globally, T2DM is considered as dangerous disease. This is widespread diabetes case and complex disease that acts as complex interplay between environmental, genetic factors, and lifestyle. In this paper, comparison is main intention to analyze and justify the effectiveness of Jaya_CMVRO + hybrid DL for detecting T2DM. Primary step is acquisition of input diabetes data from dataset [8], and this acquired data is fed as input to data transformation module, where YJ [9] is employed to transform data
Comparative Analysis of Deep Learning with Different Optimization …
17
Transformation of data Yeo-Johnson
Input diabetes data
Feature selection Dingo Optimization Algorithm(DOA)
Jaya-DOA
Jaya optimizer
Data augmentation
T2DM detection CMVO+hybrid DL
PSO+hybrid DL
CSO+hybrid DL
CMVROhybrid DL
Jaya_CMVROhybrid DL
Jaya+hybrid DL
ROA+hybrid DL
Detected output
Fig. 1 Block diagram of detecting T2DM using various optimizations-enabled hybrid DL
as desired form. After that, feature selection is carried out using Jaya-DOA, which is integration of Jaya [2] and DOA [3]. Then, data augmentation is done to enlarge dimensionality of data by hybrid DL, such as RideNN [4] and DRN [10]. Moreover, this hybrid DL is trained by Jaya-CMVRO, PSO [5], CSO [6], Jaya [2], ROA [4], CMVO [7], and CMVRO. Figure 1 shows block diagram of detecting T2DM using various optimizations-enabled hybrid DL.
3.1 Data Acquisition Data acquisition is first stage of the process, where dataset A [8] is considered that consists a count of attributes, and this is expressed as } { Bb×c = Cd,e ; (1 ≤ d ≤ g); (1 ≤ e ≤ f )
(1)
where Cd,e is dth record of data in eth attribute, g is total data points, and f indicates total attributes of each data point. Here, acquired data Cd,e is allowed for data transformation stage.
18
K. Middha and A. Mittal
3.2 Data Transformation by YJ Pre-processing transforms data into desired format, and this help in avoiding misrepresentations and also recover quality features for efficiency in processing. In this process, input data Cd,e is fed to YJ transformation that forms main step in detecting T2DM. In this work, pre-processing is carried out by YJ transformation [1, 9]. In common, data transformation is mathematical function-enabled operation that is processed with data. Moreover, YJ transformation is familiar in adapting suppression of enormous data. Furthermore, this technique transforms original data into necessary format and provides capable outcome by improvising input. Mathematical expression of YJ transformation is given as ] [ Db×c = Y Cd,e
(2)
where Y [·] is YJ transformation matrix and resultant outcome is depicted by term Db×c .
3.3 Selection of Features by Jaya-DOA After data transformation process, the transformed data Db×c is allowed for feature selection phase. This helps in selecting significant features to attain efficient T2DM detection, which is carried out by Jaya-DOA with more accurate results. This step also eradicates redundant features. Here, Jaya-DOA is framed by hybridizing Jaya [2] with DOA [3]. DOA [3] is bio-inspired optimization technique that copies hunting techniques of dingoes. Here, three main strategies are followed to attain best solution, such as attacking by persecution, grouping methods, and scavenging character. Moreover, DOA considers dingoes survival probability for attaining best solution and benefits in convergence rate. Jaya [2] is populace dependent metaheuristic method that combines features of both swarm enabled intelligence and evolutionary algorithms. Moreover, it is generated by changing populace of specific solutions. Jaya has no specific control parameters but has dual common control constraints, such as largest iteration count and population size. The combination of both Jaya and DOA is very efficient to solve both maximization and minimization issues along finding best solution in fast manner is possible. Update equation of Jaya_DOA is represented as followed, m(l + 1) =
] [ 1 m p (w)[1 − R. Q](1 − k1 + k2 ) + R(k2 m t (w) − k1 m s (w)) 1 − k1 + k2 − R
(3)
where m(l + 1) is solution at iteration l + 1, random numbers are k1 , k2 among 0 and 1, m s (w) is position of best solution at iteration w, m t (w) is worst solution position at iteration w, and Q and R are vectors, Q = (1, 0) and R = (1, 1). Extracted features are further indicated by term Fe that is allowed for data augmentation.
Comparative Analysis of Deep Learning with Different Optimization …
19
3.4 Data Augmentation After features Fe are selected, they are allowed for data augmentation stage. This is done for investigating selected features and also improves quality of features. Data augmentation is developed as regularizer and aims in diminishing overfitting issues when learning model. Moreover, data augmentation is completed by below mentioned formats. • • • • •
Scaling factor g is selected. Split features based to class label. Find entropy E of feature. For each feature, generate h value among [0, i]. Find new sample G = actual data + ( j/d).
Hence, feature augmented is indicated as M that is further allowed for T2DM detection process.
3.5 T2DM Detection Using Hybrid DL Augmented data M is then allowed forT2DM detection using hybrid DL, where hybrid DL includes two DL models, like RideNN [4] and DRN [10]. This hybrid DL is trained by various optimization algorithms. These two DLs are fused by Tversky index. Here, augmented data M is allowed to DRN and RideNN separately and forms outputs O1 and O2 . Also, training process is done by hybridized Jaya_CMVRO. Then, output of RideNN and DRN is fused by Tversky index for generating hybrid DL to get final detected result O. In addition, the training algorithm of Jaya_CMVRO is combination of CMVRO and Jaya algorithm. Figure 2 represents hybrid DL structure for detection process of T2DM. The structure of RideNN [4] and DRN [10] is exhibited as below.
Augmented data
O1 RideNN
M
Final output O DRN
O2 Fig. 2 Hybrid DL structure for detection process of T2DM
20
3.5.1
K. Middha and A. Mittal
Architecture of RideNN
RideNN [4] classifier is utilized to discover T2DM. It enumerates functioning of human’s brain to recognize relationship among huge data. This is utilized in various applications from marketing research till detecting fraud. It also helps for detecting issues easily and solves common problems. It moreover assists in recording conversation more accurately. Here, RideNN consists of three layers, such as input, output, and hidden layer that consider neurons in every layer.
3.5.2
Architecture of DRN
DRN [10] is employed to generate decision that concern with T2DM detection, which is more accurate. This is utilized to analyze visual imagery and also helps in solving difficult tasks in computer vision and brings out with own group of issues. When more layers are added up with this architecture of DRN, this assists in addressing exploding and vanishing nature of gradients. Also, it poses ability for handling sophisticated DL tasks. DRN comprises many layers, such as convolutional (conv) layers, average pooling layers, linear classifier, and residual blocks.
3.5.3
Training of Hybrid DL Using Different Optimization
Hybrid DL that comprises RideNN [4] and DRN [10] is used for T2DM detection, which is trained by various optimization algorithms. Here, optimization algorithms used for training hybrid DL are PSO [5], CSO [6], Jaya [2], ROA [4], CMVO [7], CMVRO, and Jaya_CMVRO. Moreover, CMVRO is generated by hybridizing CMVO and ROA, and also Jaya_CMVRO is combination of Jaya and CMVRO. Explanation of optimization algorithm used for training hybrid DL is given below.
PSO PSO [5] is stochastic optimization algorithm that is based on swarm, motivated by group behavior of animals, such as birds or fishes. This PSO includes origin of animals with background details and includes individual as particle without volume and mass. Here, structure of topology, algorithm, and selection of parameters is included with proper position and velocity. In PSO, every individual is represented as Cartesian coordinate system, designed with initial position and velocity. Here, best position it ever reached and global best location is clearly identified for better optimization outcomes. Basic algorithmic equation of PSO is indicated below, Vx = Vx + 2 ∗ rand ∗ (pbest x − x) + 2 ∗ rand ∗ (gbest x − x)
(4)
Comparative Analysis of Deep Learning with Different Optimization …
21
where x, y indicate coordinate axis, velocity coordinate is Vx , rand is random number, best position it ever reached is pbest, and global best location is gbest.
CSO CSO [6] is optimization algorithm that is inspired by PSO, but concept varies. Here, updating process does not include global best position and personal best position, but a competition phase is included. This algorithm achieves good balance among exploitation and exploration phases. This is used for solving huge optimization issues and is simple in processing and implementation. The basic equation of CSO includes the below formulae, Vx ( p + 1) = W1 (v, p)Vx ( p) + α2 (r2 − Z x ( p))
(5)
where velocity coordinate is Vx , generation number is p, α2 is orientation, W1 is random vector after vth competition, and Z x ( p) is position of particle at iteration p.
Jaya Optimizer Jaya [2] is powerful optimization algorithm that is evaluated for solving unconstrained and constrained issues. This algorithm uses concept that solution gained for problem move toward best solution and avoids worst solution. Jaya optimizer uses only common control parameters rather than using algorithm-specific parameters to control. Basic algorithmic equation of Jaya is indicated below, ' Sn,o, p = Sn,o, p + u 1,n, p (Sn,best, p− |Sn,o, p |) − u 2,n, p (Sn,worst, p− |Sn,o, p |)
(6)
where Sn,best, p− is n variable for best candidate, Sn,worst, p− is worst candidate for n ' variable, u 1,n, p and u 2,n, p are random variables in range [0, 1], and Sn,o, p is updated value of Sn,o, p .
ROA ROA [4] is optimization algorithm that considers few groups of riders traveling to common target location to become winner of race. Here, four riders are considered, such as bypass rider, overtaker, follower, and attacker. In ROA, bypass rider target location by bypassing path, follower follows leading path, overtaker reach by overtaking leading path, and attacker takes rider position with maximum speed for reaching target location. Here, time and distance play vital role in optimization issues. The basic equation representing the attacker position is given as below, [ ] F p S p+1 (w, q) = S l (l, q) + cos(X w,q ) ∗ S l (l, q) + νwp
(7)
22
K. Middha and A. Mittal
where S l (l, q) is leading rider position, X w,q is steering angle of wth rider in qth p coordinate, and distance is represented as νw . p
CMVO CMVO [7] is optimization algorithm used for solving global optimization issues that is inspired by MVO with different framework. Here, bicompetitive scheme is followed at every generation where updating universe is not confined to optimal universe. This adopts competitive mechanism among universe and finds update strategy, which makes universe to learn from winner. χ
Ulos,I ⎧{ β1 ∗ TDR + β2 ∗ (Uwin,I − Ulos,I ) + β3 ∗ ((UI ) − Ulos,I ) ⎪ ⎪ ⎨ β1 ∗ TDR + β2 ∗ (Uwin,I − Ulos,I ) + β3 ∗ ((UI ) − Ulos,I = ⎪ ⎪ ⎩ χ Ulos,I
(βd1) < WEP (βd1) ≥ WEP
(βd2) < WEP
(8)
(βd2) ≥ WEP
where Uwin,I is winner universe in Ith round, Ulos,I is loser universe in Ith round, χ Ulos,I is Ith round of competition, TDR and WEP are main coefficients, β indicates random values, and UI is mean position.
CMVRO Training of RideNN and DRN is carried out by CMVRO that is developed by joining ROA and CMVO. CMVO [7] is inspired from competition method among universes that makes comprehend from winner. This is effectual for addressing global optimization problems. Aim is increasing rate of exploration in search space by creating pair wise competition and improves exploitation capability via learning from winner. CMVO is simple in structure and shows quick convergence toward Pareto front. On other hand, ROA [4] is motivated from rider sets traveling to overcome general target position for being winner. This CMVRO concluded with improved accuracy of classification. Here, combination of ROA and CMVO is done to enhance complete algorithmic performance. Final update equation of CMVRO is indicated as below, ϒ2 + ϒ 3 E X i+1 (m, u) = ϒ2 + ϒ3 + ∗ Ψ(u) [ ϒ1 ∗ TDR + ϒ2 ∗ X i (s, u) + ϒ3 ∗ X u × ϒ2 + ϒ3
∗ ψ(u) +
] X i (ς, u) ∗ [1 − ψ(u)]
(9) where Y1 , Y2 , and Y3 are random numbers between [0, 1], X i (s, u) is winner universe in uth round of competition, TDR is coefficient, is random number, ς ,ψ express arbitrary number between 0 and 1.
Comparative Analysis of Deep Learning with Different Optimization …
23
Jaya_CMVRO Jaya_CMVRO algorithm is designed by coagulation of Jaya [2] and CMVRO algorithm. Here, CMVRO is designed by hybridizing ROA [4] and CMVO [7]. In CMVO, it is modeled depending on concept of competition for becoming winner. CMVO is effective method because of its simplicity and convergence rate. Important aim of CMVO is progressing exploration rate in area of exploration by utilizing pair wise competition where CMVO algorithm training is less. On the other hand, ROA is optimization technique that is calculated based on riding concept for becoming winner. Advantage of CMVRO is providing accurate detection result with maximum accuracy. Jaya [2] is population dependent metaheuristic strategy that groups features of both evolutionary and swarm-based intelligence algorithms and has no specific control parameters. Purpose of Jaya is that it moves to best solution and avoids worst solution. Hence, combination of advantages of ROA, CMVO, and Jaya algorithm is clearly represented in this hybridized algorithm. Final updated expression for Jaya_ CMVRO algorithm for training weight of RideNN and DRN is given as indicated, X (i + 1) )) ⎤ ⎡(( ϒ1 ∗ μ + ϒ2 ∗ wl (i ) + ϒ3 ∗ wn ∗ ψ(u)(1 − ϒ1 + ϒ2 ) + [ wε (i ) ∗ [1 − ψ(u)]](ϒ2 + ϒ3 ) ( ) ⎥ ⎢ ⎢ +w(i )(1 − ϒ1 + ϒ2 )(ϒ2 + ϒ3 + ∗ Ψ(u)) − ϒ2 wi (i ) − ϒ1 w y (i ) (ϒ2 + ϒ3 + ∗ ψ(u))+ ⎥ ⎥ [ ] =⎢ ⎥ ⎢ 2 − ϒ1 + ϒ2 (ϒ2 + ϒ3 + ∗ ψ(u)) ⎦ ⎣
(10)
where X (i + 1) is position of target at iteration i, and u is round of competition.
4 Results and Discussion Results with discussion of various optimization algorithms that train hybrid DL for T2DM detection based on various metrics are given in this section.
4.1 Experimental Setup This comparative analysis with various optimization algorithms is done in Python tool by using gene expression data.
4.2 Dataset Description Dataset used in this paper is gene expression data [8] that helps for evaluating differences in transcriptome of T2 diabetic human islets and compared to samples of
24
K. Middha and A. Mittal
non-diabetic islets. In this dataset, human islets are separated from pancreas of organ donors by collagenase digestion, which is followed by density gradient purification, and then handpicked and cultured for more than one day in M199 culturing medium.
4.3 Performance Metrics This research uses metrics such as accuracy, specificity, and sensitivity for assessing comparison of various optimization algorithms and is elaborated as below, a. Accuracy: This measures effectiveness of predicting T2DM outcome by utilizing true positives and negatives along false positives and negatives, which is given by AC1 =
I+ + I− I+ + I− + J+ + J−
(11)
where I+ is true positive, I− is true negative, J+ is false positive, and J− is false negative. b. Sensitivity: This measures accurateness of true positives for predicting positive cases, which is given as Se =
I+ I+ + J_
(12)
c. Specificity: This measures accurateness of false positives and predicts negative cases of patients regarding T2DM, which is given as below Sp =
I− I− + J+
(13)
4.4 Comparative Methods Many methods are compared by utilizing hybrid DL with varying performance metrics. They include PSO [5], CSO [6], Jaya [2], ROA [4], CMVO [7], CMVRO, and Jaya_CMVRO.
Comparative Analysis of Deep Learning with Different Optimization …
25
4.5 Comparative Analysis Comparison is done by three performance metrics, such as accuracy, sensitivity, and specificity. Here, assessment is done by changing solution size as well as iteration.
4.5.1
Assessment by Varying Solution Size
Figure 3 represents comparative analysis by changing solution size. Figure 3a shows accuracy-based comparative assessment by altering solution size. For solution size of 50, accuracy is 0.830 for PSO + hybrid DL, 0.840 for CSO + hybrid DL, 0.882 for Jaya + hybrid DL, 0.913 for CMVO + hybrid DL, 0.904 for ROA + hybrid DL, 0.914 for CMVRO + hybrid DL, and 0.951 for Jaya_CMVRO + hybrid DL. Figure 3b depicts sensitivity-based comparative assessment by altering solution size. For solution size = 50, sensitivity values are 0.836, 0.847, 0.874, 0.894, 0.904, 0.933, and 0.942 for PSO + hybrid DL, CSO + hybrid DL, Jaya + hybrid DL, CMVO + hybrid DL, ROA + hybrid DL, CMVRO + hybrid DL, and Jaya_CMVRO + hybrid DL. Figure 3c enumerates specificity enabled comparative assessment by altering solution size. When solution size = 50, then specificity is 0.799 for PSO + hybrid DL, 0.829 for CSO + hybrid DL, 0.850 for Jaya + hybrid DL, 0.869 for CMVO + hybrid DL, 0.889 for ROA + hybrid DL, 0.911 for CMVRO + hybrid DL, and 0.942 for Jaya_CMVRO + hybrid DL.
4.5.2
Assessment by Varying Iteration
Figure 4 represents comparative analysis by changing iteration. Figure 4a shows accuracy-based comparative assessment by altering iteration. When iteration is 100, then accuracy is 0.839 for PSO + hybrid DL, 0.860 for CSO + hybrid DL, 0.890 for Jaya + hybrid DL, 0.900 for CMVO + hybrid DL, 0.920 for ROA + hybrid DL, 0.931 for CMVRO + hybrid DL, and 0.954 for Jaya_CMVRO + hybrid DL. Figure 4b enables sensitivity based comparative assessment by altering iteration. When iteration is 100, then sensitivity is maximum for Jaya_CMVRO + hybrid DL with value of 0.947, whereas other methods such as PSO + hybrid DL, CSO + hybrid DL, Jaya + hybrid DL, CMVO + hybrid DL, ROA + hybrid DL, and CMVRO + hybrid DL show lesser values of sensitivity as 0.860, 0.869, 0.911, 0.915, 0.925, and 0.939. Figure 4c represents specificity-enabled comparative assessment by altering iteration. When iteration = 100, then specificity is maximal of 0.946 for Jaya_CMVRO + hybrid DL, and low values for PSO + hybrid DL with 0.813, CSO + hybrid DL with 0.840, Jaya + hybrid DL with 0.863, CMVO + hybrid DL with 0.884, ROA + hybrid DL with 0.894, and CMVRO + hybrid DL with 0.917.
26
K. Middha and A. Mittal
Fig. 3 Comparative assessment of various optimization algorithms-enabled hybrid DL by varying solution size, a accuracy, b sensitivity, and c specificity
4.6 Comparative Discussion Table 1 enumerates comparative analysis for T2DM detection by many methods that includes PSO, CSO, Jaya, CMVO, ROA, CMVRO, and Jaya_CMVRO. Here, Jaya_CMVRO shows high accuracy of 0.954 when compared with other methods, and this is due to usage of YJ for data transformation. Similarly, high specificity of 0.946 is reached for Jaya_CMVRO due to Jaya_DOA utilized for feature selection. Furthermore, sensitivity is 0.947 which is of higher value as because of utilizing proper augmented data when compared with other techniques.
Comparative Analysis of Deep Learning with Different Optimization …
27
Fig. 4 Comparative assessment of various optimization algorithms-enabled hybrid DL by varying iteration, a accuracy, b sensitivity, and c specificity Table 1 Comparative discussion of various methods Classification Metrics/ methods methods
Varying solution size
Varying iteration
Accuracy
PSO + CSO + Jaya + CMVO ROA + CMVRO Jaya_ hybrid hybrid hybrid + hybrid + hybrid CMVRO DL DL DL hybrid DL DL + hybrid DL DL 0.830
0.840
0.882
0.913
0.904
0.914
0.951
Specificity 0.799
0.829
0.850
0.869
0.889
0.911
0.942
Sensitivity 0.836
0.847
0.874
0.894
0.904
0.933
0.942
0.839
0.860
0.890
0.900
0.920
0.931
0.954
Specificity 0.813
Accuracy
0.840
0.863
0.884
0.894
0.917
0.946
Sensitivity 0.860
0.869
0.911
0.915
0.925
0.939
0.947
The bold values are final output
28
K. Middha and A. Mittal
5 Conclusion T2DM is worldwide health crisis with significant mortality, morbidity, and disability, which affects much number of people worldwide. Many methods are available for T2DM detection, but most probably available method is using hybridized algorithms for training DL that estimate disease. Here, many methods are compared for detection accuracy that includes many algorithms as well as optimization algorithms. Moreover, in this work, hybrid DL is followed with RideNN and DRN that are trained by many algorithms, such as PSO, CSO, ROA, Jaya, CMVO, CMVRO, and Jaya_ CMVRO. Here, data transformation is done by YJ, and feature is selected by Jaya_ DOA. Also, the combination algorithms used are Jaya_DOA, formed by Jaya and DOA, as well as, Jaya_CMVRO that is formed by Jaya along CMVO and ROA. As a result of this comparative assessment, it is noted that Jaya_CMVRO trained by hybrid DL is highly effective with maximal accuracy of 95.4%, maximal specificity of 94.6%, and maximal sensitivity of 94.7% for detecting T2DM. This comparison can further be extended by utilizing some other optimization algorithms for better results.
References 1. He Y, Zheng Y (2018) Short-term power load probability density forecasting based on YeoJohnson transformation quantile regression and Gaussian kernel function. Energy 154:143–156 2. Rao R (2016) Jaya: a simple and new optimization algorithm for solving constrained and unconstrained optimization problems. Int J Ind Eng Comput 7(1):19–34 3. Bairwa AK, Joshi S, Singh D (2021) Dingo optimizer: a nature-inspired metaheuristic approach for engineering problems. Math Probl Eng 4. Binu D, Kariyappa BS (2018) RideNN: a new rider optimization algorithm-based neural network for fault diagnosis in analog circuits. IEEE Trans Instrum Meas 68(1):2–26 5. Wang D, Tan D, Liu L (2018) Particle swarm optimization algorithm: an overview. Soft Comput 22(2):387–408 6. Cheng R, Jin Y (2014) A competitive swarm optimizer for large scale optimization. IEEE Trans Cybern 45(2):191–204 7. Benmessahel I, Xie K, Chellal M (2020) A new competitive multiverse optimization technique for solving single-objective and multiobjective problems. Eng Rep 2(3):e12124 8. Gene expression data will be obtained from: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE25724. Accessed Oct 2022 9. Tsiotas G (2009) On the use of non-linear transformations in Stochastic Volatility models. Stat Methods Appl 18(4):555–583 10. Chen Z, Chen Y, Wu L, Cheng S, Lin P (2019) Deep residual network based fault detection and diagnosis of photovoltaic arrays using current-voltage curves and ambient conditions. Energy Convers Manag 198:111793
Differential Analysis of MOOC Models for Increasing Retention and Evaluation of the Performance of Proposed Model Harsh Vardhan Pant and Manoj Chandra Lohani
Abstract MOOCs attract a diverse group of learners with different motivations to stay engaged in the course. While there are existing models to improve retention rates in MOOCs, there is still a need for an ideal model. To address this, researchers used structural equation modeling to identify previously unexplored factors that influence learner retention in MOOCs. The study revealed a new model that could enhance learner satisfaction and improve retention rates. In continuation to this research, current paper has evaluated the performance and validity of the retention model of MOOC. For doing the same, study has integrated some extra variables like credit mobility, content localization, and latest trend course with TAM. The current paper has also made the significant comparative analysis of variance explained (R2 ), with the original TAM and some previous models, so that we can find out how much the proposed model is good to fit. Keywords Model · Evaluation · MOOCs
1 Introduction The development of a predictive model for identifying the intention to learn has emerged as a major problem in the fields of learning analytics and educational data mining. An efficient and trustworthy methodology for learner’s satisfaction and motivation for retention and continuation in the MOOCs is required for the growth of MOOCs. Even though MOOC education is very well-liked, it also has a lot of issues like: low retention rate of learners, a lack of motivations for learning, lack of selfregulation, need of content localization in course, and less recognition of MOOC certificates and low rate of getting the jobs after completion the MOOC course [1]. H. Vardhan Pant (B) · M. Chandra Lohani School of Computing, Graphic Era Hill University, Bhimtal Campus, Uttarakhand, India e-mail: [email protected] H. Vardhan Pant Department of Computer Science, Amrapali Group of Institute, Haldwani, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_3
29
30
H. Vardhan Pant and M. Chandra Lohani
Fig. 1 Conceptual framework of the proposed model. Source [2]
Low passing rates and inconsistent assessment methods for MOOCs are the results of these issues. However, there are a number of literatures for conceptual framework of retention in MOOC but very few of them have mentioned the performance and accuracy of the conceptual model. Therefore, the current study focuses on developing and evaluating models to answer the following research questions: RQ1: To what extent does the newly proposed model effectively account for the intention to stay engaged in MOOCs? RQ2: What potentially strong relationships exist between the motivating variables that influence learners’ intentions to retain in the MOOC course? As mentioned earlier that the proposed model and motivational factors for retention have already identified and published in a Scopus index reputed journal by the authors [1, 2]. In these research papers, authors have decrypted the learners’ retention factors in massive open online courses and proposed a model as mentioned in Fig. 1. In continuation with the above-mentioned research and published papers, the authors have focused only model performance measures through explanatory power (R2 ), predictive relevancy (Q2 ), and fit indices approaches in the current paper.
2 Literature Review However, there are immense number of literatures related to retention/continuation/ dropout MOOC learners. They all suggest precious motivational factors and give the conceptual framework for reducing the dropout learners [3–5]. When any conceptual framework is explained in a study, there is a good sign of clarity of any research that if it mentioned and verified how much the proposed model good to fit for the proposed environment? But authors found that very few [6, 7] of literatures have mentioned the prediction performance and accuracy of the conceptual framework. Beside this, each related research has found some new motivational factors or known factors with different demographic data and environment; So, there should be mentioned
Differential Analysis of MOOC Models for Increasing Retention …
31
and calculated the structural model or predictive model performance also. Therefore in this paper, authors have analyzed the structural model assessment and predicted the performance of the given model [2].
3 Methodology According to Hair et al. [8], the phase of assessing the structural model involves three assessments, namely calculating the predictive relevancy (Q2 ), calculating the coefficient of determination (R2 ), and evaluating the significance of path coefficients. This paper’s main objective is to evaluate the proposed predictive model’s overall performance concerning MOOCs. To assess model performance, various measures are used, primarily involving comparing the model’s predicted values with the known values of the dependent variable in a dataset. The current study measures the following steps to measuring the proposed MOOC retention model. Step 1 (Model Evaluation): During this step, the study has determined the level of reliability of the model’s predictions. In other words, the focus was on evaluating how effective the model is. To find the answer of the above question study of Goodness-of-fit (GoF), measure has been used for some continuous dependent variables like CL, LTC, CM, and SI. It is formulated as MSE( f, X , y) =
n n 1 1 2 (yi − yi )2 = r n i n i i
The residual for the ith observation is represented as r i in this equation, and as a result, the mean squared error (MSE) is calculated as the sum of squared residuals. MSE is calculated on a different scale from the dependent variable, making it less interpretable. Therefore, a more comprehensible version of this measure is the coefficient of determination (R2 ).
R 2 ( f, X , y) = 1 −
MSE f, X , y MSE( f 0,X ,y)
The formula presented in this context involves the function f 0 (), which serves as a baseline model. In the case of classical linear regression, f 0 () refers to the model with only the intercept, which predicts the mean value of Y for all observations. The coefficient of determination (R2 ) is a normalized meaning that a model that fits perfectly would result in R2 = 1, while R2 = 0 would indicate that the model is not performing any better than the baseline model.
32
H. Vardhan Pant and M. Chandra Lohani
In current study, the authors have integrated some hypnotized extra variables with popular TAM model like CM, CL, LTC SI, etc. Thus, to find out the RQ1, this study added these variables one by one with the TAM model and identified the explanatory power of TAM. Step 2 (Model Comparison): The current study also compared the calculated values of explorative power of R2 with earlier models in order to verify the proposed model. These all steps were done with the help of Smart PLS tool and which are illustrated below.
4 Result In this section of the study, the improvement in R2 values is demonstrated by including additional variables in the original TAM model after eliminating irrelevant paths. Table 1 presents the increase in explained variance for perceived usefulness and intention to continuation or retention as new factors are added to the original TAM model. The integrated TAM, which incorporated the CM, CM, LTC, and SI variables, resulted in a significant increase in R2 compared to the original TAM, and all other models presented in Table 3. The outcomes of incorporating the components into the original TAM model are shown in Figs. 2, 3, 4, 5, and 6. Based on the path analysis results, it appears that four variables, namely, LTC, CR, PU, and BI, are significant predictors of retention. The R2 value of 0.507 for RET suggests that these variables explain 59.1% of the variance in retention, indicating a good predictive power of the research model. Moreover, the analysis also reveals that LTC and SI together explain 52.5% of the variance in BI, which suggests that LTC and SI are important predictors of BI. However, in the case of PU, only CL explains 74.1% of the variance, indicating that CL is the most important predictor of PU. Table 1 shows how incorporate more variables to TAM for increases its capacity The model
Explained variance in PU (%)
Explained variance in behavioral intention (BI) (%)
Explained variance in retention RET
Original TAM model
25.40
45.10
43.50
Integrated TAM with CL
35.16
48.24
48.5
Integrated TAM with CL and CM
46.60
50.20
52.30
Integrated TAM with CL, CM, and LTC
58.50
51.30
58.20
Integrated TAM with CL, CM, LTC, and SI
74.1
52.5
59.6
Differential Analysis of MOOC Models for Increasing Retention …
33
Fig. 2 Original TAM model
Fig. 3 Integrating TAM with CL and BI
Fig. 4 Integrating TAM with CL, BI, and CM
Overall, the path analysis results provide valuable insights into the relationships between the different variables in the research model and their impact on retention, BI, and PU. These findings can help MOOCs platform better understand the factors that influence learners’ retention and take appropriate measures to improve it.
34
H. Vardhan Pant and M. Chandra Lohani
Fig. 5 Integrating TAM with CL, BI, CM, LTC
Fig. 6 Integrating TAM with CL, BI, CM, LTC, SI
To ensure the accuracy and reliability of the research model, it is essential to evaluate its performance using goodness-of-fit indices. According to Ref. [9], Table 2 presents the fit indices that were used to assess the model’s performance. The results indicate that the dataset is consistent with the study model, as all of the actual values of the indices fall within the recommended range. This suggests that the model is a good fit for the data and provides an accurate representation of the relationships
Differential Analysis of MOOC Models for Increasing Retention …
35
Table 2 Fit indices Index
Recommended value/condition
Actual value
SRMR—“standardized root mean square residual”
< 0.08
0.036
NFI—“normed fit index”
> 0.9
0.924
d_ULS—“unweighted least squares”
“d_ULS < bootstrapped HI 95% of d_ULS and d_G < bootstrapped HI 95% of d_G”
0.466
* “Chi square (χ 2 )”
χ 2 /df < 3
2.46
N, number of participants (380) [2]
between the variables in the study. Therefore, the model can be considered valid and reliable for further analysis and interpretation of the results.
5 Tabulation of Comparing the Performance of Concerned Research Model with Earlier Studies In above explained, that the perceived usefulness and intention to retention have relatively moderate explained variances of 74.1% and 59.6%, respectively. To address the second research question, this study compared the performance of its model to similar MOOC studies by calculating the R2 value, which indicates the amount of variance explained in perceived usefulness and behavioral intention (BI)/continuance intention/intention to retention (Table 3). As per [13] the squared correlation values of 0.741 and 0.596 (Table 3) in PLS path, models are regarded as considerably moderate. However, it has not outperformed Table 3 Contrasting the explained variation in the research model’s PU, PEU, and RET/BI/CI with that in models suggested in prior studies
Study
Variance explained (R2 ) PU (%)
PEU (%)
BI/CI/IRET (%)
[3]
None
None
69.9
[4]
64
61
64.1
[5]
40.4
23.9
50.7
[6]
94.8
46.8
95.7
[7]
37.8
None
79.4
[10]
34.4
37.1
47.2
[9]
60
47
62.2
[11]
42
None
66
78
79
None
59.6
[12] Current research
74.1
36
H. Vardhan Pant and M. Chandra Lohani
from earlier models in all the parameters but has shown distinct improvement. As shown in Table 3, the R2 values of BI/CI/IRET depicted in the current research are lower than the values estimated by the most of the earlier studies approximately. This outcome can be a result of the model being incorrectly described owing to the omission and/or inclusion of a variable or parameter. However, some researches like [7, 14] the R2 values are increased than the earlier studies. Besides this, Table 3 also depicts that the value of PU is significantly increasing (74.1%) and representing that when used MOOCs courses with regional language, it will be more useful. In order to improve the predictive power of future studies on learners’ intention to retention in MOOCs, it is suggested to incorporate additional influential predictors that can increase the explained variance.
6 Conclusion The evaluation model that is proposed derives from an extensive literature review on MOOCs, published it [2], and it was examined by a very popular MOOC platforms of India, i.e., SWAYAM. To assess the significance of the proposed paths in the research model, the authors of this study used bootstrapping to calculate the R2 of the structural model and determine its explanatory power. The R2 value that is also representing the model’s performance was calculated as a moderate model. So, this study also endorse the policies of Indian Government that is promoting the credit mobility features and regional language courses in MOOCs.
References 1. Pant HV, Lohani CM, Pande J (2021) Exploring the factors affecting learners’ retention in MOOCs: a systematic literature review. IJICTE 17(4):1–17 2. Harsh H, Lohani MC, Pande J (2022) Decrypting the learners’ retention factors. J Learn Dev 9(1) 3. Chiappe A, Castillo B (2021) Retention in MOOCS: some key factors. Ensaio: aval pol públ educ 29(110) 4. Marcela GG-ZL, Garza ADL (2016) Research analysis on MOOC course dropout and retention rates. Turk Online J Distance Educ 17(1) 5. Dominic P, Munib H (2016) Exploring the factors associated with MOOC engagement retention and the wider benefits for learners. Eur J Open Distance E-Learn 19(2) 6. Yamini G, Rinkaj G (2020) On the effectiveness of self-training in MOOC dropout prediction. Open Comput Sci 7. Ahmad SA-A (2020) Investigating the drivers and barriers to MOOCs adoption: the perspective of TAM. Educ Inf Technol 8. Hair Jr FJ, Sarstedt M, Hopkins L, Kuppelwieser VG (2014) Partial least squares structural equation modeling (PLS-SEM): an emerging tool in business research. Eur Bus Rev 26(2):106– 121 9. Henseler J, Hubona G, Ray A (2016) Using PLS path modelling in new technology research: updated guide-lines. Ind Manag Data Syst 116(1):2–20
Differential Analysis of MOOC Models for Increasing Retention …
37
10. Bido D, Silva dD, Ringle C (2014) Structural equation modeling with the SmartPLS. Braz J Mark 13(2):57–73 11. Meet R, Kala D, Al-Adwan A (2022) Exploring factors affecting the adoption of MOOC in generation Z using extended UTAUT2 model. Educ Inf Technol 12. Al-Adwan AS, Albelbisi NA, Hujran O, Al-Rahmi WM, Alkhalifah A A (2021) Developing a holistic success model for sustainable e-learning: a structural equation modeling approach. Sustainability 13 13. Wu B, Chen X (2017) Continuance intention to use MOOCs: integrating the technology acceptance model (TAM) and task technology fit (TTF) model. Comput Hum Behav 67:221–232 14. Yang M, Shao Z, Liu Q, Liu C (2017) Understanding the quality factors that influence the continuance intention of students toward participation in MOOCs. Educ Technol Res Dev 65(5):1195–1214
Deep Convolutional Neural Networks Network with Transfer Learning for Image-Based Malware Analysis V. S. Jeyalakshmi, N. Krishnan, and J. Jayapriya
Abstract The complexity of classifying malware is high since it may take many forms and is constantly changing. With the help of transfer learning and easy access to massive data, neural networks may be able to easily manage this problem. This exploratory work aspires to swiftly and precisely classify malware shown as grayscale images into their various families. The VGG-16 model, which had already been trained, was used together with a learning algorithm, and the resulting accuracy was 88.40%. Additionally, the Inception-V3 algorithm for classifying malicious images into family members did significantly improve their unique approach when compared with the ResNet-50. The proposed model developed using a convolution neural network outperformed the others and correctly identified malware classification 94.7% of the time. Obtaining an F1-score of 0.93, our model outperformed the industry-standard VGG-16, ResNet-50, and Inception-V3. When VGG-16 was tuned incorrectly, however, it lost many of its parameters and performed poorly. Overall, the malware classification problem is eased by the approach of converting it to images and then classifying the generated images. Keywords Malware · VGG-16 · ResNet-50 · Inception-V3 · Deep learning · Transfer learning
V. S. Jeyalakshmi (B) · N. Krishnan Centre for Information Technology and Engineering, Manonmaniam Sundaranar University, Tirunelveli, India e-mail: [email protected] N. Krishnan e-mail: [email protected] J. Jayapriya Department of Computer Science (YPR Campus), Christ University, Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_4
39
40
V. S. Jeyalakshmi et al.
1 Introduction The term “malware” is used to describe any malicious software with the intent to hurt or damage its target. They come in a wide variety, some of which are quite dangerous while others pose little to no threat at all. Malware falls into such a wide variety of types and distinguishing one kind of malware from another that the necessity for categorization becomes immediately apparent. Given these several justifications, it is clear that malware categorization is essential. The first benefit of categorizing malware is that it allows us to determine the potential damage it poses, the risks it poses, and the order of priority for dealing with those threats and malware. It also aids in the discovery of countermeasures with which to deal with it. Analyzing and learning more about various forms of malware. Scholars have suggested a wide variety of novel ways, with some using already established categorization schemes. For the most part, malware samples have been classified using shallow machine learning approaches. Since its inception in the 1990s, convolution neural networks have been used and studied for a variety of computer vision-related applications, including those involving images, audio, and time series [1]. It has also been shown to be quite effective in a variety of other contexts, such as object identification, categorization issues, recognition, and more. Look at the publications at [2–4]. CNN’s effectiveness in classification issues has prompted a flurry of studies exploring how this technology may be used for the classification of malware [5–7]. The convolution neural network (CNN) is a kind of deep neural network that uses convolution layers, pooling layers, and fully connected layers to mechanically and seamlessly learn and provides advanced data. CNN’s design intertwines learned characteristics with data input and automates feature extraction, making it a useful tool for classifying images or data with a high number of characteristics. If a dataset is already in an image-based format, applying CNN and identifying malware are a simple operation, but for other formats, it might be time-consuming. Convolutional neural networks are used in this article to classify malware, and the results are interpreted and compared to those produced by using a transfer learning procedure with a VGG-16 model on the Malimg malware dataset [8]. Grayscale images with textural patterns may be created by converting malicious data binaries, 8 bits at a period, to binary. In Fig. 1a, we can observe how several textures appear in different parts of a malware code. Malware may be categorized based on these patterns. The Malimg dataset is used in this work; it is collection of .jpg files that represent malware binaries in grayscale. Figure 1b depicts several common types of malware families. The work’s contribution is the use of the back propagation approach, which allows CNN to learn the ability to adapt advanced information. CNNs are deep neural networks that rely heavily on convolution techniques and consist of layers including a convolution, pooling layer, and fully connected layer. CNN is useful for classifying pictures or data with many characteristics since its design intertwines learned characteristics with input information and employs automation to extract those features. Implementing CNN and categorizing malware is a simple operation for datasets that
Deep Convolutional Neural Networks Network with Transfer Learning …
41
Fig. 1 a Image representation of a transportable executable program. b Image samples of malware from several families
already exist in the image-based format, but it is time-consuming for other formats. Convolutional neural networks are used in this study to classify malware, and the results are interpreted and compared to those produced using the transfer learning method and the VGG16 model on the Malimg malware dataset.
2 Related Works The malware has been detected and categorized using a wide variety of methods. There is a plethora of papers on automated detecting attacks, analysis, and categorization [9]. The development of machine learning and deep learning, meanwhile, has presented many new opportunities for the categorization as well as analysis of computer viruses and other malicious software. Though several other types of deep neural networks exist, CNN is often regarded as the most effective neural network for image categorization. Hubel and Wiesel’s ideas on simple and complicated cells served as inspiration for the 1979 Kunihiko Fukushima Noncognition, which in turn influenced the creation of CNN. After 8 years of study, the breakthrough came in 1989 with the introduction of Lenet, a convolution neural net specifically designed for handwriting recognition. Practical implementation was also done afterward to read postal codes on mail. They have now found widespread use in the field of
42
V. S. Jeyalakshmi et al.
image categorization. Malware has been categorized in a variety of ways. In their study, CNN converted its binary representation into the Malimg dataset [10]. They employed K-nearest neighbors for the classification and zeroed down on the image’s surface and focus on various aspects to achieve 98% accuracy. Using their convolution neural network, they dubbed M-CNN, and they were able to classify malware with an accuracy of 98.52% on the Malimg dataset and 99.77% on the Microsoft dataset. They manually converted the binary data into an image format. To classify malware, [2] used a pre-trained neural network and fine-tuned it to do image-based classification. The author compared the pre-trained versions of VGG-16, ResNet-50, and Google Inception-V3 on the color and monochrome Malimg and IoT android phone datasets. Achieving best-in-class accuracy of 98.82% on the Malimg dataset and 97.35% on the IoT android mobile dataset, respectively. It was also noted that the colorful image dataset had higher accuracy than the grayscale one. To train the grayscale images for identifying malware groups, [11] used Simhash followed by CNN, achieving the best accuracy of 99.260%. I built a specialized CNN, did a full model training, and then utilized transfer learning to fine-tune it for the Malimg datasets. This results in a 98.61% success rate and an F-score of 0.96. In addition, [12] conducted experiments on the same dataset, this time employing a more advanced CNN-based model for multi-family classification that made use of SVM as an activation function. An accuracy of 99.59% was achieved as a consequence of this. To construct an n-gram visual feature of malware, researchers have utilized “hierarchical convolutional neural networks deployed at functional and mnemonic levels.” At last, it is being put to use in the 2016 Microsoft malware categorization dataset. Using CNN, we were able to improve upon the state of the art in malware categorization and achieve a better rate of success. Experiments were conducted using a variety of methods, ranging from machine learning to deep learning, to extract fresh characteristics and construct a reliable classifier. Accuracy has been a primary emphasis for the majority of writers. However, the training–testing accuracy gap and the training–testing loss gap are equally crucial to developing a reliable classifier. The goal of this study is to find a way to minimize the gap between two distinct measures of accuracy and loss. Given the difficulty of developing a reliable malware classifier, we investigated methods for enhancing the F1-score of a malware classifier based on images. Consider and compare many alternative methods for doing this task.
3 Proposed Work Implementation The majority of the testing was conducted in a Jupyter notebook utilizing the integrated development environment and the TensorFlow 2.3.1 Keras package on an Intel Core I5 1.7 GHz CPU. More specifically, a proprietary model was trained using the benchmark image database of size 1.2 GB for the experiment. There was first a basic CNN model, and then several improvements and additions were made to boost its accuracy. Proposed work transfer learning compared to VGG-16 was trained for
Deep Convolutional Neural Networks Network with Transfer Learning …
43
just 20 epochs, whereas the customized models were given 25 epochs at an 80:20 ratio. Moreover, the pre-trained model’s Oxford VGG-16 model was used for transfer learning. Accuracy was fine-tuned for better performance. To discover the optimal model, an early checkpoint was established to record the validation loss’s change at the 5th decimal place (i.e., 0.00001), and tolerance was set to 6 iterations. If validation loss did not vary between epochs, the acquisition rate was decreased by a factor of 0.01 to produce a better model. The Adam optimizer was implemented to determine the optimal weighting and training rate for the model utilized in the compilation process.
3.1 Convolution Neural Network The layers of a convolutional neural network are the convolution layer, the pooling/ subsampling layer, and the convolution layer with pooling layers and local connection. So, no more time-consuming and error-prone manual feature extraction are needed, and there is minimum preliminary work involved. According to Yann and LeCun, their work was motivated by the finding of locally sensitive, direction neurons in the visual system of a cat by Hubel and Wiesel. LeCunn himself showed the first practical implementation of handwriting recognition [13]. Since then, it has been implemented in image-sharing platforms like Google and social media platforms like Facebook and Apple’s Face ID. In the same way, you may learn everything about illness diagnosis and categorization, document analysis, NLP, RPGs, and braincomputer interfaces by reading. In addition to these, it has found use in several other domains, such as data security [14]. Interconnection between CNN’s three layers— the convolution layer, the pooling layer, and the fully linked layers—allows the network to function sequentially. Figure 2 shows CNN architecture.
Fig. 2 CNN architecture
44
V. S. Jeyalakshmi et al.
3.2 Transfer Learning The idea of transfer learning is to build upon a previously trained model for a different application. The concept is to use the best pre-trained algorithms from one context and apply them to another by adjusting the appropriate parameters. The most popular pretrained models are created using ResNet, Oxford VGG, and Google Inception. Before 2015, these models dominated the annual ImageNet competition. However, since the increasing use of transfer learning in business began in 2010, transfer learning has been a hot issue. A wide variety of fields and activities benefit from transfer learning, including classification tasks, sentimental analysis, simulation, gaming, and many more. Malware categorization with precise tweaking has been an application case for transfer learning. The process of creating a model from start is time-consuming and laborious, but transfer learning makes the process quick and easy. Selecting a layer, freezing unnecessary layers, establishing weights, and hyperparameter adjustment are all challenging tasks in transfer learning [15].
3.3 VGG-16 In 2014, K. Simonyan unveiled (see Fig. 3), VGG-16, a CNN-based deep learning architecture, which went on to win the ILSVR (ILSVR) competition. There are a total of 16 layers, and they are as follows: 5 convolution layers, 5 max-pooling layers, 3 completely linked layers, and 1 dense layer. To train our model, we fed an image with dimensions of 224 × 224 into VGG-16’s various layers and then added an average global pooling layer and two dense layers to the original model to divide malware into 25 classes.
Fig. 3 VGG-16 architecture
Deep Convolutional Neural Networks Network with Transfer Learning …
45
Fig. 4 Inception-V3 architecture
3.4 Inception-V3 The Inception-V3 (see Fig. 4) model, which won ILSVRC in 2016, has 42 layers and is an upgrade on GoogleNet’s Inception-V1 model, which won in 2015. An initial 5× Inception module A, 4× Inception module B, 2× Inception module C, and 2× grid reduced size to form the basis of the Inception-V3 model architecture, with one grid size reduction being modified and the other implemented as-is. To further enhance the outcomes, an additional classifier is used as a secondary layer of processing.
3.5 ResNet-50 Residual networks are the third model they use (ResNet-50). In 2016, ResNet-50 (see Fig. 5) triumphed at ILSVRC. This model pioneered a new method of connecting non-adjacent convolutional layers employing “shortcut connections,” which result in more connections between the layers. The model was able to reduce its loss and improve its performance by skipping over layers with vanishing gradients, thanks to this method. The network’s astounding depth of 152 layers made it 8 times as deep as a standard VGG network. A 28% increase in accuracy in image classification was achieved compared to the VGG-16 model using faster R-CNN. Its original ResNet-50 architecture is seen in Fig. 5.
46
V. S. Jeyalakshmi et al.
Fig. 5 ResNet-50 architecture
4 Dataset Few malware datasets are accessible for academic study. Malimg is one such data collection. There are 9342 malicious images in the sample, split across 25 distinct virus families. Table 1 displays the Malimg dataset’s familial structure. Malware images come in a wide range of sizes. Dialer, Backdoor, Worm, WormAutoIT, Trojan, Trojan-Downloader, Rouge, and PWS are just some of the families of malware that have been used to make these images. All malware images started as PE files, which were subsequently transformed into something like an 8-bit vector binary. There were adjustments made to the dimensions of the malware images so that they could be fed into a convolutional neural network.
5 Architecture Our convolutional neural network was built with three layers of convolution and three of maximum pooling. After the 1-dimensional flattening of the obtained trainable parameters, five successive thick layers were applied. The sigmoid function was used to shape the output of the dense layers, while the ReLU activation function was used for the other layers. If the neuron’s input is positive, ReLU will activate it; otherwise, it will be turned off. In the end, the sigmoid function calculates the probability weight for each. At long last, we reached a total of 1,145,305 trainable params. Regularizes and dropout layers around 0.1 were present during experimental model building,
Deep Convolutional Neural Networks Network with Transfer Learning … Table 1 Malimg dataset
47
Malware family
Malware type
Adialer.C
dialer
Malware samples 122
Agent.FYI
bd
116
Allaple. A
worm
2949
Allaple. L
worm
1591
Alueron.gen!J
trojan
198
Autorun.K
worm
106
C2LOP.gen!g
trojan
200
C2LOP.P
trojan
146
Dialplatform.B
dialer
177
Dontovo.A
dl
162
Fakerean
rogue
381
Instantaccess
dialer
431
Lolyda.AA1
pws
213
Lolyda.AA2
pws
184
Lolyda.AA3
pws
123
Lolyda.AT
pws
159
Malex.gen!J
trojan
136
Obfuscator. AD
dl
142
Rbot!gen
bd
158
Skintrim.N
trojan
Swizzor.gen!E
dl
128
Swizzor.gen!I
dl
132
VB.AT
worm
408
Wintrim. BX
dl
80
97
but they did not improve the final product; thus, they were eliminated. Similarly, we tried out models with a horizontal flip, a shear ranging from 0.1 to 0.3, and a zoom ranging from 0.1 to 0.4, but they fell short of our accuracy goals and were ultimately left out of the final model. VGG-16 architecture was utilized as the input and worldwide average pooling layer for the transfer learning, and two successive dense layers were inserted at the output terminal to categorize the images. Out of the total parameters 14,783,577, 68,889 were able to be trained. For ResNet-50 and Inception-V3, we classified 25 malware at the output layer while leaving the other layers unmodified. In society wise, adaptation is used for AIRPORT BAG Checking and ARMY Soldier’s Bomb detections.
48
V. S. Jeyalakshmi et al.
6 Results and Discussion The custom model performed the best in this trial with the lowest loss. Since all parameters were used during training in the custom model, there was no feature attrition in the case of transfer learning, there was a significant loss of parameters, and hence, the model did not perform any worse than usual. The difficulty of adjusting caused parameter loss since unable to optimize the model’s hyperparameters for this dataset. With an accuracy of 99.76%, validation accuracy was only 98.2% for the custom CNN model, 88.40% for the transfer learning approach using VGG-16, 90.2% using ResNet-50, and 92.4% using Inception-V3. The validation accuracy was 98.07%, with 35 out of 1868 training data misclassified and the rest of 1833 successfully classified. The dataset’s sparseness was a major contributor to the errors in categorization. Additional data would have helped the model perform better, and the dataset’s uneven shape and size as shown in Fig. 6 were other factors. Figures 7, 8, 9, and 10 and Table 2 represent different modeling schemes of our working results.
Fig. 6 Dataset visualization
Deep Convolutional Neural Networks Network with Transfer Learning …
49
Fig. 7 Binary images for Malimg datasets
Fig. 8 Confusion matrix
7 Conclusion and Future Work Many antivirus solutions now available use deep learning to detect and eliminate viruses. When applied to Windows PE binaries, deep learning algorithms have shown promising results in identifying malicious software. We have compared the effectiveness of three classifiers using an image dataset of malware constructed from PE files. We classified grayscale malware images using the ImageNet models and three
50
V. S. Jeyalakshmi et al.
Fig. 9 a Train and test —loss. b Train and test—accuracy
96 94 92 90 88 Accuracy (%)
86 84 82
Fig. 10 Comparative analysis for custom CNN, VGG-16, ResNet-50, and Inception-V3
Table 2 Comparsion of Malimg dataset Model
Accuracy (%)
Precision
Recall
F1-score
Custom CNN
94.7
0.94
0.95
0.95
VGG 16
87.42
0.85
0.86
0.85
ResNet 50
88.8
0.88
0.88
0.88
Inception V3
90.45
0.90
0.90
0.90
additional CNN models. All three models were successfully trained on the Malimg datasets, and the findings demonstrate that CNN performs the best compared to the other two. As far as we are aware, its effectiveness in classifying grayscale virus images is unparalleled. Additional models from image classification competition
Deep Convolutional Neural Networks Network with Transfer Learning …
51
leaderboards will be employed in future experiments. As part of the categorization process, we’d want to colorize virus images by converting them to RGB format.
References 1. LeCun Y, Bengio Y (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, vol 3361, no 10 2. Vasan D, Alazab M, Wassan S, Naeem H, Safaei B, Zheng Q (2020) IMCFN: image-based malware classification using fine-tuned convolutional neural network architecture. Comput Netw 171:107138 3. Kalash M, Rochan M, Mohammed N, Bruce ND, Wang Y, Iqbal F (2018) Malware classification with deep convolutional neural networks. In: 2018 9th IFIP ınternational conference on new technologies, mobility and security (NTMS). IEEE, pp 1–5 4. Gibert D, Mateu C, Planes J (2019) A hierarchical convolutional neural network for malware classification. In: 2019 international joint conference on neural networks (IJCNN). IEEE, pp 1–8 5. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551 6. Nataraj L, Karthikeyan S, Jacob G, Manjunath BS (2011) Malware images: visualization and automatic classification. In: Proceedings of the 8th international symposium on visualization for cyber security, pp 1–7 7. Nataraj L, Karthikeyan S, Jacob G, Manjunath B (2011) Available at: https://www.dropbox. com/s/ep8qjakfwh1rzk4/malimg_dataset.zip?dl=0 [Online] 8. Ahmadi M, Ulyanov D, Semenov S, Trofimov M, Giacinto G (2016) Novel feature extraction, selection, and fusion for effective malware family classification. In: Proceedings of the sixth ACM conference on data and application security and privacy, pp 183–194 9. Gandotra E, Bansal D, Sofat S. Malware analysis and classification: a survey. J Inf Secur (2014) 10. Abusitta A, Li MQ, Fung BC (2021) Malware classification and composition analysis: a survey of recent developments. J Inf Secur Appl 59:102828 11. Ni S, Qian Q, Zhang R (2018) Malware identification using visualization images and deep learning. Comput Secur 77:871–885 12. Lad SS, Adamuthe AC (2020) Malware classification with ımproved convolutional neural network model. Int J Comput Netw Inf Secur 12(6) 13. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 14. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826 15. Asam M, Khan SH, Jamal T, Zahoora U, Khan A (2021) Malware classification using deep boosted learning. arXiv preprint arXiv:2107.04008
Analysis of Network Failure Detection Using Machine Learning in 5G Core Networks Anjali Rajak and Rakesh Tripathi
Abstract Mobile service providers must consistently offer reliable and high-quality Internet services to support the 5G mobile network. Additionally, since the Internet is run cooperatively among the providers, an unexpected failure in a provider’s domain can rapidly spread all over the world, but only highly experienced operators can tackle these network failures. In order to address unexpected failures in the core network, machine learning plays an important role. Machine learning-based network operations operate efficiently and automatically, and it will also reduce operational costs. In this study, we used machine learning to analyze the network failures in the 5G core network. To identify the suitable approach, we analyze the performance of three ensemble learning-based machine learning algorithms—XGBoost, LGBM, and random forest, from different perspectives: preprocessing of training data, normal/ abnormal samples, and feature importance. The results demonstrated that XGBoost provides higher accuracy with a smaller number of features. Similarly, LGBM and RF improve their performance while reducing the number of features. Overall, our proposed work has achieved a higher detection accuracy of 98.39% and a 100% detection rate in the three types of failures with XGBoost. Keywords Feature importance · Feature rank · Machine learning · 5G core network · Failure detection
1 Introduction The advancement of network technologies such as network function virtualization and 5G has dramatically changed the telecom industry and brought faster speed to end users. The monitoring system confronts a variety of challenges as a result of A. Rajak (B) · R. Tripathi Department of Information Technology, National Institute of Technology Raipur, Raipur, India e-mail: [email protected] R. Tripathi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_5
53
54
A. Rajak and R. Tripathi
the growth of the communication systems and networks in terms of the number of users and the volume of generated traffic, including the storage and analysis of traffic data as well as the integration, validation, security, and acquisition of traffic data. The network complexity is amplified by the unprecedented growth in the data volume and the number of connected nodes, requiring continuous studies to examine and monitor networking performance. Furthermore, the availability of a huge and heterogeneous amount of traffic data necessitates adopting new approaches for monitoring and analyzing network management data [1, 2]. Due to these challenges, most studies focus specifically on monitoring and analyzing, e.g., traffic classification, anomaly detection, or QoS. Mobile service providers must consistently offer reliable and highquality Internet services to support the 5G mobile network. Once an unexpected failure happens in a domain on the core network, the influence of the failure will be rapidly spread all over the world since mutual operations among operators. Since only highly experienced operators tackle these network failures. In order to address unexpected failures in the core network, machine learning (ML) plays an important role [1]. AI techniques have developed quickly in recent years, and many new ML algorithms have been proposed and widely used in various domains [3, 4]. ML algorithms, such as ANN, Bayesian network, and SVM, have steadily turned out to be the alternative resolution for rapidly detecting a failure in next-generation networks. Machine learning-based network operation can operate efficiently and automatically as well as it will also reduce operational costs. However, as databases obtained from various sites have different failure scenarios, a single sophisticated ML algorithm might not be a consideration when building a detection model. Consequently, it is required to use ensemble methods in next-generation networks. Compared to single ML and other statistical approaches, ensemble methods are superior. Therefore, it is encouraging to promote its use for further application in the recent technology. In this paper, we have proposed a failure detection model using machine learning in the 5G core network in NFV-based test environment. In this study, we aimed to detect network failure using AI/ML techniques. The main contribution of this study is as follows: • The proposed failure detection model identifies the significant features that effectively analyze the parameters and contribute to detecting or classifying failures in the network. • We present a failure detection model using machine learning in the 5G core network. In the initial step, preprocessing is performed on the training data, which includes the steps, feature mapping, and normalization. • Subsequently, we have checked the rank of the features by applying the XGBoost and RF feature importance methods in order to rank the features according to their significance scores. • Finally, employ the three most popular ensemble learning-based machine learning models, light gradient boosting machine (LGBM), XGBoost, and RF, and comparative testing reveals that the XGBoost outperforms others in detecting or classifying network failures.
Analysis of Network Failure Detection Using Machine Learning in 5G …
55
The following section of this study is structured as follows. Section 2 describes a survey of the related work; Sect. 3 describes the proposed network failure detection model; Sect. 4 describes the dataset and the experimental result. In the end, Sect. 5 concludes this study.
2 Related Work In recent years, many failure detection systems have been proposed which are used to detect a failure in networks. An ML-based fault classification is put forth in [5], to analyze the root cause of failures in the network function virtualization environment. This method collects 41 attributes from the network environment and evaluates these attributes using machine learning algorithms: random forest, multilayer perceptron, and support vector machine, to detect three types of root causes including nodedown, CPU overload, and interface-down. In [1], authors have presented an efficient method to predict network failure from huge amounts of unstructured log files in a real-time scenario. Their proposed uses three steps: feature extraction, feature refinement, and feature reduction. They tested their work using six ML algorithms: decision tree (DT), XGBoost [6], LGBM [7], SVM, MLP, and RF. DT has achieved an accuracy of 80.95%, XGBoost has achieved an accuracy of 93.69%, LGBM has achieved an accuracy of 93.33%, SVM has achieved an accuracy of 79.05%, MLP has achieved an accuracy of 81.31%, and RF has shown an accuracy of 92.74%. Results demonstrated that XGBoost outperforms other in detecting failure. However, their study only focuses on extraction, refinement, and reduction of the features that they have not included any approach to select the optimized feature set. Similarly, [8] proposed a machine learning-based anomaly detection method to identify the root causes of resource utilization and service level agreement violations. This approach includes 25 attributes or features and machine learning approaches, such as extreme gradient boost (XGBoost), gradient boosting, random forest, and deep learning, to identify the root causes such as lack of memory and high CPU utilization. However, the balance between the accuracy and time complexity of the machine learning algorithm has not been taken into account in these studies. In [9], authors have proposed a failure detection in the 5GC network in the urban, middle, and rural areas. XGBoost and LGBM machine learning models were employed in their work. They have selected machine learning models based on classification accuracy. XGBoost model is selected as it gives better accuracy in all tasks. In [10], the authors compared the link fault detection capabilities of three machine learning models, including SVM, random forest, and MLP. The measures taken from the conventional traffic flow, including E2E delay, packet loss, and aggregate flow rate, are analyzed by the authors in order to build a three-stage ML for link fault identification and localization (ML-LFIL).
56
A. Rajak and R. Tripathi
3 Proposed Work In this section, the proposed failure detection model for the 5GC network is discussed. Figure 1 shows the proposed failure detection model. In the initial step, preprocessing is performed on the training data, which includes the feature mapping and normalization steps. Subsequently, we have checked the rank of the features using the XGBoost and RF feature importance methods to rank the features according to their significance scores and select only those features that are ranked by XGBoost instead of feature ranking obtained from the random forest approach. Consequently, dropping a feature that has no contribution to the models will not affect the performance of the models and will also help reduce the computation time taken by the models. Finally, we considered the top 20 features ranked by the XGBoost and applied machine learning models to evaluate the models’ performance. Finally, employ the three most popular ensemble learning-based machine learning models, LGBM, XGBoost, and RF, to detect and classify network failures.
Fig. 1 Proposed failure detection model for 5GC networks
Analysis of Network Failure Detection Using Machine Learning in 5G …
57
4 Experiment Results and Discussion This section discussed the performance of ML models, with results analyzed using various performance measures. This study is also compared with other existing work found in related literature. Google Colab is used as a simulation tool for this experiment. All the machine learning models have been trained on this platform. Google Colab provides an option to run the codes on a Python notebook without any configuration. To measure the performance of machine learning models, we have applied the different performance measures [1]: DR, PR Acc, and F1-score. Equations (1)–(4) specified the numbers of true positive (TP), true negative (TN), false negative (FN), and false positive (FP). PR (Precision) =
TP TP + FP TP TP + FN
(2)
2 ∗ recall ∗ precision recall + precision
(3)
DR (Detection Rate) = F1-score =
(1)
Acc (Accuracy) =
TP + TN TP + FP + TN + FN
(4)
The dataset [11] used for this study consists of data from three regions: the urban region (UR), the rural region (RR), and the middle region (MR). Our goal is to detect and classify the failure across the entire region. To maintain this aim, we combine these three regions of data into a single region and apply machine learning to detect the failure of this combined or single area. The dataset consists of 33 features and 268,849 instances. The three regions dataset contains 116,865 instances from the urban region, 116,881 instances from the rural region, and 35,103 instances from the middle region, respectively. We combine these instances and form a single region of data consisting of 268,849 instances. In the 5G core network, a dataset with labels indicating the type of network fault is used. The dataset includes six types (five failure types and one normal case), where cases are represented by labels: normal (0), bridge-delif {addif (1)}, interface-down {up (2)}, interface-loss-start {stop (3)}, memory-stress-start {stop (4)}, vcpu-overload-start {stop (5)}. The types 0, 1, 2, 3, 4, and 5 include the instances 252,044, 3534, 3381, 3356, 3276, and 3258, respectively. The complete whole-region dataset is divided into a testing (30%) and training (70%) set of the total instances, and the training samples are further split into a training and validation set. We have used RF [12], XGBoost, and LGBM for detecting and classifying network failure. Reducing the features allows us to decrease the cost of network monitoring in terms of network management. Therefore, endeavoring to reduce the number of features becomes crucial. XGBoost and random forest often have an inbuilt function that determines the significance of features. Figure 2 shows
58
A. Rajak and R. Tripathi
Fig. 2 Feature importance ranking by RF: left and XGBoost: right
the feature relevance bar graph plot based on XGBoost and random forest approaches. The features are sorted based on their relevancy. In both approaches, features F2, F1, F0, and F16 show relatively high importance compared to the other features. However, in terms of feature importance ranking (FIR), there is a large difference between XGBoost and RF results. For instance, F31, F27, and F29 have a higher rank in the results of the FIR of RF, while they have much lower rank features in the XGBoost approach. Some studies have shown that the FIR built-in function of random forest is unreliable and biased [13]. Considering that, we selected the features that are ranked by XGBoost instead of ranking features obtained from the random forest approach. Consequently, dropping features that have no contribution to the models will not affect the performance of the models and also help to reduce the computation time taken by the models. Finally, we took the top 20 features ranked by the XGBoost and used machine learning models to evaluate the performance of the ML models. Table 1 shows the results obtained by the RF, XGBoost, and LGBM models in terms of DR, Acc, PR, and F1-score. The top 20 important features get the best performance, which is 100% rate with type memory-stress-start {stop (4)}, vcpu-overload start {stop (5)} type, and 100% detection rate with normal (0) and achieve 98.39% accuracy with xGBoost. Figure 3 shows the confusion matrix of different machine learning models, and Fig. 4 shows the result comparison of LGBM, XGBoost, and RF models shown in (a), (b), and (c) respectively. And we also compared our work to some existing works and found that the proposed work outperformed other existing models. Table 2 compares the proposed work to some other already existing work.
Analysis of Network Failure Detection Using Machine Learning in 5G …
59
Table 1 Results comparison of RF, XGBoost, and LGBM models Models
Random forest
XGBoost
LGBM
Accuracy (%)
97.97
98.39
98.21
Labels
PR
DR
F1
PR
DR
F1
PR
DR
F1
Normal (0)
0.98
1.00
0.99
0.99
1.00
0.99
0.98
1.00
0.99
Bridge-delif(addif) (1)
0.89
0.32
0.47
0.84
0.54
0.66
0.81
0.50
0.62
Interface-down(up) (2)
1.00
0.96
0.98
1.00
0.95
0.98
0.99
0.95
0.97
Interface-loss-start(stop) (3)
0.91
0.15
0.26
0.81
0.34
0.48
0.70
0.27
0.39
Memory-stress-start(stop) (4)
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
vcpu-overload-start(stop) (5)
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
Table 1 shows that normal (0), failure types 4 and 5, and failure type 5 achieved a 100% detection rate
Fig. 3 Comparison of confusion matrix of different machine learning models
60
A. Rajak and R. Tripathi 1
Scale
Scale
1 0.5
0
0 0 Precision Recall F1-Score
0.5
1
2
3
4
0 Precision Recall F1-Score
5
Labels (a) LGBM
1
2
3
4
5
Labels (b) xGBoost
Scale
1 0.5 0 Precision Recall F1-Score
0
1
2
3
4
Labels
5 (c) RF
Fig. 4 Result comparison of ML models a LGBM, b XGBoost, and c RF, respectively
Table 2 Accuracy comparison with an existing model Authors
Year
Models
Accuracy (%)
Shota et al. [9]
2021
XGBoost
96.86
Our proposed work
2023
Random forest
97.97
2023
XGBoost
98.39
2023
LGBM
98.21
5 Conclusion Ensemble learning approaches are very popular and efficient machine learning tools for failure classification and detection problems. In this study, we have used an ensemble learning method-based machine learning approach to automatically detect network failure in the entire “abc” region of the 5G core networks. In the first step, data preprocessing is performed. Preprocessing includes the mapping of features that transform categorical variables into numeric ones and normalization. Then, to calculate relative feature importance, we applied the XGBoost and RF feature importance ranking approaches and finalized those features that were selected by XGBoost. Then, three machine learning models—RF, xGBoost, and LGBM—are applied and compared. After a comparative study, we reveal that XGBoost outperforms others in detecting network failure. Overall, experimental results show a reliable approach for detecting and classifying network failures. In our future work, we are aiming to compare different ML models with a different dataset.
Analysis of Network Failure Detection Using Machine Learning in 5G …
61
References 1. Fei X et al (2021) Analysis on route information failure in IP core networks by NFV-based test environment 2. Abbasi M, Shahraki A, Taherkordi A (2021) Deep learning for network traffic monitoring and analysis (NTMA): a survey. Comput Commun 170:19–41 3. Boutaba R et al (2018) A comprehensive survey on machine learning for networking: evolution, applications and research opportunities. J Internet Serv Appl 9(1):1–99 4. Akter M et al (2020) Construing attacks of internet of things (IoT) and a prehensile intrusion detection system for anomaly detection using deep learning approach. In: International conference on innovative computing and communications. Springer, Singapore 5. Kawasaki J, Mouri G, Suzuki Y (2020) Comparative analysis of network fault classification using machine learning. In: IEEE/IFIP network operations and management symposium (NOMS). IEEE 6. Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining 7. Ke G et al (2017) LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30 8. Hong J et al (2020) Machine learning based SLA-aware VNF anomaly detection for virtual network management. In: 16th international conference on network and service management (CNSM). IEEE 9. Shota A (2021) https://github.com/ITU-AI-ML-in-5G-Challenge/ITU-ML5G-PS-015-Net work-failure-detection-in-5GC-team-YOTA-YOTA/blob/main/5G-challenge_presentation_sli des_yota-yota.pdf 10. Srinivasan SM, Truong-Huu T, Gurusamy M (2019) Machine learning-based link fault identification and localization in complex networks. IEEE Internet Things J 6(4):6556–6566 11. AI for good ITU. https://challenge.aiforgood.itu.int/match/matchitem/57 12. Shaik AB, Srinivasan S (2019) A brief survey on random forest ensembles in classification model. In: International conference on innovative computing and communications. Springer, Singapore 13. Strobl C et al (2007) Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinform 8(1):1–21
MVR Delay: Establishing Self-organizing Virtual Backhaul for Trusty, Reliable, and Timely Emergency Message Dissemination in VANET Mahabaleshwar Kabbur and M. Vinayaka Murthy
Abstract Recently, the demand of wireless communication systems has increased drastically. Several types of wireless communication systems have been adopted in real-time applications for safety and monitoring purpose. In this field of wireless communication, the ad-hoc network-based systems have gained huge attraction from research community due to their multiple advantages. Based on this assumption, vehicular-ad-hoc network is considered as promising research in the field of wireless communication to develop road-safety application for intelligent transport systems. However, achieving a desirable quality of service (QoS) is considered as prime objective of any research, and several schemes have been introduced in this field, but minimization of emergency message transmission delay remains a challenging issue. Hence, ensuring trust, reliability, and low latency for emergency message dissemination is a critical requirement in VANET in presence of problems like broadcast storming, fake message propagation, etc. In this paper, a self-organizing virtual backhaul for emergency messages is constructed as a multi-criteria optimization to ensure trust, reliability, and latency. The combination of VANET and LTE interface is used to propagate the emergency message over virtual backhaul. This virtual backhaul is constructed in terms of a delay optimized Steiner path connecting multiple road side unit (RSUs). An event confidence model based on cooperative observations is also proposed in this work to filter fake emergency messages from propagation at nearest source. Keywords VANET · Delay · Virtual backhaul · LTE
M. Kabbur (B) · M. Vinayaka Murthy School of Computer Science and Applications, REVA University, Bengaluru 64, India e-mail: [email protected] M. Vinayaka Murthy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_6
63
64
M. Kabbur and M. Vinayaka Murthy
1 Introduction Traffic fatalities could be reduced by dissemination of emergency safety messages in vehicular area network (VANET). Many emergency message propagation protocols have been proposed since VANET inception [1]. But emergency message dissemination in VANET faces three important challenges of trust, reliable delivery, and lower latency. The flooding method is the most adopted method for reliable delivery of emergency message. Flooding creates frequency rebroadcasts resulting in collision, contention, and redundancy referred as broadcast storm problem [2]. Many solutions have been proposed to reduce the broadcast storm problem by reducing the number of vehicles doing the rebroadcast. The existing methods, on reduction of number of rebroadcasts, could be grouped into five categories: probabilistic schemes, counterbased schemes, distance-based schemes, location-based schemes, and cluster-based schemes. Though these schemes could reduce the problems of collision, contention, and redundancy, multiple emergency messages generated in smaller time duration can still amplify the collision, contention, etc., and affect the reliable delivery of emergency messages. False emergency messages can be propagated by malicious users affecting the reliability of the network. Propagation of fake emergency messages creates a trust deficit factor in VANET, and these vehicles cannot ascertain the creditability of the messages. Some works have been proposed to establish trust between nodes [3]. These approaches calculate trust score for nodes based on their past behaviors. But it is very difficult to calculate reputation score in presence of spare scenarios. In addition, trust and reliable delivery, third important challenge is timely delivery of emergency messages. Congestion and contention resulting from broadcast or partial broadcast overload the link level queues and add latency in propagation of messages. Clustering-based approaches further add to the latency by propagation only through cluster heads. Though many solutions have been proposed to solve each of the challenges of trust, reliable delivery and lower latency in isolation, solutions with joint consideration of all these three challenges are very few in literature. Thus, this paper intends to address the above-mentioned problem and proposes a solution with joint consideration of all the three challenges of trust, reliable delivery, and lower latency. The proposed solution establishes a virtual self-organizing backhaul for emergency message dissemination in VANET. The self-organizing backhaul is built on multicriteria optimization of trust, reliability, and latency. The nodes in the backhaul are selected based on the multi-criteria optimization. This backhaul is designed to reach all the nodes in the network with least delay. Instead of flooding-based broadcast, a mix of unicast and broadcast is realized using the backhaul for timely, reliable, and trustworthy delivery of emergency messages. This mix has higher probability of unicast and lower probability of selective broadcast adapting to network dynamics.
MVR Delay: Establishing Self-organizing Virtual Backhaul for Trusty …
65
2 Related Work Ullah et al. [4] proposed a trust-based vehicular social network to solve the problem of false message injection. A trust score is calculated for each node based on social utility, behavior, and contribution. Dissemination is done both using vehicle to infrastructure (V2I) and vehicle-to-vehicle mode (V2V). RSU infrastructure is used for authenticating the emergency message before it is disseminated. Considering the volume of vehicles and sparseness in their relation, reputation scores are updated with low frequency. So, it becomes erroneous to authenticate emergency message based on the reputation score alone, and the method has higher false alarms. Dua et al. [5] proposed a intelligent data dissemination protocol to mitigate the broadcast storm problem in VANET. In this approach, game theory is used for calculating the reliability of links. Paths are constructed on reliable links according to Dijisktra algorithm, and emergency message is routed on the reliable paths instead of broadcast. But this solution assumes uniform speed and trajectory known well in advance to construct the path. Also the method needs global view of nodes locations for calculating the routing path. Establishing global view involves higher communication complexity. Zouina et al. [6] proposed a combined use of unicast and multicast for emergency message dissemination. The message is first sent in unicast mode, and thereafter, broadcast is triggered sensing failure. But this solution performs like pure broadcast in presence of hidden node problems, and thus, it is not able to mitigate broadcast storm. Qiu et al. [7] proposed a spider Web like transmission mechanism for emergency message dissemination. This mechanism combines dynamic multi-priority queue management and restricted greedy forwarding strategy. Queue management is done to prioritize the packets, and forwarding is realized in using greedy forwarding strategy. The method assumes vehicle moving with constant speed in absence of it, and the reliability of routing is distorted. Ali et al. [8] combine clustering with position-based broadcasting to reduce the latency in delivery of emergency message. Vehicle moving in same direction with similar speed and in the vicinity is grouped to a cluster. The emergency messages are disseminated in unicast mode with target vehicle selection based on position and direction of movement of vehicle. But this method is based on the assumption of certain number of vehicles always crossing the intersection, and in absence, there is a huge latency and higher probability of message dissemination. Lee et al. [9] proposed a emergency message dissemination scheme assisted by cluster head in each cluster and the RSU. Overlapping clusters with rate constrains is found by solving it as a joint problem of beamforming and clustering. But the solution makes un-realistic assumption of vehicle movement and speed. Ucar et al. [10] proposed a clustering-based strategy for emergency message dissemination. Vehicles are clustering using a novel clustering algorithm called vehicular multi-hop algorithm for stable clustering (VMaSC). A relative mobility metric is calculated as the average relative speed with respect to neighboring vehicles is considered for selecting the cluster heads. Nodes select cluster heads-based distance. A mix of both unicast and multicast is used for emergency message dissemination over two interfaces of VANET and LTE. Using this dual architecture adds cost and
66
M. Kabbur and M. Vinayaka Murthy
redundant packet processing. Liu et al. [11] proposed a emergency data dissemination scheme based on clustering and probabilistic broadcasting (CPB). The vehicles are clustered based on the direction of movement and connection duration. Cluster member sends packet to cluster head with a calculated probability. Probabilistic forwarding is employed to disseminate the packets by the cluster head toward the transmission direction. Since same path is used for emergency and normal packet dissemination, QoS distortion can occur on emergency messages during peak traffic scenarios. Zhu et al. [12] proposed a hybrid emergency transmission architecture using software defined networks (SDNs). Emergency message is sent to RSU, and from here, it is sent to SDN controller. SDN controller forwards the message to relevant RSU to broadcast to vehicles. The cost of the solution increases with number of RSU as each RSU must be connected via connection overhaul to the SDN controller. Ullah et al. [13] compared different congestion avoidance schemes during emergency message dissemination. Schemes are analyzed in categories of transmission control, transmission power control, segmentation, and aggregation. Authors proposed a fog computing-based architecture to mitigate broadcast storm. Cell phone acts like RSU and transmit message to base station. Base station can propagate to VNET through RSU infrastructure. This way backhaul is established for emergency message propagation through cellular network. Paranjothi et al. [14] proposed a hybrid approach for emergency message propagation. Two different propagation modes are designed for obstacle shadowing regions and non-obstacle shadowing regions. Fog computing along with cellular infrastructure is used for message dissemination in obstacle shadowing regions, and multi-hop forwarding is used in non-obstacle shadowing regions. Relying on cellular infrastructure opens up network to further fake message dissemination. Tapia et al. [15] proposed a virtualization layer on top of TCP/IP stack to disseminate messages in VANET. The virtualization layer is based on deployments of virtual nodes (VN) in the network in different places, and their position is almost stable. Trust of nodes is managed through blockchain. Certificates are used to prevent tampering of emergency messages during transit. But this work is not standalone on VANET backbone alone and needed cellular backhaul for realizing trust management. Azzaoui et al. [16] proposed a dynamic clustering topology for emergency message dissemination. This solution assumes vehicles to be equipped with both DSRC and LTE cellular communication interface. Most stable nodes are selected as cluster heads. Cluster nodes to cluster heads communication are over DSRC channel. Inter-cluster head communication is over cellular interface. Dual interface opens the dissemination to fake and tampering risks. Costa et al. [17] proposed a emergency message dissemination protocol based on complex network metrics for urban VANET scenarios. Each vehicle maintains the information of 1 and 2 hop neighbors. Using this information, sub-graph is constructed. Each of the sub-graph is analyzed to find the best set of relay nodes based on two criteria centrality and degree centrality. The solution works only for slow moving urban VANET conditions, and overhead for frequent communication of sub-graphs for relay selection is high. Oliveira et al. [18] proposed an adaptive data dissemination solution to ensure higher reliability of emergency message delivery. Relay nodes are selected in the network based on local density and distance from neighboring nodes. Multi-hop
MVR Delay: Establishing Self-organizing Virtual Backhaul for Trusty …
67
broadcast is carried out among the relay nodes alone to ensure reliable delivery of messages. By resting the number of number broadcasting, the broadcast storm in controlled in this method. However, the solution makes many un-realistic assumptions in terms of speed and neighborhood in finding the relay nodes. Mazouz et al. [19] proposed network coding scheme for emergency message dissemination. The emergency messages are combined with other messages and propagated in the network. Though this approach is able to reduce the collisions, the latency for delivery of emergency messages is high. Bujari et al. [20] proposed a fast multi-hop broadcast protocol to propagate emergency messages. The solution is distributed. Each vehicle dynamically estimates the transmission range and speeds up multi-hop propagation through broadcast according to the estimate. But the solution fails in high way scenarios. Gonzalez et al. [21] proposed a present delay broadcast protocol for a fast and reliable dissemination. The solution is a subset of existing schemes of like count-based, geographical, distance-based, and opportunistic emergency message dissemination. But the solution has higher retransmission and works only for dense scenarios.
3 Self-organizing Virtual Backhaul From the survey, the following observations are made. Observation 1 Most of the solutions based on vehicular infrastructure alone make certain assumption on speed, density, etc. But solution involving cellular infrastructure backhaul is able to work beyond these assumptions. Thus, cellular infrastructure appears to be promising alternative to cover up transmission in very low vehicular density scenarios. On the other hand, using up cellular infrastructure brings higher risk of fake messages and message tampering, which is not considered in existing cellular infrastructure backhaul solutions. Observation 2 Cutting down the number of broadcasts reduces the broadcast storm problem, but it also affects the reliability of delivery. Thus, there must be a balance in use of broadcasts and unicast to reduce the broadcast storm problem without affecting the reliability. Observation 3 There are not much solutions addressing joint consideration of trust, reliability, and low latency in emergency message dissemination. But this joint consideration is needed in era increased network attacks and network traffic congestions.
68
M. Kabbur and M. Vinayaka Murthy
4 Proposed Solution The solution designed in this work is built on these three observations, and a proposed solution has been designed for infrastructure-based VANET network model shown in Fig. 1. Each vehicle could be communicated with one another and to RSU via IEEE 802.11p wireless standard. RSUs are equipped with LTE interface for communication with other RSU but always try to use this LTE interface as last resort and always attempt to communicate via carrier vehicles using IEEE 802.11p wireless standard. Transmission of LTE interface is least preferred due to cost and security factor. The position of RSUs is assumed to be known beforehand. A self-organizing virtual backhaul is constructed in the VANET network with vehicles, or RSU can be participant nodes in this virtual backhaul. The virtual backhaul is constructed with a joint consideration of trust, reliability, and delay. The flow diagram of proposed solution is shown in Fig. 2. The process of construction of the virtual backhaul starts taking the set P on n RSUs, and a path is constructed to connect all the RSUs with minimal total length of path. Once the path length is minimized, the delay for packet dissemination is also minimized. The problem of finding the path connecting n RSUs with minimal path length is solved as Steiner minimal tree (SMT) problem. It is solved as Steiner tree rather than minimal spanning tree due to the feature of introduction of intermediate junctions other than RSU to provide more degree of freedom in path construction provided by Steiner tree. Given a set P on n RSUs, Steiner minimal tree is constructed with S Steiner points in a way that minimum spanning tree (MST) cost over P ∪ S is minimized. The solution for Steiner minimal tree problem is found using graph iterated 1-Steiner (GI1S) algorithm with KMP heuristics [22]. In GI1S, a weighted
Fig. 1 VANET network model with LTE interface
MVR Delay: Establishing Self-organizing Virtual Backhaul for Trusty …
69
Fig. 2 Flow diagram of proposed solution
graph G is created with V RSUs. The edge weight between any two RSU (A and B) is given as the delay cost. Delay is modeled as probability mass function of delay distribution as, ⎧∞ ⎪ ⎪ f i (a) · f i (b), x =0 ⎨ Delay = i=0 ∞ ∞ ⎪ ⎪ f i (a) · f 2x+i (b) + f i (b) · f 2x+i (a), x > 0 ⎩ i=0
i=0
where a and b are forward and backward directions from transmitter to receiver, and f (z) is the probability mass function of delay of direction z. GI1S finds a set S of potential Steiner points such that, KMB(N , S) = cost(KMB(N )) − cost(KMB(N ∪ S)) where N is the subset of V RSUs. The output of GI1S algorithm is the Steiner points connecting all the V RSUs with minimal path length. An example of Steiner points and path through Steiner points for a sample RSU deployment returned by GI1S algorithm is given in Fig. 3. The path connecting RSU to Steiner points and between Steiner points is the virtual backhaul for EM dissemination. This virtual backhaul is frequently reconstructed based on the observation of delay over a period of time. RSU embeds the location of the Steiner points near its vicinity and sends in the beacon packets. Vehicles check
70
M. Kabbur and M. Vinayaka Murthy
Fig. 3 Steiner points for a sample RSU deployment
these beacons, and if it finds location is very near to the Steiner points, it does 1 hop broadcasts of any of EM packets it has. The EM message is propagated via 1-hop broadcast along the virtual backhaul. Any RSU receiving the EM broadcast, rebroadcasts in its coverage area to reach to all vehicles. Steiner broadcasts sometimes cannot reach to next hop far away Steiner points. In that case, when a vehicle approaches a nearby Steiner point, broadcasts could happen again. When a vehicle does not move in that trajectory, then reliability of EM dissemination is affected. RSU solves this, by forwarding the EM in LTE interface to the RSU near to that Steiner point. By this way, LTE interface is used by RSU only to reduce the delay in EM dissemination. In this scenario of LTE use, there is a high risk of message tampering due to higher resource capability available with attackers on internet end. RSU adds its digital signature the EM message and sends over LTE interface to other RSU near the Steiner point. The receiving RSU verifies the RSU digital signature before taking that message to further processing. With the use of selective LTE forwarding to compensate for separated Steiner points, a delay in EM propagation is reduced with least costing due to LTE usage. The virtual backhaul must be re-organized over a certain time period based on the past observation of delay experienced between the RSUs. This re-organization must be done to provide lower delay for EM message dissemination. By deciding the number of Steiner points and distribution of the Steiner points based on past delay observations, a low latency path for EM dissemination is established. This re-organization is facilitated by measuring the delay over the period of time between RSU whenever the EM message is received. From this delay, the weighted graph is constructed, and GI1S is invoked to find the new virtual backhaul. Trust is handled as the authenticity of emergency messages (EM) and an ability to filter it as a false alarm before it gets propagated in the network. Event confidence model adapted from [23] is used to detect the authenticity of the EM. Differing from earlier works of depending on reputation score based on past behavior, this work model trust of any EM in terms of confidence level of it as seen by the nearest
MVR Delay: Establishing Self-organizing Virtual Backhaul for Trusty …
71
Fig. 4 Structure of event confidence model
neighbors of event sources. The nearest sources are the vehicles in vicinity of event origin till the virtual backhaul connection point as shown above in Fig. 4. In the figure above, EM is generated by source S1, and its confidence is calculated from time to where it is originated till time t f , till it reaches the backhaul connector for propagation along the virtual backhaul. Let the measure of agreement among observers at time t is represented as τ (t). It is given as τ (t) = {γik (t)}, −1 ≤ γik (t) ≤ 1. γik is the agreement co-efficient between the event observers k and i at time instant t. The agreement co-efficient is computed iteratively averaging past value with current observation as γik (t) =
1 (1 − 2 × abs( pi (t) − pk (t))) + γik (t − 1) 2 pi (t) = P(E t |Mi ) pk (t) = P(E t |Mk )
pi (t), pk (t) are the individual probabilities of occurrences of event E based on event sources Mi and Mk , respectively, at time instant t ≥ 1. At time t = 0, it is given as γik (0) =
1 [(1 − 2 × abs( pi (0) − pk (0)))] 2
72
M. Kabbur and M. Vinayaka Murthy
The probabilities represent the decision about the events. The value γik (t) is 1 when there is full agreement and γik (t) is − 1 when there is complete disagreement. The value of γik (t) is checked at time t f by the backhaul connector node, and if the γik (t) is greater than a threshold, it is accepted as creditable EM message and taken for propagation on virtual backhaul, else it is dropped as in creditable EM message. The proposed solution has following contributions. • An event confidence model to check the authenticity of the EM in earlier stage before it is propagated and creates ill effects on VANET. • Construction of self-organizing virtual backhaul for EM dissemination with higher reliability and lower latency. • Though the proposed solution uses a hybrid interface of IEEE 802.11p and LTE, LTE interface is used only at RSU, and it is used minimally to compensate for predicted time delay in EM dissemination. The EM is secured from false injection attacks using RSU digital signature while using the LTE interface. The flow of the proposed system is given in Fig. 5. The EM message sent by vehicle is first checked for credibility using the proposed event confidence model. False EM messages detected by event confidence model are dropped. True EM messages are forwarded by any node in the virtual backhaul. The EM message is propagated along the virtual backhaul either in VANET mode or LTE mode. When predicted delay to next hop is higher than threshold, LTE mode is selected, for forwarding to next hop, otherwise VANET mode is selected for forwarding to next hop. When message is propagated via LTE mode, digital signature is added to the message to detect any tampering in the network end. The summary of how proposed solution is different from some of the recent existing research works which is presented in Table 1.
5 Results The proposed solution is simulated in NS2.34 simulator with SUMO [24] for vehicle mobile pattern generation. Simulation is conducted with traffic pattern given in Fig. 6. The simulation was conducted with following simulation parameters shown in Table 2. The performance is measured in terms of packet delivery ratio (PDR) and average delay for different node density and speed. The performance of the proposed solution is compared against EEMDS proposed by Ullah et al. [25] and virtual layer solution proposed by Tapia et al. [15].
5.1 Packet Delivery Ratio by Varying the Node Density The PDR is measured for various node densities, and the result is given in Fig. 7.
MVR Delay: Establishing Self-organizing Virtual Backhaul for Trusty …
73
EM
Event Confidence model
Crediable ?
N
Drop message
Y
Forward to nearest Virtual backhaul endpoint
Delay to next hop in Virtual backhaul > Threshold
Y
N
Add digital singature to message Y Forward to next hop in Virtual backhaul over LTE interface
Virtual backhaul through vehicle Forward to next hop in
Verify digital signature at next hop
N
Drop message
Valid signature ?
Fig. 5 Flowchart of proposed system
It could be observed that the increasing density has a positive impact on the performance of TBEMD, DVCAST, and EEMDS. The reason is that when the number of nodes increases, the network connectivity increases, which increase the successful delivery of the packets among the nodes. However, as the network becomes denser, the transmission of packets increases, which results in higher congestion and packet drops. However, the proposed solution always had better PDR compared to existing solution even at higher node density due to the use of only 1-hop broadcast and especially at Steiner points. Since the Steiner points are well controlled, the collision and congestion are also better controlled in the proposed solution. Due to this, PDR has increased in the proposed solution. The average PDR in proposed solution is at least 2% higher compared to EEMDS and 8% higher compared to virtual layer solution.
74
M. Kabbur and M. Vinayaka Murthy
Table 1 Comparison of existing works with proposed solution Author
Solution
Value adds of proposed solution
Ullah et al. [4]
Considered trust of EM message using difficult to Implemented trust of EM as realize vehicular social network architecture cooperative confidence model using neighborhood observations
Due et al. [5]
Considered reliability of EM message by selecting the reliable links in path with assumption of traffic dynamics
The proposed solution is able to ensure reliability using virtual backhaul without any assumption on traffic dynamics
Qiu et al. [7] Considered reliability of EM transmission with queue management and greed forwards with assumption of traffic dynamics
The proposed solution is able to ensure reliability using virtual backhaul without any assumption on traffic dynamics
Lee et al. [9] Considered reliability using clustering-based topology with overhead on vehicles for cluster formation
Overhead is only on RSU for virtual backhaul construction
Liu et al. [11]
Clustering-based probabilistic broadcasting to reduce broadcast storm problem
Only vehicles approaching Steiner point can do 1 hop broadcast in proposed solution
Ullah et al. [13]
Cell phones of vehicle owner use LTE interface for reliable transmission of EM. But it is less secure and costly
Proposed solution use LTE interface minimally only at RSU and uses in a secure mode
Paranjothi et al. [14]
Used hybrid mode to ensure reliability with use of both LTE and IEEE 802.11p interface depending on over-shadowing. But there is no minimal use guarantee and security for LTE interface
Proposed solution use LTE interface minimally only at RSU and uses in a secure mode
Tapia et al. [15]
Considered reliability by placing of designated virtual nodes in the network
Provided reliability without any of use of special nodes
Azzaoui et al. [16]
Considered reliability with use of hybrid interface Used hybrid interface only at of LTE and IEEE 802.11p in each vehicle RSU
da Costa et al. [17]
Considered reliable path construction using graph Graph theory is used for theory for slow moving traffic virtual path construction without any assumption on traffic dynamics
Oliveira et al. [18]
Considered reliability of EM with use of multi-hop broadcast
Bujari et al. [20]
Speed up based on transmission range adjustment Speed up based on construction of minimal latency virtual backhaul with minimal LTE interface use
Used only 1-hop broadcast for reliable delivery of EM
MVR Delay: Establishing Self-organizing Virtual Backhaul for Trusty …
75
Fig. 6 Traffic pattern in NS2 simulator
Table 2 NS2 simulation parameters
Parameters
Values
Propagation model
Two way ground
Mobility model
Krauss
Transmission range
300 m
Transmission power
20 mW
Simulation area
4000 m * 4000 m
Simulation time
500 s
EM size
170 bytes
Speed
20–100 kmph
Density
25–150 km
5.2 Packet Delivery Ratio by Varying the Speed The PDR is measured for various speed of the vehicle, and the result is given in Fig. 8. As the speed increases, the PDR drops. But the PDR only drops from 93 to 90% (3%) for speed from 20 to 100 km/h in proposed solution. The drop in EEMDS is from 92% at 20 km/h to 86% at 100 km/h (6%). The drop in virtual layer is from 84% at 20 km/h to 80% at 100 km/h (4%). The proposed solution has better resistance to PDR change for variations in the speed.
76
M. Kabbur and M. Vinayaka Murthy
Fig. 7 Node density versus PDR
Fig. 8 Speed versus PDR
5.3 Average Delay The average delay is measured for various node densities, and the result is given in Fig. 9. As the node density increases, the number of redundant transmission increases, and this creates a congestion and increases the average latency for packet delivery. But the delay is lower in proposed solution due to use of only 1-hop broadcast and propagation only along the virtual path.
MVR Delay: Establishing Self-organizing Virtual Backhaul for Trusty …
77
Fig. 9 Node density versus delay
5.4 Average Cost The cost due to use of LTE interface is measured in the proposed solution in terms of number of bytes transferred and compared against hybrid solution proposed by Azzaoui et al. [16] for different node densities, and the result is given in Fig. 10. The average cost is 32% lower in proposed solution compared to the hybrid solution proposed by Azzaoui et al. [16]. It is because that LTE is used only for delay reduction in far separated Steiner points in the proposed solution.
Fig. 10 Node density versus cost
78
M. Kabbur and M. Vinayaka Murthy
Fig. 11 Fake message detection accuracy
5.5 Fake Message Detection Accuracy The performance of the proposed event confidentiality model in detecting the fake message is measured for different vehicular density per 1 km, and the result is given in Fig. 11. The accuracy improves with increase in vehicular density, as proposed event confidentiality model results are dependent more on observations from multiple vehicles.
5.6 Stability of Virtual Backhaul The stability is measured for various vehicle densities, and the result is given in Fig. 12. The stability of virtual backhaul is defined as stb = 1 −
Number of forwards on LTE mode Total number of forwards on virtul backhaul
From the results, it could see that as the vehicle density increases, the stability increases, and a greater number of forwards are on VANET end instead of LTE. Reduction in LTE mode usage also reduces the security risks due to attacks on Internet end.
MVR Delay: Establishing Self-organizing Virtual Backhaul for Trusty …
79
Fig. 12 Virtual backhaul stability
6 Conclusion In this proposed work, a self-organizing virtual backhaul for emergency message dissemination is designed with joint consideration of trust, reliability, and latency. Virtual backhaul is constructed with knowledge of position of RSUs and past delay distribution between RSUs. Vehicles are used as the main channel for emergency message dissipation most of the times and in areas where higher delay is predicted, LTE interface is used securely. The proposed solution does not make any assumptions related to traffic dynamics. It provides higher PDR and lower delay compared to existing solutions even at higher speeds. The event confidence model in the proposed solution is able effectively to detect fake EM message at 94% accuracy even at lower vehicular density of 25/km. The proposed solution is also able to maintain a good stability of virtual backhaul at 0.76 for vehicle density of 50/km. The usage of LTE interface in the proposed solution gives better performance but its little bit cost effective to upgrade VANET from old technology architecture.
References 1. Latif S, Mahfooz S, Jan B, Ahmad N, Cao Y, Asif M (2018) A comparative study of scenariodriven multi-hop broadcast protocols for VANETs. Veh Commun 12:88–109 2. Wu J, Lu H, Xiang Y, Wang F, Li H (2022) SATMAC: self-adaptive TDMA-based MAC protocol for VANETs. IEEE Trans Intell Transp Syst 23(11):21712–21728 3. Li W, Song H (2016) Art: an attack-resistant trust management scheme for securing vehicular ad hoc networks. IEEE Trans Intell Transp Syst 17(4):960–969 4. Ullah N, Kong X, Tolba A, Alrashoud M (2020) Emergency warning messages dissemination in vehicular social networks: a trust-based scheme. Veh Commun 100199. https://doi.org/10. 1016/j.vehcom.100199 5. Dua A, Kumar N, Bawa S (2017) Reidd: reliability-aware intelligent data dissemination protocol for broadcast storm problem in vehicular ad hoc networks. Telecommun Syst 64(3):439–458
80
M. Kabbur and M. Vinayaka Murthy
6. Zouina D, Moussaoui S, Haouari N, Delhoum M (2012) An efficient emergency message dissemination protocol in a VANET. Commun Comput Inf Sci 293:459–469 7. Qiu T, Wang X, Chen C, Atiquzzaman M, Liu L (2018) TMED: a Spider-Web-like transmission mechanism for emergency data in vehicular ad hoc networks. IEEE Trans Veh Technol 67(9):8682–8694 8. Ali M, Malik AW, Rahman AU, Iqbal S, Hamayun MM (2019) Position-based emergency message dissemination for Internet of Vehicles. Int Sens Netw 15(7):Art. no. 155014771986158 9. Lee Y-W, Chien F-T (2019) Vehicles clustering for low-latency message dissemination in VANET. In: 2019 IEEE 4th international conference on computer and communication systems (ICCCS), pp 644–649 10. Ucar S, Ergen SC, Ozkasap O (2016) Multihop-cluster-based IEEE 802.11p and LTE hybrid architecture for VANET safety message dissemination. IEEE Trans Veh Technol 65(4):2621– 2636 11. Liu L, Chen C, Qiu T, Zhang M, Li S, Zhou B (2018) A data dissemination scheme based on clustering and probabilistic broadcasting in VANETs. Veh Commun 13:78–88 12. Zhu W, Gao D, Zhao W, Zhang H, Chiang H-P (2017) SDN enabled hybrid emergency message transmission architecture in Internet of-Vehicles. Enterpr Inf Syst 12(4):471–491 13. Ullah A, Yaqoob S, Imran M, Ning H (2019) Emergency message dissemination schemes based on congestion avoidance in VANET and vehicular FoG computing. IEEE Access 7:1570–1585 14. Paranjothi A, Tanik U, Wang Y, Khan MS (2019) Hybrid-Vehfog: a robust approach for reliable dissemination of critical messages in connected vehicles. Trans Emerg Telecommun Technol 30(6):e3595 15. Vintimilla Tapia P, Bravo-Torres J, López-Nores M, Gallegos P, Ordóñez-Morales E, Cabrer M (2020) VaNet Chain: a framework for trustworthy exchanges of information in VANETs based on blockchain and a virtualization layer. Appl Sci 10:7930. https://doi.org/10.3390/app 1021.7930 16. Azzaoui N, Korichi A, Brik B, Fekair MA (2021) Towards optimal dissemination of emergency messages in internet of vehicles: a dynamic clustering-based approach. Electronics 17. da Costa JBD, de Souza AM, Rosário D et al (2019) Efficient data dissemination protocol based on complex networks metrics for urban vehicular networks. J Internet Serv Appl 10:15 18. Oliveira R, Montez C, Boukerche A, Wangham MS (2017) Reliable data dissemination protocol for VANET traffic safety applications. Ad Hoc Netw 63:30–44 19. Mazouz A, Semchedine F, Zitouni R (2020) Enhancing emergency messages dissemination in vehicular networks using network coding. Wirel Pers Commun 113. https://doi.org/10.1007/ s11277-020-07318-x 20. Bujari A, Gottardo J, Palazzi CE, Ronzani D (2019) Message dissemination in urban IoV. In: Proceedings of the 23rd IEEE/ACM international symposium on distributed simulation and real time applications (DS-RT ‘19). IEEE Press, pp 211–214 21. Gonzalez S, Ramos V (2016) Preset delay broadcast: a protocol for fast information dissemination in vehicular ad hoc networks (VANETs). J Wirel Commun Netw 2016:117 22. Kou L, Markowsky G, Berman L (1981) A fast algorithm for Steiner trees. Acta Informatica 15:141–145 23. Atrey PK, Kankanhalli MS, El Saddik A (2007) Confidence building among correlated streams in multimedia surveillance systems. In: Proceedings of the 13th international conference on multimedia modeling, vol Part II (MMM’07). Springer, Berlin, Heidelberg, pp 155–164 24. Krajzewicz D, Erdmann J, Behrisch M, Bieker L (2012) Recent development and applications of SUMO—simulation of urban mobility. Int J Adv Syst Meas 2012(5):128–138 25. Ullah S, Abbas G, Waqas M, Abbas ZH, Tu S, Hameed IA (2021) EEMDS: an effective emergency message dissemination scheme for urban VANETs. Sensors 21(5):1588
Machine Learning Algorithms for Prediction of Mobile Phone Prices Jinsi Jose , Vinesh Raj, Sweana Vakkayil Seaban, and Deepa V. Jose
Abstract The drastic growth of technology helps us to reduce the man work in our day-to-day life. Especially mobile technology has a vital role in all areas of our lives today. This work focused on a data-driven method to estimate the price of a new smartphone by utilizing historical data on smartphone pricing, and key feature sets to build a model. Our goal was to forecast the cost of the phone by using a dataset with 21 characteristics related to price prediction. Logistic regression (LR), decision tree (DT), support vector machine (SVM), Naive Bayes algorithm (NB), K-nearest neighbor (KNN) algorithm, XGBoost, and AdaBoost are only a few of the popular machine learning techniques used for the prediction. The support vector machine achieved the highest accuracy (97%) compared to the other four classifiers we tested. K-nearest neighbor’s 94% accuracy was close to that of the support vector machine. Keywords Mobile phone · Phone price prediction · Machine learning · Support vector machine · K-nearest neighbor · Price range
1 Introduction An electronic gadget that is portable and linked to a cellular network is referred to as a “mobile phone.” In 1973, Motorola created the first portable cell phone. According to Cisco Annual Internet Report (2018–2023), over 70% of the global population will have mobile connectivity by 2023. The total number of global mobile subscribers will grow from 5.1 billion (66% of the population) in 2018 to 5.7 billion (71% of the population) by 2023 [1]. Mobile phones are designed to allow people to communicate via phone and email. In addition to making phone calls and sending emails, people J. Jose (B) · V. Raj · S. V. Seaban Department of Computer Science, Rajagiri College of Social Sciences, Kalamassery, India e-mail: [email protected] D. V. Jose Department of Computer Science, Christ University, Bangalore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_7
81
82
J. Jose et al.
can now access the Internet, send text messages, and play games. Mobile phones are equipped with a variety of technologies and features. The most recent mobile phone technologies are WIFI, high-quality cameras, more cores, and large memory capacities. The performance of the mobile phone varies depending on the features used. As performance improves, so does the price range, which shifts from low to high. The primary and foremost aim of the work is to predict the mobile phone price based on attributes considered as the specification for mobile phone usage. It was the right approach to predict the future prices of smartphones accurately. Customers and business owners benefit from this information when purchasing a phone. Predicting the pricing will allow consumers to make more informed judgments when selecting a new phone. A few features considered the smartphone’s features are display, processing power, memory, camera, thickness, battery life, and connectivity. A commodity’s intrinsic value is often misunderstood. Inadequate tools for cost–benefit analysis lead to poor decision-making. In today’s world, being without a smartphone is practically impossible. In an earlier technological revolution, mobile phones were used only for communicating with others during our mobility. The first mobile phone was Motorola DynaTAC 8000X, launched in 1973 by Dr Martin Cooper [2, 3]. The first product needed ten hours to charge and weigh a kilogram. In the present scenario, from this basic structure of the functioning mobile phones have a vital role in everybody’s life worldwide. The outcome of the momentous development of technology leads to more involvement of mobile phones in our day-to-day life. Because it is a portable device capable of bringing everything to our fingertips in a fraction of a second. The mobile phone has various applications, such as education, business, banking, and entertainment. Even though the mobile phone has various applications, the features are very important in pricing [4]. Machine learning (ML) is a subset of artificial intelligence capable of performing as human intelligence. Artificial intelligence systems are used to solve complex tasks efficiently. Machine learning methods adopted various methods based on learning capabilities to learn and analyze the problem. The different machine learning techniques are supervised, unsupervised, semi-supervised, and reinforcement learning to solve real-world problems [5, 6]. This article focuses on predicting the price range of mobile phones using twenty different attributes of a dataset that are features of various mobile phones used worldwide. According to the mobile phone properties, a phone is classified into one of four price ranges ranging from zero to three. Where zero denotes a low-budget mobile, one denotes an upper middle-budget mobile, two denotes a middle-budget mobile, and three denotes a high-budget mobile. The structure of the paper has organized as follows: features and relevance of machine learning techniques in mobile phone price prediction and various methods implemented by various researchers are given in Sect. 2. Section 3 gives an idea about the implemented methodologies and describes the different model building, followed by results and discussion in Sect. 4. Finally, Sect. 5 concludes the paper.
Machine Learning Algorithms for Prediction of Mobile Phone Prices
83
2 Literature Review Analyzing the previous data and predicting the future of the upcoming product is unavoidable in every machine learning research. The researchers worked on different machine learning algorithms for mobile price prediction based on feature selection methods [7]. This work identified a better feature selection algorithm and good classifier to get higher accuracy. From the comparison, the result can be concluded that the decision tree (DT) classifier achieved 87% of maximum accuracy. Another study carried out by I. Nasser et. al. predicted the mobile phone price range by using artificial neural networks (ANNs) [8]. After the training and validation, the model yielded an accuracy of 96.31%. Another study explained the prediction price using three classifiers: random logistic regression and SVM [9]. In terms of accuracy, researchers concluded was the best classifier with 81% accuracy. A study was carried out by P. Arora et. al. on a prediction model using the WEKA tool [10]. The researchers implemented ZeroR algorithm, Naïve Bayes (NB) algorithm, and J48 decision tree algorithm. The results have shown J48 decision tree algorithm achieved better accuracy. Another work done is developing machine learning models for the prediction of new mobile phone prices by using support vector machine (SVM), random forest (RF) classifier, and logistic regression (LG) [11]. By the analysis of the result understood that SVM achieved a high accuracy with 97% rather than the other two classifiers. The researchers K. Karur and K. Balaje presented K-nearest neighbor (KNN) for predicting mobile phone prices [12]. In this work, researchers were focused on feature selection and based on ram size decided the phone’s price range. The researchers implemented six machine learning algorithms for price prediction. The researchers used the ANOVA f-test for the feature selection, and the linear support vector machine (SVC) yielded high accuracy in price prediction [13]. Another study used supervised machine learning algorithms for price prediction [14]. The researchers considered the confusion matrix and accuracy as the evaluation metrics. Compared to other supervised classifiers, linear discriminant analysis (LDA) achieved high accuracy with 95%. In another study, researchers focused on a hybrid model for mobile price prediction. The authors implemented the decision tree and random forest method and achieved 83% and 84% accuracy, respectively [15].
3 Materials and Methods 3.1 Dataset The dataset was about mobile prices across different areas of the world. It contains 21 attributes. The attributes are the details of mobile phones like battery power, internal memory, ram capacity, price range, and all. The source of the dataset is Kaggle. Every attribute in the dataset has been used to classify the data. Ram indicates the
84
J. Jose et al.
Fig. 1 Correlation of attributes
ram capacity of the mobile phone, and the price range varies in four ranges from zero to three. We are classifying the price range by considering other attributes, and we will be predicting the price range after the training of the dataset. Figure 1 shows the correlation between features in the dataset.
3.2 Preprocessing Data preprocessing is the method by which the raw data is into a robust, understandable format. Data in the raw format is frequently inconsistently formatted, contains human errors, and may be incomplete. Such issues are resolved by data preprocessing, which makes datasets completer and more efficient for data analysis. It is an important step that can influence the success of projects involving data mining and machine learning. It speeds up knowledge discovery from datasets and may eventually affect the performance of machine learning models. After preprocessing, the above dataset was divided into train and test data. Train data and test data are in the ratio of 80:20.
Machine Learning Algorithms for Prediction of Mobile Phone Prices
85
3.3 Model Building Seven machine learning algorithms obtained the price range prediction from the given attribute: AdaBoost, decision tree, K-nearest neighbor, logistic regression, Naïve Bayes, support vector machine, and XGBoost in data mining. The models are evaluated using accuracy, precision, recall, and F1-score. AdaBoost. It is also known as the adaptive boosting algorithm. It follows an ensemble learning methodology, building a robust classifier from weak ones. In this model, iteratively build models until the model becomes free of errors. To improve classifier accuracy, the AdaBoost classifier combines many classifiers. The AdaBoost classifier combines several weak classifiers to create a strong, accurate classifier. It has a 57% accuracy rate. Table 1 gives the evaluation metrics of the AdaBoost algorithm. Decision Tree Classifier. The decision tree follows a supervised machine learning approach. This algorithm solves problems using tree representation. Each leaf node is a class label in this representation, and the internal node represents attributes. Decision nodes are the points at which data is split. This approach can be used to solve classification and regression problems. This algorithm aims to create a training model that can predict the class range of a target variable with the aid of simple decision rules learned from training data. The accuracy after predicting the price range is 82%. Table 2 shows the evaluation metrics of the decision tree classifier. K-Nearest Neighbors Algorithm. A convenient supervised machine learning classification approach is K-nearest neighbors. It categorizes a data point based on its nearness to its neighbors. Parameter tuning is the technique of choosing an appropriate value for K. The principle of the KNN algorithm is that a new data point falls into the class of points to which it is close. There is no specific method for determining the optimal K value. It is determined by the type of problem at hand and the business scenario. Five is the most preferred value for K. Choosing a K value of one or two can be noisy and result in outliers in the model, resulting in overfitting. The algorithm performs well on the training set compared to its true performance on unseen test data. To predict the values after fitting the trained data and running, the algorithm yielded 94% accuracy. Table 3 shows the KNN classification report.
Table 1 Classification report of AdaBoost Evaluation metrics
Accuracy
Precision
Recall
F1-score
Percentage
57
69
59
55
Table 2 Decision tree classifier Evaluation metrics
Accuracy
Precision
Recall
F1-score
Percentage
82
88
90
89
86
J. Jose et al.
Table 3 Classification report of KNN Evaluation metrics
Accuracy
Precision
Recall
F1-score
Percentage
94
97
94
95
Logistic Regression. A fundamental classification technique is logistic regression. It is associated with polynomials and linear regression and is a member of the linear classifier family. The logistic regression findings are simple to interpret, and the process is rapid and simple to comprehend. It may be used to solve multiclass issues even though it is a binary classification approach. The accuracy of the prediction was 63%. Table 4 given the evaluation metrics of the logistic regression. Naïve Bayes. Instead of supplying a test point’s label, the Naive Bayes classifier algorithm returns the likelihood that it belongs to a certain class. This model is one of the simplest Bayesian network models. However, it can attain better levels of accuracy when paired with kernel density estimation. This method is exclusively appropriate for classification jobs, unlike many other ML algorithms, which can often handle both regression and classification tasks. The naïve Bayes method is so-called because it is practically hard to establish evidence for its assumptions in empirical data. To get the sum of the component probabilities, conditional probability is used. The accuracy calculated by Naïve Bayes is 80%. Table 5 shows the evaluation metrics of Naïve Bayes algorithm. Support Vector Machine. Support vector machine is an old, well-known, and sophisticated algorithm. The SVM classifier is widely regarded as one of the best linear and nonlinear binary classifiers available. SVM regressors are also becoming popular as an alternative to traditional regression algorithms like linear regression. In N-dimensional space, the SVM method seeks a hyperplane that can distinguish between data points (N—the number of features). Various hyperplanes are used to separate two classes of data points. The largest margin or distance between data points from both classes is what we are looking for in a plane. Increasing the margin distance, makes it possible to classify next data points with more assurance. Table 6 given the classification of SVM.
Table 4 Evaluation metrics of logistic regression Evaluation metrics
Accuracy
Precision
Recall
F1-score
Percentage
63
69
73
71
Table 5 Evaluation metrics of Naïve Bayes Evaluation metrics
Accuracy
Precision
Recall
F1-score
Percentage
80
88
81
85
Machine Learning Algorithms for Prediction of Mobile Phone Prices
87
Table 6 Classification of SVM Evaluation metrics
Accuracy
Precision
Recall
F1-score
Percentage
97
98
99
99
Table 7 Classification of XGBoost Evaluation metrics
Accuracy
Precision
Recall
F1-score
Percentage
90
94
90
92
XGBoost. XGBoosting stands for extreme gradient boosting, in which each predictor corrects its predecessor’s error. Here, decision trees are created in sequential form. It falls under boosting ensemble learning. Artificial neural networks often outperform all other algorithms or frameworks for unstructured data prediction issues. 89.5% accuracy is acquired via XGBoost. Table 7 shows the classification report of XGBoost algorithm.
4 Experimental Results and Discussion We used several algorithms to make our price range predictions, including the support vector machine, the decision tree, naïve Bayes, the K-nearest neighbor, and logistic regression algorithms. The most accurate prediction was made by the support vector machine method (97%) and the K-nearest algorithm (94%). Predictions of future prices were also made using XGBoost (89.5%), decision tree (82%), a naive Bayes model (80%), logistic regression (63%), and AdaBoost (57%). The performance of the given algorithms is shown in Fig. 2. Fig. 2 Accuracy comparison of different models
Comparison of Accuracy 120 100
80
80 60 40 20 0
57
63
82
89.5
94
97
88
J. Jose et al.
Therefore, the support vector machine algorithm can be considered the most effective for this task. This algorithm considers a wide variety of mobile phone characteristics, including the storage capacity, the number of processor cores, the battery life, and so on, to make an accurate price prediction. After achieving a 98% accuracy rate on the training data and a 97% accuracy rate on the test data, we can confidently declare that our method is effective.
5 Conclusion The strategies utilized in this article to estimate the price range include the support vector machine, K-nearest neighbor algorithm, decision tree, naive Bayes algorithm, and logistic regression. The support vector machine algorithm had the highest prediction accuracy of 97%, while the K-nearest algorithm had 94%, which was closest to the SVM algorithm. Other price prediction algorithms, such as decision tree, naive Bayes, and logistic regression, achieved 82%, 80%, and 63% accuracy, respectively.
References 1. Cisco Annual Internet Report—Cisco Annual Internet Report (2018–2023) White paper—Cisco, https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/ann ual-internet-report/white-paper-c11-741490.html, Last accessed 26 Nov 2022 2. Hossain R, Hasan MR, Sharmin M (2022) A Short review on the history of mobile phones. J Android, IOS Devel. Test. 7(2):33–39 3. Evolution of Smartphone (2022) https://www.researchgate.net/publication/355041882_Evolut ion_of_Smartphone. Last accessed 30 Nov 2022 4. Poppe E, Jaeger-Erben M, Proske M (2020) The smartphone evolution-an analysis of the design evolution and environmental impact of smartphones. In: Electronics Goes Green 2020, Berlin, Germany, pp 1–9 5. Sarker IH (2021) Machine learning: algorithms, real-world applications and research directions. SN Comput Sci 2(160):4–21. https://doi.org/10.1007/s42979-021-00592-x 6. Liakos KG, Busato P, Moshou D, Pearson S, Bochtis D (2018) Machine learning in agriculture: a review. Sensors (Switzerland) 18(8):1–29. https://doi.org/10.3390/s18082674 7. Asim M, Khan Z (2018) Mobile price class prediction using machine learning techniques. Int J Comput Appl 179(29):6–11. https://doi.org/10.5120/ijca2018916555 8. Nasser IM, Al-Shawwa M (2019) Developing artificial neural network for predicting mobile phone price range. Int J Acad Inform Syst Res 3(2):1–6 9. Subhiksha S, Thota S, Sangeetha J (2020) Prediction of phone prices using machine learning techniques. In: Raju K, Senkerik R, Lanka S, Rajagopal V (eds) Data engineering and communication technology. Advances in Intelligent Systems and Computing, vol 1079. Springer, Singapore. https://doi.org/10.1007/978-981-15-1097-7_65 10. Arora P, Srivastava S, Garg B (2020) Mobile price prediction using Weka. Int J Sci Dev Res 5(4):330–333 11. Kalaivani KS, Priyadharshini N, Nivedhashri S, Nandhini R (2021) Predicting the price range of mobile phones using machine learning techniques. In: AIP conference proceedings. https:// doi.org/10.1063/5.0068605
Machine Learning Algorithms for Prediction of Mobile Phone Prices
89
12. Karur K, Balaje K (2021) Prediction of mobile model price using machine learning techniques. Int J Eng Adv Technol 11(1):273–275. https://doi.org/10.35940/ijeat.a3219.1011121 13. Cetin M, Koc Y (2021) Mobile phone price class prediction using different classification algorithms with feature selection and parameter optimization. In: ISMSIT 2021—5th international symposium on multidisciplinary studies and innovative technologies, proceedings. IEEE, Turkey, pp 483–487. https://doi.org/10.1109/ISMSIT52890.2021.9604550 14. Varun Kiran A (2022) Prediction of mobile phone price class using supervised machine learning techniques. Int J Innovative Sci Res Technol 7(1):248–251 15. Sakib AH, Shakir AK, Sutradhar S, Saleh MA, Akram W, Biplop KBMB (2022) A hybrid model for predicting mobile price range using machine learning techniques. In: ACM international conference proceeding series. Association for Computing Machinery, Thailand, pp 86–91. https://doi.org/10.1145/3512850.3512860
Customized CNN for Traffic Sign Recognition Using Keras Pre-Trained Models Vaibhav Malpani, Sanyam Shukla, Manasi Gyanchandani, and Saurabh Shrivastava
Abstract Machine learning (ML) is the process of teaching a machine to understand and make decisions on real-world problems using an efficient set of algorithms. With almost every single task being automated, one such big leap in the fields of artificial intelligence and machine learning is the development of advanced driver assistance systems (ADAS). With the increasing demand for the intelligence of vehicles and the advent of self-driving cars, it is extremely necessary to detect and recognize traffic signs automatically through computer technology. Among all the components that come together to make an efficient and highly accurate ADAS, one of them is traffic sign recognition. Amidst all the publicly available datasets, this paper uses the GTSRB dataset. Our proposed model for traffic sign recognition is better than the other state-of-the-art models as we harness the power of 1 * 1 convolution blocks. In this paper, we propose a customized convolutional neural network (CNN), which extends the architecture of the existing pre-trained Keras models. This paper also tries to analyze the effect of different hyperparameters like batch size, number of hidden layers and hidden neurons, and different activation functions, on the accuracy with which it classifies the different traffic signs. Keywords Activation functions · Convolutional neural networks · Deep learning · GTSRB · Image classification · MobileNetV2 · Traffic sign recognition
1 Introduction Traffic signs are an essential component of the transportation system. It is extremely crucial to abide by traffic regulations and respect the traffic signs to maintain safety. Road safety concerns both car drivers and bystanders [1]. Several of the fatalities and injuries are due to the misinterpretation of traffic signals [2]. There are more than three traffic signs, as opposed to one traffic signal. For fundamental comprehension, they V. Malpani (B) · S. Shukla · M. Gyanchandani · S. Shrivastava MANIT, Bhopal, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_8
91
92
V. Malpani et al.
may be classified into three types: mandatory signs, cautionary signs, and informative signs. For many businesses, machine learning has become a crucial competitive differentiation. Classical machine learning is often classified by how an algorithm learns to improve its prediction accuracy. There are four fundamental techniques for learning: supervised learning [3], unsupervised learning [4], semi-supervised learning [5], and reinforcement learning [6]. The algorithm that data scientists employ is determined by the sort of data they wish to forecast. The identification of traffic signs is a multi-category classification issue with unequal class frequencies, as shown in Fig. 1. Traffic signs differ greatly across classes in terms of color, form, and the use of pictograms or text. Deep learning techniques have the potential for solving problems in all aspects of life. The development of self-driving cars is by far the most substantial advancement in this phase. A vehicle can travel from one location to another using this method without a motorist. To accomplish this, several features, including traffic sign recognition, must also be developed. Whenever a self-driving vehicle shares the road with humans, it is supposed to follow traffic rules to ensure a safe ride [7]. Deep learning finds its applications in various computer vision problems like image classification, object detection, etc. The convolutional neural network (CNN) is the most well-known type of algorithm in deep learning domains, and it has achieved great success in a wide range of real-world applications. CNN designs have numerous layers, such as input layers, hidden layers, and output layers [8]. Using a large dataset, some intelligent pre-trained convolution networks provide higher accuracy. The major contributions of this paper have been delineated as below: • • • •
Propose a model that classifies traffic signs accurately Harness the power of 1 * 1 convolution blocks Use the pre-trained CNN models to our advantage, and Prove that our proposed model is better than other state-of-the-art models.
Fig. 1 Traffic sign frequency graph
Customized CNN for Traffic Sign Recognition Using Keras …
93
This paper is divided into 6 sections. Section 2 discusses the literature review. Section 3 highlights the proposed methodology. Section 4 talks about an experimental study and results. Conclusion and future work can be found in Sect. 5.
2 Literature Survey Although CNN is commonly utilized for detecting and recognizing traffic signals, there are certain issues with these systems that can undermine the model’s accuracy. The difficulties include brightness effects on signage, which might degrade the picture quality [2]. The photographs on the street may not always be perpendicular to the cameras installed on the vehicles. As a result, direction deviations may occur, causing the recognition failure. Even foliage, other traffic, and buildings might cause impediments, resulting in a lack of identification of the signs. This section discusses a few related works that attempted on increasing the accuracy of modern-day traffic sign recognition. Fang et al. [9] presented an innovative technique known as multi-layer adversarial domain adaptation (MLADA) that used a layered architecture to incorporate data from all levels. A DC at the feature level was introduced to learn a representation that didn’t depend on the domain. A classifier on the prediction level was added to the decision layer to reduce domain disputes. Following that, a union classifier was used to balance the FDC and PDC’s joint distribution restrictions. As a result, with proper training, their architecture achieved domain-invariant representation. The features recovered by each layer or convolutional block were then categorized as low-level or high-level. After each block, a kernel with size 1 × 1 was added to minimize the size of the recovered features. Using the GTSRB dataset, Persson et al. [10] suggested a pre-trained VGG16 network to identify traffic signs. The study looked at the impact of hyperparameters such as learning rate, batch size, and dropout rate. The learning rate may range from 1e−1 to 1e−8, the batch size can range from 5 to 100, and the dropout rate can range from 0.2 to 0.9. Konstantinos et al. [11] developed a model that performed well in a variety of unsupervised situations, as well as visualizations of the private and shared representations that allow understanding of the domain adaptation process. Given a labeled dataset, they trained a classifier to generalize the model in ordinance with both the target and source domains. They put forth their function of reconstruction loss to verify that the private representations are still relevant and to enhance generalization. They suggested a methodology that generates a common representation that is comparable for both domains and a domain-specific private representation. Ganin et al. [12] suggested including domain adaptation into the learning representation process such that final classification judgments are based on both types of features: be it invariant or be it discriminative. Their proposed method focused more on the unsupervised case. They demonstrated that by adding a few conventional layers and a new gradient reversal layer to practically any feed-forward model, they
94
V. Malpani et al.
could accomplish this adaptive behavior. The enhanced architecture results can be trained using conventional backpropagation and stochastic gradient descent, and it can therefore be done with minimal difficulty using any of the deep learning tools. The study focused on learning characteristics that combine discriminating behavior with domain invariance. Kuniaki et al. [13] offered a unique technique for unsupervised domain adaptation that uses task-specific classifiers to align distributions. The research focused on the development of a feature generator to decrease the disparity between target samples. To find target samples that are distant from the source’s support, they suggested increasing the difference between the outputs of the two classifiers. A feature generator learns to build target features near the support to reduce the gap. Patel et al. [14] explored different regularization techniques and loss functions to check the impact on accuracy for sign recognition. Whereas in the other paper [15], they have experimented on different loss functions.
3 Proposed Methodology 3.1 Approach This section outlines the proposed methodology for enhancing traffic sign recognition efficiency. We propose a customized CNN model that performs better, as compared to the one proposed by Fang et al. [9]. The model proposed in [9] makes use of 10 fully connected layers in total, and four 1 × 1 convolution blocks, that are used for feature reduction. In contrast to this, we built a customized CNN model. A very lightweight pre-trained CNN model MobileNetV2 is selected to form the base architecture, followed by a series of fully connected layers, with a linear spread difference in the number of neurons across them. The advantage of the proposed model lies in the fact that MobileNetV2 makes use of seven 1 × 1 convolution blocks, instead of four. Despite using less number of fully connected layers, the proposed model performs better. A major reason accounting for the accuracy includes choosing to spread the number of neurons in the dense layers in a linear manner, rather than giving it a drastic drop from 1240 nodes to 43 nodes. This paper attempts to check the effect of various hyperparameters like batch size, the number of dense layers, and the effect of different activation functions on the efficiency of the CNN model that we have built. The process follows the steps given below: • Step 1. Lightweight pre-trained CNN models that work well on images with low resolution, like MobileNetV2 and ResNet50, are selected for comparison. • Step 2. While training the model over various batch sizes: 8, 16, 32, 64, and 128. • Step 3. The best working model is then selected, and instead of using multiple dense layers, we use just a single fully connected layer and check the impact on accuracy.
Customized CNN for Traffic Sign Recognition Using Keras …
95
• Step 4. The best working model from Step 3 is selected to evaluate the effect of different activation functions on the accuracy of the model. The different activation functions used in this paper are Linear, ReLU, Leaky ReLU, Sigmoid, Softmax, Softplus, Selu, Elu, and Exponential.
3.2 Block Diagram The block diagram below (Fig. 2) portrays the architecture of the custom CNN model. We pickup a pre-trained CNN model from the list of available ones. Across the various papers studied, and taking into consideration the block diagram of architectures like MobileNetV2 and ResNet 50, we can definitely harness the power of increased number of 1 * 1 convolution blocks. To the pre-trained CNN model, we attach to several blocks of dense layers. Number of neurons are reduced by half after every block of dense layer. Sudden drop of neurons affects the accuracy because of the inter-dependency in the fully connected layers.
3.3 Dataset There are many publicly available datasets for traffic signs. Of all the available ones, we chose the GTSRB dataset. This dataset was obtained from Kaggle Library by German Traffic Sign Recognition Benchmark (GTSRB), and it is freely available. A total of 51,839 images are used, with 39,209 and 12,630 images accounting for training and testing sets, respectively, which makes it the perfect choice to be selected
Fig. 2 Proposed custom CNN architecture using MobileNetV2
96
V. Malpani et al.
as an underlying dataset. The images have been resized to 32 * 32 before they are sent to the model. The link to the GTSRB dataset is as below: Kaggle GTSRB Dataset
4 Experimental Setup and Results After choosing GTSRB from the various publicly available datasets for traffic sign recognition, we head on to the Keras library to check the available pre-trained CNN models. Across the literature surveys done, the two most favorable choices that came out in highlight were: MobileNetV2 and ResNet50. We use the base architectures of the above two models and add our fully connected layers to the end. The last layer makes use of the Softmax activation function. The rest of the dense layers use the linear activation function. We then run this setup iterative over different batches sizes: 8, 16, 32, 64, and 128. The best out of these is selected further for training. The tables and graphs given below show the effect of batch size and activation functions, across the models. Table 1 plots the accuracy, loss, precision, recall, and F-score of the proposed architecture across different batch sizes of 8, 16, 32, 64, and 128, for both the CNN models: MobileNetV2 and ResNet50 and across different activation functions. The accuracy and loss can be observed across 100 epochs for different batch sizes and activation functions as plotted in Fig. 3.
Table 1 Performance against different CNN, batch size, and activation function Pre-trained CNN Batch Activation Loss Accuracy Precision Recall size function MobileNetV2 ResNet50 MobileNetV2 ResNet50 MobileNetV2 ResNet50 MobileNetV2 ResNet50 MobileNetV2 ResNet50 MobileNetV2 MobileNetV2 MobileNetV2 MobileNetV2 MobileNetV2 MobileNetV2
8 8 16 16 32 32 64 64 128 128 32 32 32 32 32 32
Linear Linear Linear Linear Linear Linear Linear Linear Linear Linear Sigmoid SELU ELU ReLU Softplus Softsign
3.1530 – 0.6841 0.7291 0.3525 0.7898 0.3808 0.4431 0.6500 0.3616 0.5436 0.4436 0.4063 0.3128 0.3623 0.4215
44.81% – 81.31% 93.25% 94.71% 92.58% 94.00% 92.79% 87.96% 91.92% 89.60% 94.28% 94.81% 95.23% 94.46% 92.76%
52.01% – 83.28% 94.06% 94.85% 93.37% 94.49% 93.33% 89.06% 92.35% 90.75% 94.74% 52.01% 95.51% 95.11% 93.32%
44.81% – 81.31% 93.25% 94.71% 92.58% 94.00% 92.79% 87.96% 91.92% 89.60% 94.28% 44.81% 95.23% 94.46% 92.76%
F-score 40.73% – 81.10% 93.24% 94.63% 92.65% 93.96% 92.83% 88.05% 91.89% 89.43% 94.32% 40.74% 95.25% 94.48% 92.81%
Customized CNN for Traffic Sign Recognition Using Keras …
97
Fig. 3 100 epochs, batch size 32, ReLU
5 Conclusion and Future Work The proposed architecture is a lightweight CNN model that gives us an overall accuracy of 95.23% with the hyperparameter combination of batch size 32 and activation function as ReLU. Unlike other activation functions, it doesn’t excite all neurons at the same time. This means that neurons will be deactivated only if linear transformation is negative. When negative input values are used, the result is 0, and that neuron isn’t activated. ReLU is far faster and more accurate since it only engages a few nodes. It is quite evident from the results section that a batch size of 32 gives us the optimum, yet best results. The more the batch size, the slower will be the learning, and the more will be over-fitting. Training the data on too small batch sizes can lead to poor accuracy, as with smaller batches, outliers can be detected well. In future, we aim to test our models on the other available pre-trained neural networks as well. To get better accuracy and results, techniques like early stopping can be used too. Due to computational limitations, the image has been resized to not more than 32 * 32 pixels in size. In the later years, the effect of image size on feature extraction can also be looked upon, as an impact factor, and can be analyzed for achieving better accuracy.
References 1. De la Escalera A, Armingol JM, Mata M (2003) Traffic sign recognition and analysis for intelligent vehicles. Image Vision Comput 21(3):247–258 2. Atif M, Zoppi T, Gharib M, Bondavalli A (2021) Quantitative comparison of supervised algorithms and feature sets for traffic sign recognition. In: Proceedings of the 36th annual ACM symposium on applied computing, pp 174–177 3. Cunningham P, Cord M, Delany SJ (2008) Supervised learning. In: Machine learning techniques for multimedia. Springer, Berlin, pp 21–49
98
V. Malpani et al.
4. Hastie T, Tibshirani R, Friedman J (2009) Unsupervised learning. In: The elements of statistical learning. Springer, Berlin, pp 485–585 5. Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Synth Lect Artif Intell Mach Learn 3(1):1–130 6. Wiering MA, Van Otterlo M (2012) Reinforcement learning. Adapt Learn optimization 12(3):729 7. Agostinelli F, Hoffman M, Sadowski P, Baldi P (2014) Learning activation functions to improve deep neural networks. arXiv preprint arXiv:1412.6830 8. Chauhan R, Ghanshala KK, Joshi RC (2018) Convolutional neural network (cnn) for image detection and recognition. In: 2018 first international conference on secure cyber computing and communication (ICSCCC). IEEE, pp 278–282 9. Fang Yuchun, Xiao Zhengye, Zhang Wei (2021) Multi-layer adversarial domain adaptation with feature joint distribution constraint. Neurocomputing 463:298–308 10. Persson S (2018) Application of the German traffic sign recognition benchmark on the vgg16 network using transfer learning and bottleneck features in keras 11. Bousmalis K, Trigeorgis G, Silberman N, Krishnan D, Erhan D (2016) Domain separation networks. Adv Neural Inf Proc Syst 29 12. Ganin Yaroslav, Ustinova Evgeniya, Ajakan Hana, Germain Pascal, Larochelle Hugo, Laviolette François, Marchand Mario, Lempitsky Victor (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17(1):2030–2096 13. Saito K, Watanabe K, Ushiku Y, Harada T (2018) Maximum classifier discrepancy for unsupervised domain adaptation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3723–3732 14. Patel V, Shukla S, Shrivastava S, Gyanchandani M (2022) Regularized cnn for traffic sign recognition. In: 2022 international conference on smart technologies and systems for next generation computing (ICSTSN). IEEE, pp 1–5 15. Patela V, Shuklaa S, Gyanchandania M (2022) Analysis of different loss function for designing custom CNN for traffic sign recognition
Underwater Image Enhancement and Restoration Using Cycle GAN Chereddy Spandana, Ippatapu Venkata Srisurya, A. R. Priyadharshini, S. Krithika, S. Aasha Nandhini, R. Prasanna Kumar, and G. Bharathi Mohan
Abstract Underwater image augmentation is a technique for recovering lowresolution underwater photographs to produce equivalently high-resolution images. Deep learning-based techniques for enhancing photos commonly use paired data to train the model. Another crucial issue is how to effectively keep the fine details in the improved image. In order to address these problems, providing a revolutionary unpaired underwater picture-enhancing technique that fixes the underwater photos using a cycle generative adversarial network. At last test results on two datasets of unpaired underwater image datasets showed the recommended model’s utility by surpassing cutting-edge image enhancement methods. The CycleGAN generator includes a content loss regularizer that retains the important information of one lowresolution image in the corresponding clear image that is generated. The results are given based on mean square error and mean absolute error after the findings have
C. Spandana · I. V. Srisurya · A. R. Priyadharshini · S. Krithika · R. P. Kumar (B) · G. B. Mohan Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India e-mail: [email protected] C. Spandana e-mail: [email protected] I. V. Srisurya e-mail: [email protected] A. R. Priyadharshini e-mail: [email protected] S. Krithika e-mail: [email protected] G. B. Mohan e-mail: [email protected] S. A. Nandhini Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattangulathur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_9
99
100
C. Spandana et al.
been optimized using the ADAM optimizer. This architecture has a 4 × 4 kernel and a 2 × 2 stride, using Adam as the optimizer and a learning rate of 0.002. Keywords Image enhancements · CycleGAN · Enhancing underwater visual perception · ImageNet · Cycle loss · Generator · Discriminator
1 Introduction 1.1 Motivation/Challenges For the examination of underwater infrastructure and the detection of numerous man-made items, underwater picture enhancement is crucial. Improved knowledge of marine biology and environmental assessment is also necessary. It is also useful for studying monuments that are immersed under water. Enhancing underwater images is essential for underwater vehicle control systems, as well as for identifying the cultural significance of underwater archaeological sites. Additionally, underwater picture enhancement plays a significant role in the study of underwater structures like coral reefs. Because of scatters, underwater photos have low contrast and distorted colours. Visibility issues make underwater photos difficult to capture. Water must be limpid or clear in order to capture a clean visible underwater image; however by nature, all water is turbid due to sand and mineral particles. However, much as airborne particles produce distortion in outdoor photographs, waterborne particles induce distortion in underwater images. As water depth rises, underwater images become increasingly cloudy or difficult to see. Usually, there are two causes for underwater visual distortion. One is the effect of light scattering, while the other is the effect of colour change. As water is a denser medium than air, light that enters the water is refracted, absorbed, and scattered, reducing the quantity of light that is transmitted from the air to the water and scattering it in various directions. Light is distorted by scattering, which also lowers the colour contrast. Different wavelengths of light travelling in water experience different levels of attenuation, which results in colour change. The diffraction of light and colour shift distortions that underwater photographs experience cannot be handled by any available underwater processing techniques. Research in areas, like underwater ecosystems and geological changes can be helped and advanced by the underwater images that underwater robots take. Underwater robots employ visual sensors to perceive their environment and determine the best action when performing underwater jobs. The generally low quality of underwater photographs is caused by a variety of environmental conditions. The predominant colours of the image are green and blue; however, there is little contrast and the details are blurry. Researching underwater picture-enhancing methods are so essential.
Underwater Image Enhancement and Restoration Using Cycle GAN
101
The main factors that affect underwater images are light absorption, refraction, and scattering. Different wavelengths of light get absorbed at different levels. Red light is particularly absorbed by seawater, while only green and blue light may travel farther underground. The two elements of the optical picture production paradigm for underwater photos are the directly transmitted light and the background scattered light. The item itself is the source of the directly transmitted light, which will attenuate the transmission channel. Underwater photos will have colour distortion due to the attenuation effect, which is brought on by light absorption and light scattering. The background dispersed light is not a result of the object’s radiation, but rather results from the light in the surroundings being scattered by a lot of tiny water particles. The primary cause of the loss of contrast in underwater photographs is background dispersed light. In addition to this light which does not travel in a straight line causes refraction, and it is also a reason for image distortion.
1.2 Paper Organization There are five sections in the paper. The opening segment introduces how underwater imaging works and provides the motivation behind the need for image enhancement in underwater image processing. A survey of contemporary techniques and methods employed in the field of underwater picture processing and enhancement is in the second section. The methodology we propose is discussed in the third section of this study. This section briefly covers the process flow, model architecture of our model—CycleGAN, training, and validation process. The final section of this paper infers the model results using plots, visuals of the enhanced image, model accuracy, result discussion, and future scope.
1.3 Objective Due to the various impacts of the underwater medium, the image that is caught underwater is foggy. Unmanned underwater vehicles (UUVs) are frequently utilized for surveillance based on visual cues. Imaging becomes rather challenging in complex underwater situations due to colour deterioration, low contrast, and feature loss (especially edge information). The main objective is to enhance the underwater image, remove noise, and provide a clear enhanced image. The input for the model is a real-time underwater image and the expected output is an enhanced and noiseless image.
102
C. Spandana et al.
2 Literature Survey 2.1 Related Works For instance, underwater information is essential for exploration and use of the undersea environment by humans, in the domains of underwater archaeology [1], localization [2], maintenance of underwater machinery and other architectures [3], target recognition underwater [4], underwater searching and salvage [5], environment monitoring in underwater [6], etc. Both optical and physical technologies are employed to gather data underwater. Optical photos and movies, in contrast, provide us with a more perceptual knowledge of underwater goals. Due to the peculiarities of the underwater environment, videos suffer from colour deviation, blur and reduced contrast, which lowers the visibility and calibre of underwater images. The attenuation and dispersion of light caused by its passage through water help to partially explain the deterioration. Additional factors that contribute to deterioration include the flow of water, underwater life [7], temperature, salinity [8], and sounds like Gaussian noise, salt-pepper noise, and marine snow [9]. Modern tools may be used to lessen the deterioration of underwater images and videos, such as an underwater imaging system using Lidar sensors [10] and a multistate underwater using a laser line scan system [11]. An underwater imaging physical model illustrates the relationship between damaged and restored/clear underwater images. By calculating the background brightness of the underwater environment and measuring the light’s transmittance, it is feasible to retrieve the recovered underwater photographs. For underwater photography, the Jaffe-McGlamery model [12] serves as a good physical representation. Based on this model, Trucco et al. [13] suggested an auto-tuning filter for the restoration of underwater images. A two-stage method [14] for restoring underwater images worked well. A restoration technique for underwater photographs that prioritizes visual quality was discussed by Wagner et al. [15]. A normalized transformation was employed [16] to accomplish contrast restoration. The DCP model also called as dark channel prior model, which is frequently used in underwater picture restoration, was created by He et al. [17]. It was modelled after the JaffeMcGlamery framework [18]. It is recommended using a red channel approach to fix underwater photos in [19]. Employed Greyworld using DCP in order to enhance the contrast of the underwater pictures. An improved DCP method is used [20] for preparing underwater vision pictures. A background light estimate [21] and DCP are used to enable underwater photo restoration. Underwater image denoising technique incorporates a double transmission mapping, homomorphic, and dual-image wavelet fusion [22]. GAN’s were widely used in various sectors. One recent application of GAN is in biomedicals for ligand designing [23] and also designing high-resolution facial images [24].
Underwater Image Enhancement and Restoration Using Cycle GAN
103
3 Methodology 3.1 Proposed Methodology This issue can be thought of as an image-to-image translation difficulty when a distorted image is converted into an undistorted image. Some nonlinear mapping is required to fix the image’s distortion. Therefore, this issue may be solved using the conditional GAN, or CGAN. Our solution makes itself novel by improving the existing deep learning-based generative model by combing filters and GANs. Both the visual quality and the quantitative measurements will be better with the proposed methodology than with the existing ones. Figure 1 shows the proposed methodology. In order to automatically improve images, we need to train a mapping for generator G : X → Y , from the given source domain X to the required target domain Y. With the help of an adversarial discriminator growing in an iterative min–max game, the generator in our cycleGAN-based model tries to learn this mapping. A conditional GAN-based model learns the mapping (1) G : {X, Z } → Y , where X (Y ) denotes the source domain and z indicates random noise. Following are some examples of the conditional adversarial loss function. L cG AN (G, D) = E X,Y log D(Y ) + E X,Y log(1 − D(X, G(X, Z )))
(1)
In this case, discriminator D seeks to increase L cG AN while generator G seeks to decrease it. To further boost the model’s precision, we incorporated three other traits: global similarity, local texture, image content, and style.
Fig. 1 Overall process flow of the proposed solution
104
C. Spandana et al.
3.2 Underwater Imaging Underwater settings have an impact on how light propagates through water, which causes light to be attenuated and scattered. Influence elements include undersea suspended particles, water velocity, water temperature, salinity, and refraction, and also the absorption of light by water. Consequently, colour divergence, blur, and low contrast are frequent issues with optical underwater photos. In underwater photography, light attenuation and scattering result in three components that can be added to characterize the intensity of light captured by the lens: I = Ed + E f + Eb
(2)
where (2) I stands for the total amount of light, E d for the portion of reflection, E f for the portion of scattering in the forward direction, and E b for the portion of scattering in the backward direction. Light scattering is not taken into account for the direct portion E d , just attenuation is taken. E d (x) = J (x)e−cd(x) = J (x)t(x)
(3)
where (3) d(x) represents the Euclidian distance between the camera and the object underwater, and J(x) is the amount of light received by the object from the source; here the variable, c is the coefficient of attenuation of light and as the transmittance, t(x), is introduced with a definition. The suspended particles in the water are a typical disruption. It is possible to calculate E f for small-angle scattering using a convolution (4). E f (x) = E d (x) ∗ g(x) = (J (x) ∗ t(x)) ∗ g(x)
(4)
where (5) g(x) is the point spread function. Instead of coming from the object to be photographed, the backscattering of light E b is caused by reflection by underwater suspended particles. It can thus be thought of as underwater photography model noise. E b (x) = B∞ (x) ∗ (1 − t(x))
(5)
where B∞ is the water background. The original equation for the received light intensity by a camera can be rewritten as (6) I (x) = J (x) ∗ t(x) + (J (x) ∗ t(x) ∗ g(x) + B∞ (x) ∗ (1 − t(x))
(6)
When the image is nearer to the object to be photographed, the forward scattering of light E f often contributes significantly less to the model than do reflection E d and back scattering E b . Thus, the previous equation can be expressed more simply
Underwater Image Enhancement and Restoration Using Cycle GAN
105
as (7). I (x) = J (x) ∗ t(x) + B∞ (x) ∗ (1 − t(x))
(7)
While I (x) represents the actual image captured by a sensor-like camera, J (x) indicates an image that has been restored. The attenuation of light in water is denoted by the symbol t(x), while the backward light scattering in water is denoted by the symbol B(x) ∗ (1 − t(x)). In an ideal scenario, I (x) = J (x) holds, since there is no longer any light attenuation or scattering in the water. However, the attenuation and light scattering are inevitable in real underwater photography situations and must be taken into account. Following the acquisition of I (x) by a sensor, J (x), the recovered picture is dependent on t(x) and B(x).
3.2.1
GAN Paired Training
A target function tells G to train to enhance the quality of the perceptual image, such that the resultant image resembles the appropriate ground truth in regards to the appearance as a whole and high-level feature presentation. However, D will disregard the image if it has a geographically inconsistent texture or style. We explicitly use the goal function given below (8) for paired training, G ∗ = argmin G
max D L cGAN (G,
D) + λ1 L 1 (G) + λc L con (G)
(8)
Here, the scaling variables λ1 = 0.7 and λc = 0.3 were empirically calibrated as hyper-parameters.
3.2.2
GAN Unpaired Training
Since there is no pairwise ground truth for unpaired training images, the global similarity and content loss requirements are not applied. Instead, (9) it is intended to comprehend both the forward mapping G F : {X, Z } → Y and the re-construction G R : {Y, Z } → X simultaneously while preserving cycle consistency, G ∗F , G ∗R = argmin G F ,G R
max DY ,D X L cGAN (G F ,
DY ) + L cGAN (G R , D X ) + λcyc L cyc (G F , G R )
(9)
3.2.3
NonLinear Mapping of Distorted Image
This step converts the distorted image into a distortion-free image with the help of deep learning concepts known as conditional GANs, also called as CGANs. It contains two convolutional blocks, one is the discriminator and the other is the
106
C. Spandana et al.
Fig. 2 Work flow of GAN
Table 1 Count of paired images of imageNet in training and validation
Name of the dataset
Training sets
Validation sets
Total
Underwater dark
5560
572
11,675
Underwater ImageNet
3750
1280
8675
Underwater scenes
2180
135
4550
generator. The generator tries to create new distortion-free images, whereas the discriminator classifies the output generated by the generator as a distorted image or distortion-free image. Figure 2 shows the work flow of GAN.
3.3 Data Set Analysis The enhancing underwater visual perception (EUVP) dataset includes numerous sets of paired and unpaired image samples of low and high perceptual quality to assist in the supervised training of underwater picture improvement models.
3.3.1
Paired Data
The following statistics apply to three paired datasets (Table 1):
3.3.2
Unpaired Data
See Table 2.
Underwater Image Enhancement and Restoration Using Cycle GAN Table 2 Count of unpaired images
107
Low resolution
High resolution
Validation
Total
3190
3145
340
6670
4 Results and Inferences 4.1 Results The proposed methodology was tested on a dataset containing 100 underwater images labelled as TrainA and 100 clear images labelled as TrainB; similarly, 30 underwater images as TestA and 30 clear images as TestB. The CycleGAN was trained for 1300 epochs covering the training set. As mentioned in the methodology part, the CycleGAN will try to generate images from train A to train B and vice versa. MSE and MAE have been used for metrics evaluation. The mean square error (MSE) in (10), where n is the data size, calculates the average of the squares of the errors, or the average-squared deviation between the predicted values and the actual value, MSE =
n 2 1 Yi − Yi n i=0
(10)
Mean absolute error is a metric for errors between paired observations, reporting the same phenomena (MAE). Comparing expected data to observed data, subsequent time to starting time, and one measurement technique to another are a few examples of Y versus X comparisons. The MAE is calculated by dividing the total residual error by the sample size. According to (11) n ME =
i=1
yi − xi n
(11)
In Fig. 3, there are two sample outputs generated by the CycleGAN. The first images are the input image which are underwater images (real images shown in Fig. 3), the second image in the row is the enhanced image i.e. image in TestB and the last image in the row is again reconstruction of the image in TestA. Figure 4 shows the loss graphs of discriminator of dataset-TrainA, discriminator of dataset-TrainB, and overall generator loss. As we can see from the graph, the generator loss converges around 10 whereas the discriminator loss converges around 0.1. Figure 5 shows the comparison of the existing models with our proposed methodology. The first column represents the input given to the model. The second column shows the results of the proposed methodology (modified cycle GAN). The next three columns represent the results of the FuineGAN, CycleGAN, and conditional GAN (CGAN) [25] and [26] uses the heart disease prediction using machine
108
C. Spandana et al.
Fig. 3 a Input image x, b Generated image G(x), and c Reconstructed image F(G(x)
Fig. 4 Loss graphs of the CycleGAN
can be improved by GAN and image restoration, classifier-based duplicate record elimination [27].
5 Conclusion and Future Enhancements In this paper, the main objective is to enhance the underwater image with the help of CycleGAN. This methodology and results can be further extended by setting the correct hyper-parameters and epochs as GANs are subject to overfitting or model crashing frequently. So, care should be taken while training the CycleGAN and discriminator, and generator should be sufficient enough to contradict each other but not to dominate other, else the results may not be accurate. In this proposed methodology, two connections were removed while training which greatly reduced training time. The training of GAN is done using ADAM optimizer which avoids the overfitting of the model.
Underwater Image Enhancement and Restoration Using Cycle GAN
109
Fig. 5 Comparison of other results with the proposed method
To make the network more generalizable, future work will concentrate on building a larger and more varied dataset from underwater objects. The dataset’s diversity could be increased by adding noise, such as particle and lighting effects, to the data produced by CycleGAN. In order to assess the effectiveness of our strategy, we also plan to look into various quantitative performance criteria. In the future, we would also endeavour to further enhance image clarity by focusing on image sharpness and the determination of CLAHE parameters, hence raising the images’ resolution. Additionally, for more precise underwater image colour correction and also image enhancement further, we will combine the spatial perception data from multiple sensors including echo sounders and Doppler rangefinders. We will further develop the existing technique to achieve underwater image distortion correction and image enhancement in a single framework for distortion brought on by light refraction.
References 1. Singh H, Adams J, Mindell D, Foley B (2020) Imaging underwater for archaeology. J Field Archaeol 27:319–328 2. Boudhane M, Nsiri B (2016) Underwater image processing method for fish localization and detection in submarine environment. J Vis Commun Image Represent 39:226–238 3. Shi H, Fang SJ, Chong B, Qiu W (2016) An underwater ship fault detection method based on Sonar image processing. J Phys Conf Ser 679:012036 4. Ahn J, Yasukawa S, Sonoda1 TT, Ura T, Ishii K (2017) Enhancement of deep-sea floor images obtained by an underwater vehicle and its evaluation by crab recognition. J Mar Sci Technol 22:758–770 5. Gu L, Song Q, Yin H, Jia J (2018) An overview of the underwater search and salvage process based on ROV. Sci Sin Inform 48:1137–1151
110
C. Spandana et al.
6. Watanabe J-I, Shao Y, Miura N (2019) Underwater and airborne monitoring of marine ecosystems and debris. J Appl Remote Sens 13:044509 7. Powar O, Wagdarikar N (2017) A review: underwater image enhancement using dark channel prior with gamma correction. Int J Res Appl Sci Eng Technol 5:421–426 8. Zhang X, Hu L (2019) Effects of temperature and salinity on light scattering by water. In: Ocean sensing and monitoring II; SPIE, vol 7678. Washington, DC, USA, pp 247–252 9. Hu K, Zhang Y, Weng C, Wang P, Deng Z, Liu Y (2021) An underwater image enhancement algorithm based on generative adversarial network and natural image quality evaluation index. J Mar Sci Eng 9(7):691. https://doi.org/10.3390/jmse9070691 10. He D, Seet G (2020) Divergent-beam Lidar imaging in turbid water. Opt Laser Eng 41:217–231 11. Ouyang B, Dalgleish F, Vuorenkoski A, Britton W (2013) Visualization and image enhancement for multistatic underwater laser line scan system using image-based rendering. IEEE J Ocean Eng 38:566–580 12. Jaffe J (1990) Computer modeling and the design of optimal underwater imaging systems. IEEE J Ocean Eng 15:101–111 13. Trucco E, Olmos-Antillon A (2006) Self-tuning underwater image restoration. IEEE J Ocean Eng 31:511–519 14. Wang N, Qi L, Dong J, Fang H, Chen X, Yu H (2016) Two-stage underwater image restoration based on a physical model. In: Proceedings of the Eighth international conference on graphic and image processing (ICGIP 2016), Tokyo, Japan, p 10225, 29–31 Oct 2016 15. Wagner B, Nascimento ER, Barbosa WV, Campos MFM (2018) Single-shot underwater image restoration: a visual quality-aware method based on light propagation model. J Vis Commun Image Represent 55:363–373 16. Shi Z, Feng Y, Zhao M, Zhang E, He L (2020) Normalized gamma transformation based contrast limited adaptive histogram equalization with color correction for sand-dust image enhancement. IET Image Process 14:747–756 17. He K, Sun J, Tang X (2011) Single image haze removal using dark channel prior. IEEE Trans Pattern Anal 33:2341–2353 18. Galdran A, Pardo D, Picón A, Alvarez-Gila A (2015) Automatic red-channel underwater image restoration. J Vis Commun Image Represent 26:132–145 19. Li C, Guo J, Wang B, Cong R, Zhang Y, Wang J (2016) Single underwater image enhancement based on color cast removal and visibility restoration. J Electron Imag 25:033012 20. Tang Z, Zhou B, Dai X, Gu H (2018) Underwater robot visual enhancements based on the improved DCP algorithm. Robot 40:222–230 21. Xie H, Peng G, Wang F, Yang C (2018) Underwater image restoration based on background light estimation and dark channel prior. Acta Opt Sin 38:18–27 22. Yu H, Li X, Lou Q, Lei C, Liu Z (2020) Underwater image enhancement based on DCP and depth transmission map. Multimed Tools Appl 79:20373–20390 23. Mukesh K, Ippatapu Venkata S, Chereddy S, Anbazhagan E, Oviya IR (2023) A variational autoencoder—General adversarial networks (VAE-GAN) based model for ligand designing. In: Gupta D, Khanna A, Bhattacharyya S, Hassanien AE, Anand S, Jaiswal A (eds) International conference on innovative computing and communications. lecture notes in networks and systems, vol 473. Springer, Singapore 24. Aishwarya G, Raghesh Krishnan K (2021) Generative adversarial networks for facial image inpainting and super-resolution. J Phys Conf Ser 2070:012103 25. Anivilla S, Sajith Variyar VV, Sowmya V, Soman KP, Sivanpillai R, Brown G (2020) Identifying epiphytes in drones photos with a conditional generative adversarial network (C-GAN). In: ISPRS—International archives of the photogrammetry, remote sensing and spatial information sciences, pp 99–104 26. Prasanna Kumar R (2021) An empirical study on machine learning algorithms for heart disease prediction. IAES Int J Artif Intell 27. Kalpana G, Prasanna KR, Ravi T (2010) [IEEE computing (TISC)—Chennai, India (2010.12.17–2010.12.19)] Trendz in Information Sciences & Computing(TISC2010)—Classifier based duplicate record elimination for query results from web databases. pp 50–53. https:/ /doi.org/10.1109/tisc.2010.5714607
Implementation of Machine Learning Techniques in Breast Cancer Detection Mitanshi Rastogi, Meenu Vijarania, and Neha Goel
Abstract Breast cancer is still the most common disease among females worldwide. When breast cancer is detected early, the survival rate improves because better treatment can be provided to patients at the earliest. Sometimes, doctors may make incorrect decisions to identify non-cancerous tumors (benign) as cancerous tumors (malignant). Computer-aided detection systems (CAD) with the help of machine learning algorithms are able to deliver precise breast cancer diagnoses. In this paper, the WISCONSIN dataset was taken from Kaggle.com for the research. Overall, the collection contains 569 records, in which 357 are innocuous and 212 are cancerous. This dataset consists of 32 attributes. A comparative study has been done on the dataset using ML techniques like SVM, KNN, RF, and logistic regression on the basis of N-fold cross-validation techniques accuracy, f1-score, and recall. As a result, SVM outperforms with maximum accuracy of 0.95. Keywords Machine learning · CAD · SVM · KNN · Logistic regression
1 Introduction Over the last decade, there has been a surge in interest in machine learning (ML) [1]. Inexpensive computing power and affordable memory help to increase interest. Machine learning plays an integral part in diverse application areas, for instance, stock market prediction, healthcare, face recognition, air pollution, natural language
M. Rastogi (B) CSE, K.R. Mangalam University, Gurugram, Haryana, India e-mail: [email protected] M. Vijarania Centre of Excellence, CSE, K.R. Mangalam University, Gurugram, Haryana, India N. Goel VIPS, Pitampura, Delhi, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_10
111
112
M. Rastogi et al.
processing, and earthquake detection [2–4]. This paper aims at breast cancer detection. A set of preoperative tests have augmented breast cancer detection such as mammography, biopsy, and ultrasound [5, 6]. The above-mentioned diagnostic mechanism, anyhow, has its own disadvantages. Identification of disease at an early stage leads to lowering the fatality of patients suffering from breast cancer [7]. Henceforth, advancement is already available methodologies that are essential for prognosis of disease as early as possible. Breast cancer occurred due to a cancerous cyst when the cell’s expansion becomes uncontrollable [8]. Malignant cells are transmitted all over the body, resulting in various stages of cancer. Various types of cancer occur due to circulation of cells and tissues in the body. The types of breast cancer are as follows [9]: i. DCIS: This disease arises when the aberrant cells proliferate outside the breast. ii. IDC: This disease develops when the aberrant tissues of the breast proliferate throughout the cells, and this cancer is most commonly detected in males. iii. MTBC: The cancer occurs due to abnormal growth in duct and lobular cells. LBC: The disease takes place within the breast. iv. CBC: It is an acronym for Mucinous Breast Cancer (MBC), which develops due to invasive ductal cells. It happens when the aberrant tissues expand throughout the duct. This paper offers a roadmap for the detection of breast cancer. Wisconsin Diagnostic Breast Cancer Dataset (WDBC) is being used for the research work. The research depicts the contribution of machine learning techniques. The techniques can support in the early diagnosis of breast cancer with high precision [10]. Furthermore, this serves as the foundation for conducting a comparative analysis of these techniques. Lastly, this research is critical in choosing an appropriate machine learning technique when developing a unified intelligent model. This research paper is divided into four sections. Section 2 represents the work done to date in the form of a literature survey. The proposed approach has been discussed in Sect. 3. In Sect. 4, the outcomes are discussed and compared with other machine learning models. It also includes the conclusion obtained from the model.
2 Related Work In the current section, a few similar works are done already on breast cancer detection by different researchers with the help of ML techniques like SVM, RF, KNN, DT, CNN, and logistic regression on the different datasets using cross validation techniques as comparison parameters. Shailaja et al. [11] analyze the performance of classification algorithms on WDBC dataset was used for the research. The results depict that SVM was most appropriate with 96.40% of accuracy.
Implementation of Machine Learning Techniques in Breast Cancer …
113
Ming et al. [12] analyze ML techniques with existing BCRAT and BOADICEA using accuracy and AU-ROC as comparison parameters. By using machine learning model, accuracy increases up to 30 to 35% more than already existing technologies. Dhahri et al. [13] use different classification techniques for prognosis of breast cancer with the dataset having cross validation as analyzes parameters. Genetic programming is able to identify the most appropriate model using a combination of preprocessing techniques and ML models. Ganggayah et al. [14] analyze the DT and RF on real-time dataset from Malaysia taking accuracy as the key parameter. As a result, the RF has higher accuracy of 82.7%. Islam et al. [15], conducted a comparative study on classification and deep learning models using the UCI dataset with performance parameters. Experiment outcome revealed that ANN was able to achieve highest accuracy of 98.57%, sensitivity of 97.82%, and f1-score of 0.9890. Gupta et al. [16], diagnose and analyze breast cancer disease using well-known machine learning classification algorithms using accuracy and time as comparison parameters on UCI dataset. The outcomes show an extreme learning machine gives 99% accuracy. Kamal and Kumari [17] state that AI or ML depicts better efficiency in tumor treatment during coronavirus pandemic. The objective is to visualize up to what degree the impact appears on chest scans because of COVID-19. Bhise et al. [18] conducted a comparative study of ML and DL models using BreaKHis 400X dataset and performance parameters. As per the study, CNN outperforms in terms of accuracy and precision. Masood [19] applied preprocessing, feature selection, and extraction to minimize the total number of dimensions. The author uses numerous ML and ensembled models. As per the report, most appropriate methodology is support vector machine. Alanazi et al. [9] propose a technique that gives accurate results for breast tumor. Convolutional neural network with improved accuracy of 87% as compared to other machine learning techniques. Rastogi et al. [2] briefed about algorithms used in different sectors on the basis of different parameters like area, algorithms used for, purpose, and applications. This will help in understanding in detail machine learning techniques. Abdur Rasool et al. [20] demonstrate the comparison analysis of ML algorithms on different datasets WDBC and WDDC. LR, Polynomial SVM, KNN, and EC are the algorithms used for the study. Polynomial SVM has the highest accuracy 99.3%.
3 Proposed Model The experimental work used in this research paper mainly considered classification algorithms. The activities are split into two categories: (a) ML models involve training the machines using the provided dataset. (b) Testing is being carried out in the second phase. Python is used to create and test various ML techniques (Fig. 1).
114
M. Rastogi et al.
Fig. 1 Flow of research process
3.1 Data Elicitation Data elicitation means a collection of data. The dataset used for the experimental work is Wisconsin dataset from www.kaggle.com. The dataset has 569 records in total, out of which 357 records are non-cancerous (benign) and 212 are cancerous (malignant). The same has been depicted in Fig. 2 given representing ‘0’ as benign and ‘1’ as pernicious cyst.
Fig. 2 Representation of benign and malignant cells
Implementation of Machine Learning Techniques in Breast Cancer …
115
3.2 Data Preprocessing and Selection On any given dataset, data preprocessing is done to improvise the quality by removing irrelevant data. The preprocessing step is done in three steps: data cleaning, transformation, and reduction. The dataset used in this paper for research work includes 32 attributes. All attributes given in the dataset as depicted in Fig. 3. To identify whether the data is balanced or not is very crucial. The dataset is not evenly balanced as shown in Fig. 4. There are roughly twice as many benign cells as there are cancerous cells. A heat map is represented to visualize the relationship between all features. Further, selection by feature techniques help in dimensionality reduction by converting features from a higher space to a limited dimensional space. The selected features will be extracted after completing the selection of data. Figure 5 represents the graph with selected attributes after omitting irrelevant features.
Fig. 3 Dataset represents all attributes
116
M. Rastogi et al.
Fig. 4 Independent variables represented by a heat map
3.3 Implementing Machine Learning Models In this phase, ML algorithms are implemented on the dataset. Using the preprocessed data, the models will be implemented in this phase. We have considered different classification techniques for the processing [21]. Comparative analysis of performance is done by putting the proposed model to the test.
4 Result The proposed classification models is implied for the identification of disease as early as possible. In this methodology, the four ML algorithms are considered: RF, LR, SVM, and K-NN were applied to the breast WINSCONSIN dataset [22]. The performance has been evaluated on the basis of accuracy, recall, and f1-score [23]. The confusion matrix helps in calculating N-fold cross-validation techniques. The confusion matrix chart of each algorithm is presented (Fig. 6; Table 1). The analysis of random forest, SVM, logistic regression, and K-NN has been shown in Table 2.
Implementation of Machine Learning Techniques in Breast Cancer …
117
Fig. 5 Dataset denotes specific attributes
With reference to Table 2, the outcomes of models were assessed, and the evaluation has been drawn in Fig. 7 which can help for better understanding. According to the evaluation, a conclusion has been drawn that SVM outperforms than other algorithms in terms of N-fold cross-validation techniques.
5 Conclusion and Future Scope A model for identifying breast cancer has been developed using four machinelearning algorithms. Finding the deadliest cancer disease may be aided by the patient’s medical history. A dataset containing prior medical information about patients, including diagnosis, area, concave spots, perimeter, radius, and many other factors, is subjected to algorithms. Only 15 of the dataset’s 32 variables were chosen for machine learning model analysis after preprocessing techniques were applied to the dataset. Four machine learning (ML) techniques—KNN, random forest, SVM, and logistic regression—were used to build the target model. Our model accuracy with SVM is (95.90%), recall (94.91%), and f1-score (94.11%) which is higher than the accuracy of the prior system. In the forthcoming era, the work can be toward the diagnosis of other diseases which may help the doctors in the prognosis of disease at an early stage.
118
M. Rastogi et al.
K-NN
Logistic Regression Fig. 6 Confusion matrix graph of ML algorithms
Implementation of Machine Learning Techniques in Breast Cancer …
Random forest
SVM Fig. 6 (continued)
119
120
M. Rastogi et al.
Table 1 Confusion matrix values of ML algorithms Models
Accurate positives
Accurate negatives
Inaccurate positives
Inaccurate negatives
K-NN
73
85
7
6
Logistic regression
73
85
6
7
Random forest
86
85
5
5
SVM
74
89
5
3
Table 2 Comparison of ML techniques ML techniques
Accuracy
Recall
F1-Score
RF
94.15
92.06
92.06
SVM
95.9
94.91
94.11
Logistic regression
92.98
88.88
90.32
K-NN
92.39
85.71
89.25
Fig. 7 Performance analysis of machine learning models
References 1. Sarker IH (2021) Machine learning: algorithms, real-world applications and research directions. Springer Nat Comput Sci 139–160 2. Rastogi M, Goel N (2022) A review of machine learning algorithms and its applications. Vivekananda J Res 2(1):132–145 3. Simon A, Deo MS (2015) An overview of machine learning and its application. Int J Electr Sci Eng (IJESE) 1(1):22–24 4. Dey A (2016) Machine learning algorithms: A review. Int J Comput Sci Inf Technol 7(3):1174– 1179 5. Yedjou CG (2021) Application of machine learning algorithms in breast cancer diagnosis and classification. Int J Sci Acad Res 2(1):3081–3086
Implementation of Machine Learning Techniques in Breast Cancer …
121
6. Celik O, Altunaydin SS (2018) A research on machine learning methods and its applications. J Educ Technol Online Learn 1(3):25–40 7. Rastogi M, Vijarania M, Goel N (2022) Role of machine learning in the healthcare sector. In: International conference on computational and intelligent data science in Elsevier SSRN 8. Priyanka, Sanjeev K (2020) A review paper on breast cancer detection using deep learning. In: IOP conference series: materials science and engineering, pp 1–7 9. Alanazi SA, Kamruzzaman MM (2021) Boosting breast cancer detection using convolutional neural network. Hindawi J Healthc Eng 1–11 10. Vaka AR, Soni B (2020) Breast cancer detection by leveraging machine learning. Sci Dir 1–5 11. Seetharamulu KS (2018) Machine learning in healthcare: a review. In: ICECA 2018, IEEE Xplore, IEEE, pp 910–914 12. Valeria CM (2019) Machine learning techniques for personalized breast cancer risk prediction: comparison with the BCRAT and BOADICEA models, BMC, pp 21–31 13. Dhahri H, Al Maghayreh E (2019) Automated breast cancer diagnosis based on machine learning algorithms. Hindawi J Healt Eng (2019) 14. Ganggayah MD, Taib NA (2019) Predicting factors for survival of breast cancer patients using machine learning techniques. BMC Med Inform Decis Making 19–35(2019) 15. Islam MM, Haque R (2020) Breast cancer prediction: A comparative study using machine learning techniques. Springer Nat Comput Sci 1–15 16. Gupta C, Gill NS (2020) Machine learning techniques and extreme learning machine for early breast cancer detection. Int J Innovative Technol Exploring Eng (IJITEE) J 9(4) 17. Kamal VK, Kumari D (2020) Use of artificial intelligence/machine learning in cancer research during the covid-19. APJCC, 5:251–253 18. Bepari SBS (2021) Breast cancer detection using machine learning techniques. Int J Eng Res Technol (IJERT) 10(07):98–103 19. Masood H (2021) Breast cancer detection using machine learning algorithm. Int Res J Eng Technol (IRJET) 8(02):738–747 20. Rasool A, Bunterngchit C (2022) Improved machine learning-based predictive models for breast cancer diagnosis. Int J Environ Res Publ Health 1–19 21. Omondiagbe DA, Veeramani SM (2019) Machine learning classification techniques for breast cancer diagnosis. In: IOP conference series: materials science and engineering, pp 1–16 22. Muhammet Fatik AK (2020) A comparative analysis of breast cancer detection and diagnosis using data visualization and machine learning applications. MDPI Healthcare, pp 1–23 23. A1-Tam RM, Narangale SM (2021) Breast cancer detection and diagnosis using machine learning: A survey. J Sci Res The Banaras Hindu Univ 65(5)
Performance and Analysis of Propagation Delay in the Bitcoin Network Shahanawaj Ahamad, Suryansh Bhaskar Talukdar, Rohit Anand, Veera Talukdar, Sanjiv Kumar Jain, and Arpit Namdev
Abstract Bitcoin denotes virtual money that relies on a P2P network to spread and validate transactions. Because it is decentralized, Bitcoin is different from conventional currencies. In this study, we provide an event-based simulated model of the Bitcoin P2P network. To allow accurate parameterization of the provided simulation model, extensive evaluation of the actual Bitcoin network is made. Additionally, validation findings show that the simulation model performs very identically to the actual Bitcoin network. Here the suggested Bitcoin Clustering Based Super Node (BCBSN) protocol is evaluated using the built simulation model as a means of accelerating information transmission in the Bitcoin network. The results of the evaluation demonstrate that the proposed clustering methodology may minimize the transaction propagation latency by a respectable amount. Keywords Simulation validation · Propagation delay · Clustering analysis
S. Ahamad College of Computer Science and Engineering, University of Hail, Hail City, Saudi Arabia S. B. Talukdar School of Computer Science and Engineering, VIT Bhopal, Bhopal, MP, India R. Anand (B) Department of ECE, G. B. Pant DSEU Okhla-1 Campus (Formerly G. B. Pant Engineering College), New Delhi, India e-mail: [email protected] V. Talukdar Kaziranga University, Jorhat, Assam, India S. K. Jain Department of EE, Medi-Caps University, Indore, MP, India A. Namdev Department of IT, University Institute of Technology RGPV, Bhopal, MP, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_11
123
124
S. Ahamad et al.
1 Introduction Bitcoin is e-digital money that is not controlled by any central bank or government. Bitcoin [1] is instead supported by a network of miners and a cryptography system which runs on top of the Bitcoin P2P network [2]. In January 2009, the first version of the Bitcoin payment system was launched. In contrast to their name or different personal identifiable data, the one who is using Bitcoin system may be identified by their public key. Bitcoin has succeeded in its stated objective of establishing a universal digital currency that facilitates transaction on a worldwide scale. Bitcoin is regarded as a trustworthy currency that enables international transactions to be handled just as quickly as local ones. With just a few firms and enterprises utilizing Bitcoin, the acceptance of Bitcoin is gradually expanding toward becoming a substitute for other forms of actual cash like dollars or pounds. The Bitcoin system is still having certain security concerns, which is the cause of that. To increase consumers’ confidence in the Bitcoin system, it is crucial to look at Bitcoin’s resilience to security assaults and underlying network [3, 4]. Through the distributed validation and monitoring of transactions, Bitcoin’s distributed trust mechanism is made possible. This method ensures that all nodes inside the network ought to agree on which transactions are legitimate. By transmitting a Bitcoin agreement to all of the network nodes, this consensus is achieved. The consensus established in the Bitcoin network is recorded in a publicly distributed ledger [5]. All of the legitimate transactions that have ever been handled in the network are recorded in this ledger. Inconsistency in the clones of the public ledger is inevitable, since transactions are verified against it. This creates doubt on the legitimacy of a particular transaction, which might allow an attacker to use a double paying out attack to pay out a Bitcoin twice. The likelihood of a twice pay out attack in the Bitcoin system poses a threat to the security of the Bitcoin network and Bitcoin acceptance; hence it is crucial to examine this issue. We provide a framework for event-based modeling of the Bitcoin network called the Bitcoin-simulated framework in this study. The authors parameterize the offered framework using information from the actual Bitcoin network, since creating a framework of the real-world requires proof that the framework accurately captures the supposed reality. Here, • The authors offered the framework of the approach and examined how information spreads in the actual BN. The authors have also shown how propagation delay may compromise security [6] by allowing users to twice the cost and undermine the public ledger’s accuracy. • The extent of the BN, the delivery of session duration, and the distribution of latencies between nodes were precisely assessed here in order to parameterize the provided model. • The authors did validation of the offered framework that was acquired by relating the framework to the actual Bitcoin network (BN) depending on the transaction communication latency, to guarantee the framework’s validity before any experiments are developed.
Performance and Analysis of Propagation Delay in the Bitcoin Network
125
2 Literature Review The Bitcoin network has been studied using previous models of the system designed to measure transaction propagation latency and assess the viability of various assaults on the Bitcoin network. The Bitcoin reference client software is now being run experimentally using a newer shadow plug-in mechanism that was proposed in [7]. But there is no functionality for collecting statistics in this simulator. Additionally, since each Bitcoin client must carry out all the pricey blockchain communications and cryptographic processes required by the BN, it does not permit full-scale trials. Franzoni et al. [8] suggested a new Bitcoin model where the fundamental components of the Bitcoin client were turned into a simulation model. Here, the client code abstracted away all the computationally intensive cryptographic procedures. Despite this, since the model concentrates on evaluating a possible network segregation assault, it does not meet our objectives. By creating a Bitcoin client which keeps track of how transactions are distributed around the network, estimations of the transactional propagation latency in the actual BN were carried out. Prior propagation delay measures, however, do not accurately reflect the actual propagation delay because they do not capture the precise moment at which peers publish transactions. [9] has developed a model that takes into account several changes to the transaction dissemination protocol in order to accelerate the transaction propagation time. The fundamental tenet of this approach is that nodes examine each transaction they receive to see whether it has already been observed by other nodes in their pool. The authors added the transaction in their repository and sent it to the various nodes if it has never been seen before. If not, they pass the transaction on immediately to their neighbors without adding it to their pool. In this case, the node that generates the actual transaction might receive the phoney transaction. This implies that when a false transaction is received, the issuer of the genuine transaction will quickly recognize the taken double cost assault. To sum up, research in the propagation delay sector has been confined to a small number of changes to the Bitcoin network protocol that have a poor rate of success in reducing propagation delay.
3 Simulation Model for the BTC Network We go through the simulation model, its validation and its parameterization in this part. The produced simulation model is an event-based simulation that was made using measurements from the actual Bitcoin network and the Bitcoin protocol specification. Our built model’s primary objective is to assess the theory behind our suggested clustering strategy for achieving a quicker transaction propagation time. A. The Bitcoin protocol A distributed public ledger that keeps track of all the Bitcoins in the system is the foundation of the Bitcoin protocol. The virtual money is transmitted via every admission in the record and is regarded as a transaction. In the Bitcoin network,
126
S. Ahamad et al.
inputs and outputs are part of every transaction. A transaction’s output, which is utilized as an input in subsequent transactions, serves as a reminder of who the newer holder of the transmitted Bitcoins is. If a directed graph is created by transactions, constants pertaining to transaction records are provided. Every transactional input has a digital signature [10] which permits access to the output of earlier transactions. The digital signature is entirely produced by the Bitcoin user who is in possession of the necessary private key, ensuring that only the owners of Bitcoins may spend them. B. Propagation of transactions on the Bitcoin network The pertinent features of Bitcoin are briefly described in this part, and we also go over how the propagation of Bitcoin transactions impacts the synchronization in the public ledger. Every node in the BN arbitrarily establishes a TCP channel connection with another node. Every node manages an array of the IPs of the peers with whom it initiates connections. Each node keeps track of a penalty score for each connection. Any connection that acts badly by sending a corrupted message raises the score. When the score hits 100, the bad IP address is blocked. Transactions and blocks are two categories of information which are broadcast across the Bitcoin network. Users send bitcoins to one another via transactions. Blocks, which make up a portion of the ledger are what allow transactions through the nodes in the BN to be chronologically ordered. As depicted in Fig. 1, the Bitcoin network achieves transaction dispersion by communicating two different message categories: an INC message and an Ackn message. Once a node gets a transaction via any of its neighbors, it transmits an INC message that includes the hash of transaction which requires to be declared [11, 12]. The hash of transaction is examined to determine if it was seen previously when an INC message is acknowledged by a node. If the node has never seen the transaction before, it will issue an Ackn message for requesting it. A node provides the agreement’s data in response to the Ackn message it has just received. An INC message is issued to a randomly chosen connected peer every 100 ms. This indicates that the time needed to convey the INC message directly depends on the quantity of linked nodes. A delay in transaction propagation occurs because transmitting any transaction on the BN requires passing through the action of INC and Ackn messages. Due to the public ledger’s irregularity, which creates a chance to an attacker for manipulating the network consensus, this delay will have an impact on the Bitcoin network’s capacity to scale. The public ledger’s inconsistency will boost attackers for double spending Bitcoins. When a merchant takes Bitcoin payments and ships goods beyond waiting time for the transactions’ affirmation, double paying attacks are most common with quick payments. Double paying attacks specifically occur if an attacker generates two transactions (BC and BD). The identical Bitcoin origin serves as the input for these transactions, but the outputs are different (dissimilar receivers, if we have two transactions, BC will reach to most of the peers and BD will reach to the vendor). When most peers take BC but the seller only accepts BD, the double spending assault is said to be effective. This indicates that BC is verified before BD. The merchant would not be able to redeem the BD if BC was accepted by
Performance and Analysis of Propagation Delay in the Bitcoin Network
127
INC Message
Ackn
Transaction
Node A
Node B
Fig. 1 Mechanism for transaction propagation connecting nodes A and B
succeeding blocks as the actual transaction, since doing so will be deemed an invalid transaction because money has already been spent.
4 Assessment and Parameterization of the Bitcoin Network In this part, actual Bitcoin network (BN) metrics will be monitored. These data are crucial for appropriately parameterizing the model that is being presented. We display the distributions of the key variables that directly affect client activity and information dissemination in the actual BN. The characteristics include the quantity of accessible nodes, peer linking latencies, and peer session durations. We put in place a Bitcoin client to initiate with the measure of a peer’s session duration. By connecting to every available peer in the network, the client is utilized to crawl the entire BN. Additionally, the built client broadcast snapshots of the IP addresses of the available peers every three hours. We were able to identify the times when peers entered or departed the network by utilizing the information collected by running the built crawler for a week. Figure 2 displays the delivery of session duration in a live Bitcoin network. Three-thousand five hundred peers did not quit the network throughout the observation period, according to the session length distributions. The network’s stability varies when these distributions are taken into consideration. This may cause a significant shift in the topology. The crawler also has the ability to gauge the network’s size. The crawler discovered 326,547 IPs, however it could establish connections with 6289 peers, putting the current size of the Bitcoin network at roughly 6500 nodes. The arrangements were gathered by running a crawler that had been constructed and linked to around 5000 network peers, while tracking 33,750 ping/pong messages. It must be remembered that these measured distributions show how late the developing
128
S. Ahamad et al.
Fig. 2 Peer session duration in the BN and the measuring node
crawler is in relation to other peers in the network. The simulation model might be adjusted to include the provided link latencies distribution, to provide an accurate estimation of the time it takes for a transaction to reach various network peers.
5 BTC Model Structure and Validation The model that is being described is a simple, event-based simulation that abstracts away the cryptographic features of the Bitcoin system. Instead, it concentrates on the round-trip transaction time delay and the Bitcoin overlay network. Java was used to create the simulation model because of its object-oriented feature. The activity of a Bitcoin client is framed as an ordered series of clearly interpreted events using the discrete event simulation paradigm. These discrete events, which happen at precise times throughout the simulation, constitute a particular state change for the system. Two concepts of time—simulation time and run time—will be considered in our discrete event simulator. While run time denotes the amount of processor time used by a certain thread, simulation time represents the virtual time in the simulated environment. In contrast, simulation time directly affects how the simulation events are set up and the accuracy of the findings obtained. In particular, when thread A executes event E1, as illustrated in Fig. 3, event E1 should schedule event E1, restore that signifies a prosperous return from event E1. The E1, restore has to be scheduled at a certain simulation time point that is determined after adding the necessary delay. The time distributions which are affixed to the model are where this delay is gathered.
Performance and Analysis of Propagation Delay in the Bitcoin Network
129
Simulation Time
E1
E2
E3
E4
E5
E1Return
Fig. 3 Bitcoin event dependent simulated depiction
A. Validation Here, we present calculations of the transaction propagation latency in both the actual BN and in a simulated network. These tests are crucial to determine if the proposed model behaves as closely to the actual network as feasible, since a number of characteristics of the real BN, including client action, processing time, and network architecture directly affect transaction propagation. The Bitcoin protocol was created and utilized to provide links to several sites in the network for monitoring the time it takes for a transaction to arrive at every point. This allowed researchers to determine how quickly a transaction propagates over the BN. In-depth, we first created a measurement node that functions precisely like a regular node and has the following features. Ten network peers are connected to by the measurement node. It may also establish a legitimate transaction, propagate it to a connecting peer, and thereby following the transaction for noting the precise moment where every connecting peer declares the transaction. Consider a client c that has n connections, propagates a transaction at time T c , and its associated nodes get it at various times (T 1 ,…, T n ), as shown in Fig. 4. The time intervals between the initial propagation of a transaction and successive receptions of the transaction by associated nodes were determined as follows: Δtc , = Tn − Tc
Fig. 4 Diagram of the experimental setup for propagation
(1)
130
S. Ahamad et al.
where T n > T n−1 > · · · > T 2 > T 1. Figure 5 shows the average distribution of Δt c for both the actual BN and the simulated network. The results show that the transaction propagated more quickly within the first 13 s, and that six nodes got it with little variation in the timing of their receptions. It should be observed that nodes (9 and 10) have much longer propagation delays for transactions, meaning that these nodes have seen far wider variations in propagation delays while receiving transactions from other nodes. The percentages of announcing transactions for each node are shown in Fig. 6.
Fig. 5 Comparison between modeling findings and the distribution of Δt c as in BN
Fig. 6 Percentage of nodes that announced the transaction, per node
Performance and Analysis of Propagation Delay in the Bitcoin Network
131
6 Bitcoin Network Transaction Propagation Delay Improvement through Clustering Approach As previously established, the delay in information transmission makes it impossible to prevent consistency issues with the public ledger in the Bitcoin network. Transaction verification takes longer as a consequence. Inconsistency in the ledger replication may also make it difficult to determine if a particular transaction is genuine, which might allow an attacker the opportunity to engage in double spending attacks. In light of this, we test our simulation model’s simulation model to see whether clustering may fasten transaction propagation. In particular, we present the Bitcoin Clustering Based Super Node (BCBSN) [13], a revolutionary cluster formation technique. With the help of this protocol, the Bitcoin network will develop a number of clusters that are geographically different. The BCBSN protocol designates one node as the cluster head for each cluster, which is in charge of looking after the cluster. The following subsections will go into great depth on both stages as well as the findings of BCBSN’s review. A. Algorithm for super peer selection The network’s coordinator nodes are chosen in this step to serve as the cluster’s coordinator (super peer). Each super peer creates a cluster by connecting to the nodes that are nearest to it geographically. The network’s super peers are completely interconnected and aware of one another. The proposed peer selection technique differs on the criteria for choosing the super nodes, since we are conscious of security [14]. In other words, the super peer is chosen from the node with the lowest ID (each node has a unique ID). The handling of a reputation protocol forms the basis of our super peer selection process. This protocol works by having each node refer to a weight that is a positive real integer. Based on each node’s Bitcoin burn rate and duration of operation, the weight is determined. As a super peer, the node with the highest weight is chosen. The key benefit of this strategy is that it would be difficult for a rogue node to pretend to be a super peer. Additionally, the nodes that are more appropriate for playing the function of the super peer would do so. B. Assessment of the BCBSN protocol Utilizing the created Bitcoin simulator, the suggested BCBSN protocol’s performance is assessed. The time it takes for a transaction to travel from a starting node to every peer of the origin’s linked nodes is known as the transaction propagation delay, and it is utilized as an attainment parameter in the simulation [15]. The design of the experiment and its outcomes are discussed in the next subsections. (1) Setup for the test In this experiment, two localized clusters create the BCBSN protocol. As a result, the aforementioned super peer selection technique will sometimes be used in the simulated network to choose two super peers. These super peers are dispersed over two distinct continents. When a super peer’s position matches that of a node, we presume
132
S. Ahamad et al.
that the super peer is seen as being closer to the node. Following the formation of two clusters by the chosen superior peers, typical BN activities are planned. Superior peer choice and cluster upkeep are performed periodically. We calculated the transaction propagation delay utilizing similar methods as in Section V since our simulations are predicated on determining how quickly a transaction spreads across the network after utilizing our clustering strategy. By doing so, we may assess the BCBSN protocol by contrasting the calculations of the transaction propagation latency obtained in this experiment, with those obtained in the implemented Bitcoin protocol. The simulation model’s node count was constrained to the dimension of the actual BN. The operation of the simulation experiment is shown simply in Fig. 7. The authors presume that the nodes in the simulated model are divided in two clusters, A and B, based on their proximity. This will result in a decrease in the length connecting nodes in a cluster. In order to monitor the transaction and record the time it takes for every peer in its association to announce the transaction, we created a measurement node called c. It may construct a legitimate transaction called Tx and transmit it to one of its linked nodes. As a result, the approach used in Section V to assess transactional propagation delay on the actual BN is used to determine transaction propagation delay. (2) Results The distributions of Δt c for the BCBSN protocol and simulated Bitcoin protocol which were gathered in Sect. 5 are shown in Fig. 8. When there are more linked nodes, the variation of delays for both protocols grows. This occurred because, as was noted in Sect. 5, the quantity of nodes that are not demographically confined adversely correlates with propagation latency. However, as illustrated in Fig. 8, the BCBSN protocol’s linked nodes were able to receive the transaction with less variability in latency than the simulation of Bitcoin protocol. It suggests that the BCBSN protocol outperforms the simulated Bitcoin network protocol in terms of Δt c distributions. The BCBSN protocol’s ability to reduce transaction propagation time variances is due in part to its ability to minimize the quantity of hops that a transaction must make. Each cluster’s nodes are also physically localized. The connection latencies between
Fig. 7 Simulated implementation layout
Performance and Analysis of Propagation Delay in the Bitcoin Network
133
Fig.8 Comparison between results of the BCBSN protocol simulation and distribution of Δt c as assessed by the simulation of Bitcoin protocol
the nodes at every cluster may be decreased as a result. To increase information transmission at a faster pace, it is worthwhile to investigate the ideal number of clusters. We will include this in our ongoing work.
7 Conclusion This study offered an approach of the BN that allows for extensive simulation of the BN. We offered the framework of the approach and examined how information spreads in the actual BN. The authors have also shown how propagation delay may compromise security by allowing users to twice the cost and undermine the public ledger’s accuracy. Additionally, a short explanation of earlier work to evaluate and assess the information propagation latency was provided. The extent of the BN, the delivery of session duration, and the distribution of latencies between nodes were precisely assessed here, in order to parameterize the provided model. To gauge the transaction propagation time in the actual BN, we created a cuttingedge algorithm. Transaction propagation measurements reveal that the quantity of linked nodes including the network structure, that is not geographically confined, shows a substantial influence on the transactional propagation time. Partitions in the connection graph are also identified. The proposed approach was checked opposed to analysis of transaction propagation in the actual BN. The provided model acts closely to the actual BN and can be shown from the validation findings. This article presents a revolutionary clustering mechanism for the Bitcoin network. We were able to assess the proposed BCBSN clustering procedure using the created simulator. The
134
S. Ahamad et al.
results of the evaluation showed that the BCBSN protocol effectively reduces the transaction propagation latency.
References 1. Huynh ANQ, Duong D, Burggraf T et al (2022) Energy consumption and bitcoin market. Asia-Pac Financ Markets 29:79–93. https://doi.org/10.1007/s10690-021-09338-4 2. Bansal R, Jenipher B, Nisha V, Jain, Makhan R, Dilip, Kumbhkar, Pramanik S, Roy S, Gupta A (2022) Big data architecture for network security, in cyber security and network security, Eds, Wiley 3. Bommareddy S, Khan JA, Anand R (2022) A review on healthcare data privacy and security. Networking Technol Smart Healthc 165–187 4. Tripathi A, Sindhwani N, Anand R, Dahiya A (2022) Role of IoT in smart homes and smart cities: challenges, benefits, and applications. In: IoT based smart applications. Springer International Publishing, Cham, pp 199–217 5. Sadeghi M, Mahmoudi A, Deng X (2022) Adopting distributed ledger technology for the sustainable construction industry: evaluating the barriers using Ordinal Priority Approach. Environ Sci Pollut Res 29:10495–10520. https://doi.org/10.1007/s11356-021-16376-y 6. Dahiya A, Anand R, Sindhwani N, Kumar D (2022) A novel multi-band high-gain slotted fractal antenna using various substrates for X-band and Ku-band applications. Mapan 37(1):175–183 7. Möser M, Narayanan A (2022) Resurrecting address clustering in bitcoin. In: Eyal I, Garay J (eds) Financial cryptography and data security. FC 2022. Lecture notes in computer science, vol 13411. Springer, Cham. https://doi.org/10.1007/978-3-031-18283-9_19 8. Franzoni F, Salleras X, Daza V (2022) AToM: active topology monitoring for the bitcoin peerto-peer network. Peer-to-Peer Netw. Appl. 15:408–425. https://doi.org/10.1007/s12083-02101201-7 9. Gelashvili R, Kokoris-Kogias L, Sonnino A, Spiegelman A, Xiang Z (2022) Jolteon and Ditto: network-adaptive efficient consensus with asynchronous fallback. In: Eyal I, Garay J (eds) Financial cryptography and data security. FC 2022. Lecture notes in computer science, vol 13411. Springer, Cham. https://doi.org/10.1007/978-3-031-18283-9_14 10. Gupta A, Asad A, Meena L, Anand R (2022) IoT and RFID-based smart card system integrated with health care, electricity, QR and banking sectors. In: Artificial intelligence on medical data: proceedings of international symposium, ISCMM 2021. Springer Nature Singapore, Singapore, pp 253–265 11. Garg P, Anand R (2011) Energy efficient data collection in wireless sensor network. Dronacharya Res J 3(1):41 12. Meelu R, Anand R (2010) Energy efficiency of cluster-based routing protocols used in wireless sensor networks. In: AIP conference proceedings, vol. 1324, No. 1. American Institute of Physics, pp 109–113 13. Mandal A, Dutta S, Pramanik S (2021) Machine intelligence of pi from geometrical figures with variable parameters using SCILab, in methodologies and applications of computational statistics for machine learning. In: Samanta D, Althar RR, Pramanik S, Dutta S, (eds) IGI Global, pp 38–63. https://doi.org/10.4018/978-1-7998-7701-1.ch003 14. Babu SZD et al (2023) Analysation of big data in smart healthcare. In: Gupta M, Ghatak S, Gupta A, Mukherjee AL (eds) Artificial intelligence on medical data. Lecture notes in computational vision and biomechanics, vol 37. Springer, Singapore. https://doi.org/10.1007/ 978-981-19-0151-5_21
Performance and Analysis of Propagation Delay in the Bitcoin Network
135
15. Gupta A, Singh R, Nassa VK, Bansal R, Sharma P, Koti K (2021) Investigating application and challenges of big data analytics with clustering. In: 2021 international conference on advancements in electrical, electronics, communication, computing and automation (ICAECA). pp 1–6. https://doi.org/10.1109/ICAECA52838.2021.9675483 16. Anand R, Ahamad S, Veeraiah V, Janardan SK, Dhabliya D, Sindhwani N, Gupta A (2023) Optimizing 6G wireless network security for effective communication. In: Innovative smart materials used in wireless communication technology. IGI Global, Hershey, pp 1–20
Machine Learning Analysis on Predicting Credit Card Forgery S. Janani, M. Sivarathinabala, Rohit Anand, Shahanawaj Ahamad, M. Ahmer Usmani, and S. Mahabub Basha
Abstract Occurrences comprising of credit card fraud arise frequently and result in spending a huge amount of fund. Online credit card transactions nowadays consist of a major part of various transactions driven online, that has remarkable expansion. Thus, banks and different financial institutions impart important and desirable credit card fraud detection programs. Fraudulent transactional activities can have various patterns and comprise of various types. The four major fraud schemes in reallife transactions are the subject matter of this investigation. A majority of machine learning algorithms are utilized to handle every scam, and the optimal technique is finally selected after scrutiny. This research suggests an in-depth guide for selecting the superior approach for the type of frauds, and the authors furnish a suitable performance metric to demonstrate the assessment. Real-world credit card fraud detection is a further important subject that the authors represent in this paper. To decide if a certain transaction is legal or deceitful, the authors apply predictive analytics conducted by the unified machine learning approaches and an API package. The authors assess an unparalleled technique which conveniently rectifies the skewed S. Janani Department of ECE, Periyar Maniammai Institute of Science and Technology, Thanjavur, Tamil Nadu, India M. Sivarathinabala Department of ECE, Velammal Institute of Technology, Chennai, Tamil Nadu, India R. Anand (B) Department of ECE, G. B. Pant DSEU Okhla-I Campus (Formerly G. B. Pant Engineering College), New Delhi, India e-mail: [email protected] S. Ahamad College of Computer Science and Engineering, University of Hail, Hail City, Saudi Arabia M. A. Usmani Department of CSE, Department of Engineering and Technology, Bharati Vidyapeeth Deemed to be University, Navi Mumbai, India e-mail: [email protected] S. M. Basha Department of Commerce, IIBS Bangalore Airport Campus, Bengaluru, Karnataka, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_12
137
138
S. Janani et al.
distribution of the data also. In accordance to a confidential declaration treaty, the bank supplied the data for this research. Keywords Skewed distribution · Real-time credit card fraud detection · Confidential disclosure agreement · Fraud detection system · Credit card frauds
1 Introduction Embracing the advancement of cutting-edge technology and worldwide connection, fraud is sharply increasing. The two essentials approached to eliminate fraud are identification and avoidance [1]. By helping as an extra layer of protection, avoidance hampers any efforts from cheaters. Once avoidance has been abortive, detection phase occurs. Determination hence aids in finding and cautioning as soon as a fraudulent transaction initiates. Web payment gateways are increasingly using without-card transactions for credit card (CC) functioning [1]. Online payment systems produced more than $31 trillion globally in 2019, up with 8.6% from 2018, according to the Neilson analysis from November 2020. Credit card fraud damages globally expanded to $34 billion in 2018 and may reach $45 billion by 2025. Still, there is a sharp rise in fraud transactional activities that has a huge effect on the economic system. Several subcategories of CC fraud are present. Card-isn’t-present (CNP) and card-is-present (CP) frauds are the two primary fraud categories which may be found in a collection of transactions. These two categories may be further divided into behavioral fraud, application fraud, theft/counterfeit fraud, and bankruptcy fraud. Our research intends to address four fraud types that fall within the above-described CNP fraud category in real time. This generation’s replacement for such techniques is machine learning, which can handle massive datasets that are difficult for humans to handle. Here, the authors focus on the following: • Using best-fitting approaches to identify four well-defined patterns of fraud transactions, we offer a unique CC fraud detection method here. • Predictive analytics along with API package are utilized to identify CC fraud in real time, and the end-user is alerted through the GUI instantly when a fraud transaction takes place. • The logistic regression, Naïve Bayes, and support vector machine, ML algorithms had the greatest rates of accuracy in detecting the four fraud patterns discussed later. Furthermore, the models produced respective accuracy rates of 81%, 79%, 83%, and 92%. Supervised learning and unsupervised learning are the two primary divisions of machine learning methods [2, 3]. Fraud detection may be carried out in any method, and the dataset will choose when to employ it [1]. In order to learn under supervision, anomalies must first be classified. Several supervised algorithms have been used to the detection of credit card fraud during the last several years. The two primary analyses of the data employed in this research are categorical analysis and numerical
Machine Learning Analysis on Predicting Credit Card Forgery
139
analysis. Initial data in the dataset are categorical. By using data cleaning and other fundamental preparation methods, the raw data may be prepared. In order to do the assessment, the relevant procedures must first be used to convert categorical data into numerical data. Second, to discover the best algorithm, machine learning approaches employ categorical data [4–6]. This study compares a wide range of artificial intelligence approaches using an efficient performance metric in identification of fraud credit card transactions in order to choose the finest techniques for the four fraud types. The rest of this paper is laid out in the following manner. Review of the literature is presented in Sect. 2 The implemented approach and findings are presented in Sect. 3. In Sect. 4, the paper’s findings are offered.
2 Literature Review It is essential to determine the automation involved in CC fraud detection and in having a thorough comprehension of the different categories of CC fraud because numerous approaches, ranging from supervised approaches, unsupervised approaches, and hybrid ones were suggested in earlier researches to bring solutions to find fraud. As fraud trends changed over time and new types of fraud emerged, scholars’ interest in the topic grew. The remaining part of the section details the fraud detection systems, and ML models which were utilized. The problems identified by the evaluation have been examined for the subsequent use of an effective ML model. Researches previously have found various concerns in fraud detection via the scrutiny of various detection approaches. Researches show the deficiency of real-world data as a significant hurdle. The transaction databases’ very less quantity of frauds compared to non-frauds is the reason for this. Data mining [7] approaches need much time to act while functioning with voluminous data, in accordance with [8]. A different major drawback with the creation of CC transaction data is data overlap. According to various researches, the difficulty is caused in specific cases when normal transactions mimic fraud cases perfectly. Different manner that fraud transactions may masquerade as honest cases is via appearances. Moreover, they are into huge concerns managing categorical data. The maximum quantity of the features while taking into account the CC transactional data contains categorical values. Mostly, various ML approaches in these circumstances do not receive categorical values. Since majority of the ML approaches take longer time span in training than prediction, [9] cite the solution of detecting techniques and feature selection as an obstacle in estimating frauds. The feature selection denotes a major issue which has an impact on monetary fraud detection. It tends to remove the features which perfectly illustrate various fraud predicting components. The cost of fraud detection and a need of flexibility are cited in paper [10] as a hurdle in the fraud detection technique. The cost of fraud and the expense of avoidance must be considered when planning a framework. If the algorithm is encountering unique fraud patterns and
140
S. Janani et al.
distinctive transactions, it is devoid of flexibility. Having a good grip of the performance metric is needed, as effectiveness can alter based on the issue description and its parameters. For detecting CC fraud, various approaches are used. Various approaches are used in those approaches. If the ML technique has to be re-trained as fraud tendency has desperately altered, it can be burdensome, costly, and risky to adapt the fraud detection system to newly detected frauds. As an example, Tyler et al. modified a system suggested in various studies simulated the technique, and then used the approach to analyze a real-life transaction log. The categorization problem was solved utilizing support vector machine. Using Naïve Bayes model, the events of fraud transactions was discretized into tactics. Here, the class disparity was discussed utilizing the synthetic minority oversampling process. Sensitivity analysis was utilized to specify the significance of predictions in terms of commercial evaluation. The results showed that a realistic technique which re-trains an approach having the minimum steps is efficient of performing at least as good as a classifier which basically re-trains every epoch. An additional approach, the risk-based ensemble, is capable of handling data comprising of difficulties and provide excellent outcomes. A very efficient bagging approach was utilized to handle unlabeled data. They employed the Naive Bayes technique for addressing the inherent noise in the transactional dataset. Peter et al. assessed the fruitfulness of various DL approaches. Artificial neural networks [11], recurrent neural networks, gated recurrent units, artificial neural networks, and long short-term memories make up the four topologies. In their project, they utilized undersampling other than data cleaning and various data preparation techniques for solving the worry of class imbalance and scalability. The sensitivity analysis was performed to find out which hyper-parameters were most impactful on the performance of the model. They suggested that the network’s span has an effect on the technique’s performance. They concluded that a larger network performed better. The dissimilar distribution of CC data, often known as the class imbalance, is a problem. The researchers of the project, Hayashi et al. [12] assert that it explains class imbalance and various problems comprising of idea drift and authentication delay. Moreover, they furnished a demonstration of the major effective performance matrix for estimating CC fraud. The research’s achievements also consist of an explicit approach, a strong learning approach, and an “alarm and assessment” framework for handling with “authentication delay”. The efficiency of the alerts is deemed as the most pivotal element by tests. To enhance the efficiency rates in CC fraud detection, Sharma et al. engaged 12 conventional approaches and hybridized approaches that combine AdaBoost and most of the voting approaches [13]. They were considered depending on standard data and actual world data. The techniques’ advantages and disadvantages were analyzed briefly. Data was given noise for assessing the technique’s resilience. Moreover, they had shown that the additional noise had no result on the major voting techniques. The analysis of acutely unbalanced data in paper [14] shows that k-nearest neighbor achieves successfully for specificity and sensitivity, except accuracy. The study [15] analyzed globally utilized supervised learning methods and furnished an in-depth evaluation of similar techniques. Moreover, they have shown that each technique differs determined by the issue domain.
Machine Learning Analysis on Predicting Credit Card Forgery
141
3 Proposed Method A. Dataset The fraud transactional log file along with the transactions log file was merged to model the dataset. All instances of online credit card fraud are recorded in the fraud transactional log file, whereas the all transactions log file records all transactions that the relevant bank has saved over a certain amount of time (given in Tables 1 and 2). Most of the sensitive feature, like the card number, was hashed because of the secret disclosure agreement struck between the financial institution and the researchers. Because of the uneven distribution of honest and fraud activity in the merged dataset, the data’s shape was noticeably distorted. Whereas the transactional log file had 867,723 entries, the fraud cases file contained just 200 records. The two data sources’ attributes are shown below. B. Preparation of data According to their fraud tendency, the raw data were initially separated into four datasets. The information that the bank had acquired was used in this procedure. There are four datasets: (i) Insecure merchant category code transactions (MCC). (ii) Transactions greater than $100. (iii) Risky ISO response code transactions. (iv) Transactions with an undisclosed site address. Table 1 Features of the authentic transaction log
Field name
Description
Card_number
No. in credit card
Date
Date_of_agreement
Time
Time_of_agreement
Transaction_value Value_of_agreement
Table 2 Features of the fraud transaction log
Merchant_Name
Name_of_merchant_applicable_with _the_ agreement
Merchant_City
City registration of the_ merchant
Field name
Description
Card_no
Number in credit card
Date
Date_of_transaction
Time
Time_of_transaction
SEQ
A distinct sequence number that was specified for frauds
Fraud nature
Description of the frauds as card is present or absent
142
S. Janani et al.
Fig. 1 Data mapping
Two alternative methods were applied with the four datasets: (i) By putting unprocessed data into numerical form, (category A). (ii) By categorizing and organizing raw data sans applying any transformations (category B). Datasets 1, 2, and 3 received category A treatment, whereas category B received treatment for dataset number 4. Data are cleansed, combined, and minimized in the data preparation phase. All of the aforementioned procedures were used to prepare the first three sets of data numerically. All of the stages, except the data transformation, were utilized to prepare the data categorically. Below is a description of the category A’s fundamental stages. Data cleaning—One crucial step in the data cleaning process is to replace any left out information. There are other answers to this issue, like ignoring the entire tuple, but most of them are probably to skew the results. Filling them was no longer complicated since the source file that included actual transactions didn’t consist of entries with missing data. The files were cleared of any tuple of meaningless value since they did not provide any significant data and did not skew the data. In addition, the date and time field was divided into two after alterations like deleting extra columns. Data integration—Since fraud and legitimate record files were in two different files, the two data sources were merged prior to the data were exposed to another alteration. Figure 1 demonstrates the mapping procedure. Data modification: Each and every category of the data in this instance were combined into a comprehensible numerical representation. The transactional dataset consists of various data kinds and ranges. Data normalization is therefore a component of data transformation [16, 17]. Data attribute is scaled to fit within a certain numeric range during normalization. Data reduction—Dimension reduction is the method used in this. The danger of discovering incorrect data patterns must be avoided, and the chosen features must get rid of the fraud domain’s irrelevant characteristics. The well-known PCA, or principal component analysis, is a common transform technique. The feature selection problem is solved using this approach from the standpoint of numerical analysis. By calculating the appropriate quantity of main components, PCA carried out feature
Machine Learning Analysis on Predicting Credit Card Forgery
143
selection effectively. Data integration and data cleansing were as important in category B as they were in category A. These data were then transferred to the succeeding step of the procedure. C. Resampling Methods A very unbalanced distribution of cases among the classes distinguished the two data sets. There were very less instances of fraud transactions than there were of legal ones. This was addressed by doing undersampling and oversampling, [18, 19] respectively, by minimizing major events and increasing minor occurrences. Condensed nearest neighbor (CNN) and random undersampling (RUS) were utilized for undersampling and synthetic minority oversampling techniques (SMOTE) were utilized for oversampling. In the SMOTE approach, the minority class is oversampled. Among RUS and CNN, RUS is a non-heuristic technique which evens out the distribution of the classes by removing samples from the random majority class. Additionally, tenfold cross-validation was employed. The data which had undergone cross-validation were then resampled utilizing the prior-mentioned resampling procedures [20]. D. Testing and modeling Four distinct fraud patterns are examined in our research. As shown in Fig. 2, we have mirrored the following procedure for assessing each pattern. Many different strategies were used in the data analysis. With the help of the literature survey, four ML methods were given the highest priority in our investigation. They are logistic regression, KNN, SVM, and Naive Bayes [21]. To our resampled data, we used the chosen supervised learning classifiers. The accuracy and performance of every algorithm were taken into account while choosing machine learning models that can detect each scam. By filtering them out in comparison with a suitable performance matrix, the best models were chosen (Eq. 1–7). Accuracy =
TN + TP TP + FP + FN + TN
(1)
TP TP + FP
(2)
Precision = Recall =
TP TP + TN
(3)
True Positive Rate =
TP TP + FN
(4)
False Positive Rate =
FP FP + TN
(5)
F1 Measure =
2 × (Precision × Recall) Precision × Recall
(6)
144
S. Janani et al.
Fig. 2 Model choice
TP × TN − FP × FN MCC = √ (TP + FP)(TP + FN)(TN + FP)(TN + FN)
(7)
where TN is true negative, TP is true positive, FP is false positive, and FN is false negative. ROC is the true positive rate against the false positive rate. The accuracy rates from the four forms of fraud when the ML classifiers are used on preprocessed and resampled data are shown in the below graphs. E. Deception tracking in real time Previously, fraud was discovered by mass-applying machine learning models to already-completed transactions. Since results take longer time (from few weeks to months) to become available, it may be very challenging to track down frauds that have already been committed. In many situations, fraudsters have been able to make several further fraudulent transactions before being discovered. Real-time fraud tracking involves running fraud detection algorithms as soon as an online transaction is made. As a result, our technology can identify scams in real time. The bank receives a notification that details the fraud pattern and accuracy rate, creating a simple task for fraud monitoring squad to go on to the later phase without misusing time or resources. F. System for detecting fraud Real-time tracking of CC fraud may be mentioned as primary accomplishments of this research. The three primary elements of the real-time fraud tracking framework are the API module, fraud tracking models, and data. Fraud detection involves all of the elements at once. Utilizing three supervised learning classifiers, fraud transactions are divided into four categories (frauds occurring because of hazardous MCC,
Machine Learning Analysis on Predicting Credit Card Forgery
145
Fig. 3 Insecure MCC results
Fig. 4 Outputs of the ISO-response code
ISO-responsive code, unrevealed web address, and transactions exceeding $100). It is shown in Figs. 3, 4, 5 and 6. Real-time transactions between the fraud estimation framework, GUI, and data warehouse are sent through the API module. Live transactions, predicted outputs, and notable data of the ML algorithms are reserved in a data warehouse. The fraud detection framework offers GUIs via which the user may interact. These GUIs display real-world transactions, fraud warnings, and historic fraud data. A notice should be furnished to the API package whereby the fraud detection framework determines a transaction as fraud. A notice will then be sent by the API module to the end-user, and feedback provided by the end-user will be saved. The complete work flow of the fraud estimation framework is depicted in Fig. 7.
146
Fig. 5 Outputs from an unknown web address
Fig. 6 Results of > 100$
Fig. 7 System diagram
S. Janani et al.
Machine Learning Analysis on Predicting Credit Card Forgery
147
4 Conclusion The identification of CC fraud is a topic of interest for academics, and it will be a substantial matter in the future. It is mainly caused by the ongoing alteration of fraud pattern. Using best-fitting approaches to identify four well-defined patterns of fraud transactions, we offer a unique CC fraud detection method here. We also address relevant issues raised by prior researches on CC fraud detection. Predictive analytics along with API package are utilized to identify CC fraud in real time, and the end-user is alerted through the GUI instantly when a fraud transaction takes place. The fraud investigation squad may choose to move on to the later stage when a doubtful transaction is disclosed. According to the technique, the best algorithms that handle the four primary categories of frauds were chosen via research, experimentation, and parameter adjustment. We also evaluate sampling techniques that successfully deal with skewed data distribution. We may thus draw the conclusion that applying resampling techniques has a major influence on getting a classifier to perform relatively better. The logistic regression, Naïve Bayes, and support vector machine, ML algorithms had the greatest rates of accuracy in detecting the four fraud patterns discussed. Furthermore, the models produced respective accuracy rates of 81%, 79%, 83%, and 92%. We want to concentrate on raising the prediction levels for getting a superior forecast, since the established ML algorithms now possess an average level of accuracy. Additionally, location-based scams are the focus of the next developments.
References 1. Xia H, Zhou Y, Zhang Z (2022) Auto insurance fraud identification based on a CNN-LSTM fusion deep learning model. Int J Ad Hoc Ubiquitous Comput 39(1–2):37–45 2. Singh H, Ramya D, Saravanakumar R, Sateesh N, Anand R, Singh S, Neelakandan S (2022) Artificial intelligence based quality of transmission predictive model for cognitive optical networks. Optik 257:168789 3. Sindhwani N, Anand R, Meivel S, Shukla R, Yadav MP, Yadav V (2021) Performance analysis of deep neural networks using computer vision. EAI Endorsed Trans Ind Netw Intell Syst 8(29):e3–e3 4. Chawla P, Juneja A, Juneja S, Anand R (2020) Artificial intelligent systems in smart medical healthcare: current trends. Int J Adv Sci Technol 29(10):1476–1484 5. Jain S, Sindhwani N, Anand R, Kannan R (2022) COVID detection using chest X-Ray and transfer learning. In: International conference on intelligent systems design and applications. Springer, Cham, pp 933–943 6. Pandey D, Pandey BK, Sindhwani N, Anand R, Nassa VK, Dadheech P (2022) An interdisciplinary approach in the post-COVID-19 pandemic era. An Interdisc Approach Post-COVID-19 Pandemic Era 1–290 7. Pramanik S, Galety MG, Samanta D, Joseph NP (2022) Data mining approaches for decision support systems. In: 3rd international conference on emerging technologies in data mining and information security 8. Samanta D, Dutta S, Galety MG, Pramanik S (2021) A novel approach for web mining taxonomy for high-performance computing. In: The 4th international conference of computer
148
9. 10. 11.
12. 13.
14.
15.
16.
17.
18.
19.
20.
21.
S. Janani et al. science and renewable energies (ICCSRE’2021). https://doi.org/10.1051/e3sconf/202129 701073 Akinola OO, Ezugwu AE, Agushaka JO et al (2022) Multiclass feature selection with metaheuristic optimization algorithms: a review. Neural Comput Applic 34:19751–19790 Hajek P, Abedin MZ, Sivarajah U (2022) Fraud detection in mobile payment systems using an XGBoost-based framework. Inf Syst Front. https://doi.org/10.1007/s10796-022-10346-6 Bhattacharya A, Ghosal A, Obaid AJ, Krit S, Shukla VK, Mandal K, Pramanik S (2021) Unsupervised summarization approach with computational statistics of microblog data. In: Methodologies and applications of computational statistics for machine learning, pp 23–37. https://doi.org/10.4018/978-1-7998-7701-1.ch002 Hayashi T, Fujita H (2022) One-class ensemble classifier for data imbalance problems. Appl Intell 52:17073–17089. https://doi.org/10.1007/s10489-021-02671-1 Sharma D, Selwal A (2022) An intelligent approach for fingerprint presentation attack detection using ensemble learning with improved local image features. Multimed Tools Appl 81:22129– 22161 Hammad M, Alkinani MH, Gupta BB et al (2022) Myocardial infarction detection based on deep neural network on imbalanced data. Multimedia Syst 28:1373–1385. https://doi.org/10. 1007/s00530-020-00728-8 Mandal A, Dutta S, Pramanik S (2021) Machine intelligence of Pi from geometrical figures with variable parameters using SCILab. In: Samanta D, Althar RR, Pramanik S, Dutta S (eds) Methodologies and applications of computational statistics for machine learning. IGI Global, pp 38–63. https://doi.org/10.4018/978-1-7998-7701-1.ch003 Arora S, Sharma S, Anand R (2022) A survey on UWB textile antenna for wireless body area network (WBAN) applications. In: Artificial intelligence on medical data: proceedings of international symposium, ISCMM 2021. Springer Nature Singapore, Singapore, pp 173–183 Gupta A, Singh R, Nassa VK, Bansal R, Sharma P, Koti K (2021) Investigating application and challenges of big data analytics with clustering. In: 2021 International conference on advancements in electrical, electronics, communication, computing and automation (ICAECA). pp 1–6. https://doi.org/10.1109/ICAECA52838.2021.9675483 Gupta A, Asad A, Meena L, Anand R (2022) IoT and RFID-based smart card system integrated with health care, electricity, QR and banking sectors. In: Artificial intelligence on medical data: proceedings of international symposium, ISCMM 2021. Springer Nature Singapore, Singapore, pp 253–265 Pathania V et al (2023) A database application of monitoring COVID-19 in India. In: Gupta M, Ghatak S, Gupta A, Mukherjee AL (eds) Artificial intelligence on medical data. Lecture notes in computational vision and biomechanics, vol 37. Springer, Singapore. https://doi.org/ 10.1007/978-981-19-0151-5_23 Veeraiah V, Khan H, Kumar A, Ahamad S, Mahajan A, Gupta A (2022) Integration of PSO and deep learning for trend analysis of meta-verse. In: 2022 2nd international conference on advance computing and innovative technologies in engineering (ICACITE), pp 713–718. https:/ /doi.org/10.1109/ICACITE53722.2022.9823883 Shukla R, Dubey G, Malik P, Sindhwani N, Anand R, Dahiya A, Yadav V (2021) Detecting crop health using machine learning techniques in smart agriculture system. J Sci Ind Res (JSIR) 80(08):699–706
GanCOV: A Deep Learning and GAN-Based Algorithm to Detect COVID-19 Through Lung X-Ray Scans Apratim Shrivastav, Lakshmi Sai Srikar Vadlamani, and Rajni Jindal
Abstract Accurate detection of COVID-19 is crucial for effectively managing the spread of the virus and providing appropriate care to infected individuals. It is also necessary for identifying and isolating infected individuals to prevent further transmission of the disease. In this research article, we propose a method for detecting COVID-19 from lung X-ray images using convolutional neural networks (CNNs) and Generative Adversarial Networks (GANs). The CNN is trained on a large dataset of labeled X-ray images to classify whether or not a given image contains evidence of COVID-19 infection. In addition, a GAN is trained to generate synthetic X-ray images that represent COVID-19-positive cases. By combining the outputs of the CNN and GAN, our method is able to accurately detect COVID-19 from X-ray images. The effectiveness of our approach is demonstrated through experimental evaluations on a holdout test set, showing high accuracies in detecting COVID-19. Our method represents a promising tool for assisting in the diagnosis and treatment of COVID-19. Keywords Generative Adversarial Networks · Convolutional neural networks · COVID-19
1 Introduction The COVID-19 pandemic has highlighted the need for rapid and accurate diagnosis of the disease. Lung X-rays have been widely used as a diagnostic tool for COVID19, as they can provide important information about the presence and severity of the infection. However, the interpretation of X-ray images can be subjective and prone to errors, particularly in the case of COVID-19, where the appearance of the infection on X-rays may be similar to that of other respiratory diseases. A. Shrivastav (B) · L. S. S. Vadlamani · R. Jindal Delhi Technological University, Delhi 110042, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_13
149
150
A. Shrivastav et al.
Machine learning techniques have the potential to improve the accuracy of COVID-19 diagnosis through the use of X-ray images. In particular, deep learning models have been shown to be effective at identifying patterns in medical images that are not easily discernible by humans [1, 2]. However, the training of these models typically requires large datasets, which can be difficult to obtain due to privacy and data-sharing concerns. However, good machine learning-based models require a large dataset in order to achieve high accuracy in real-life conditions. This is an obstacle that can be overcome through Generative Adversarial networks which work with limited data to generate large amounts of synthetic images improving the overall accuracy and reliability of such systems. The GAN model includes two neural networks: a generator and a discriminator. These networks are trained together to learn the distribution of a dataset. The generator creates synthetic data that appears to be real, while the discriminator attempts to differentiate between real and synthetic data. During training, the generator is provided with noise as input and tries to generate synthetic data that looks similar to the real data. The discriminator is then given both the real and synthetic data and tries to identify which is which. The generator minimizes the error in the synthetic data, while the discriminator maximizes the error. This process is repeated until the generator produces synthetic data that cannot be distinguished from the real data and the discriminator cannot distinguish between the two. The main contribution of this paper would be: • To present a novel framework that uses GANs and CNNs and is able to effectively train a deep learning model on a dataset of lung X-ray images to accurately detect COVID-19. Our approach is evaluated on a large, diverse dataset of X-ray images, and we demonstrate that it is able to achieve high levels of accuracy in classifying images as positive or negative for COVID-19. The use of Generative Adversarial Networks and deep learning in this context has the potential to improve the accuracy of COVID-19 diagnosis and reduce the burden on healthcare systems. The rest of this paper comprises Sect. 2, which gives an insight into the related works in this field. Section 4 discusses the methodology of our proposed method and Sect. 5 describes the results obtained from it. Section 6 demonstrates the conclusion and future work.
2 Related Work In order to tackle the problem of COVID-19 detection, various methods have been used by researchers. Islam et al. [3] proposed a Federated learning model that uses ensemble classifiers to detect brain tumors through MRI scans. Though the addition of federated learning slightly reduces their obtained accuracy as compared to traditional centralized architectures, it is compensated for by the addition of privacy. Research has proven that Artificial Intelligence-related methods provide quick and accurate methods to detect diseases like COVID-19, Pneumonia, and Leukemia [4].
GanCOV: A Deep Learning and GAN-Based Algorithm to Detect …
151
Ines et al. [5] proposed a federated learning algorithm for detection of COVID-19 through lung X-rays. In their model, each client is given their own private datasets and the server collects the necessary weights and biases from these clients. For feature extraction and classification purposes, a CNN model was used. Their approach achieved a 93–94% mean accuracy using the VGG-16 model and a mean accuracy between 95 and 97% for the ResNet-50 model. Usman et al. [6] proposed an approach where they augmented their data using GANs. They later employed transfer learning using different kinds of convolutional neural networks.
3 Dataset Used The dataset used for the paper is called ‘COVID-19 Radiography Database’ [7, 8]. The team responsible for curating it consisted of researchers from Qatar University, Doha, Qatar, and the University of Dhaka, Bangladesh, as well as their collaborators from Pakistan and Malaysia, and worked in collaboration with medical doctors. The dataset contains 13808 total Lung X-ray images, out of which 10192 were for negative cases and the rest 3616 for COVID-19-positive cases.
4 Methodology This study uses GANs and deep learning model to detect COVID-19 using a chest X-ray database. The algorithm utilizes GAN to generate synthetic images, while a CNN model is trained on the dataset generated by the GAN for high accuracy.
4.1 Generative Adversarial Network The dataset used for the processing was imbalanced as demonstrated in figure above. Hence, it was essential to generate synthetic images for higher accuracy. For the purpose of generating the images, we implement a variation of Non-Conditional Generative Adversarial Networks (NCGANs). NCGAN is a deep learning model that generates synthetic data. It is an extension of the Generative Adversarial Network (GAN) architecture but does not use a conditioning input to generate data. The NCGAN model is similar to the GAN model in that it includes a generator and discriminator network trained to learn the data distribution. However, unlike the GAN model, the NCGAN model’s generator creates synthetic data solely based on noise input, without the use of a conditioning input. NCGAN has been applied to a variety of tasks, including image and text generation, and is particularly effective at producing high-quality synthetic images. The GAN possesses the ability to generate synthetic data without the need for a conditioning input, which can be useful when
152
A. Shrivastav et al.
the data being generated lacks a specific structure or desired characteristics. Another benefit is the ability to capture the structure of the data distribution and generate synthetic data that represents the real data, especially useful when obtaining real data is difficult or costly.
4.2 Dataset Curation To create our training and testing sets, we split the images from all of our databases into two groups: 70\% for training and 30\% for testing. The dataset had 13808 total images, out of which 10192 were for negative cases and the rest 3616 for COVIDpositive images. To prepare the images for analysis, we first uploaded and resized them to 256 × 256 pixels. We chose this size because all of the images were at least this large. Next, we used a method described in [15] to crop the chest area of the images by selecting a 90-pixel region around the center of each image. The resulting images were then resized again to 100 × 100 pixels. Finally, we normalized the data by dividing all pixel values by 255. As shown in Fig. 1, the first image (a) represents the original image, the second image (b) shows the same image after the cropping process has been applied, and the final image (c) shows the cropped image after it has been resized to (100 × 100). The images first go through histogram equalization. The intensity values of the pixels in the image are first counted and plotted in a histogram. The histogram shows the frequency of occurrence of each intensity value in the image. The intensity values are then transformed such that the frequency of each intensity value is equalized. This is done by mapping the intensity values to new intensity values such that the resulting histogram is a flat line.
Fig. 1 Resizing lung X-ray images
GanCOV: A Deep Learning and GAN-Based Algorithm to Detect …
153
4.3 GAN Architecture Generator: The architecture for the generator is mentioned in Fig. 2. There are several layers for the generator, including convolutional transpose layers, ReLU activation layers, Tanh activation layers, batch normalization layers, and a final convolutional transpose layer. The convolutional transpose layers perform a type of convolution that up samples the input and generates a larger output. This is useful for increasing the spatial resolution of the output. The ReLU and Tanh activation layers apply an activation function to the output of the convolutional transpose layers. The batch normalization layers normalize the output of the convolutional transpose layers. The generator takes in noise as input and generates synthetic data as output. The synthetic data is intended to be similar to the real data and is fed to the discriminator along with real data. The discriminator tries to distinguish between real and synthetic data, and the generator is trained to try to fool the discriminator. This process is repeated until the generator is able to produce synthetic data that is indistinguishable from the real data. Discriminator: The convolutional layers perform a type of convolution that extracts features from the input data. The LeakyReLU activation layers apply an activation function to the output of the convolutional layers. The Dropout layers randomly set a fraction of the input units to 0 during training, which helps to prevent overfitting. The batch normalization layers normalize the output of the convolutional layers. The linear layer is a fully connected layer that maps the output of the convolutional layers to a single value. The sigmoid activation layer applies a sigmoid function to the output of the linear layer to squash it between 0 and 1. The discriminator takes in data as input and outputs a single value that represents the probability that the input data is real. The discriminator is trained to maximize this probability for real data. The flow diagram for our discriminator is shown in Fig. 3. The dataset undergoes several epochs to be generated through the GAN. The process of generating these datasets is shown in Fig. 4.
Fig. 2 Generator flow
154
A. Shrivastav et al.
Fig. 3 Discriminator flow
Fig. 4 Synthetic image generation by GAN after every 100 epochs
4.4 Deep Learning Model The CNN model is defined as a sequential model, which means that the layers are added to the model in a linear fashion, with the output of one layer becoming the input of the next layer. The model starts with a Conv2D layer with 128 filters, a kernel size of (3, 3), and a ReLU activation function. The input shape for this layer is specified as (70, 70, 3), which means that the input data consists of images with a height and width of 70 pixels and three color channels (e.g., red, green, and blue). The output of this layer is then passed through a MaxPooling2D layer, which down samples the input by taking the maximum value over a 2 × 2 window. This is followed by a Dropout layer, which randomly sets a fraction of the input units to 0 during training, which helps to prevent overfitting.
GanCOV: A Deep Learning and GAN-Based Algorithm to Detect …
155
Fig. 5 Model overview
The model then adds two more Conv2D layers, each followed by a MaxPooling2D layer and a Dropout layer. The output of the last Conv2D layer is then flattened and passed through a Dense layer with 16 units and a ReLU activation function. This is followed by another Dropout layer and a final Dense layer with 2 units. The model is then compiled with the Adam optimizer, a Sparse Categorical Cross-entropy loss function, and the accuracy metric.
5 Experimentation and Results 5.1 Evaluation Metrics In this paper, we utilize recall, specificity, accuracy, precision, and F1-score in order to calculate the performance of the model. Figure 5 describes the entire model. Recall measures the model’s ability to accurately identify positive examples, which is represented as the proportion of positive examples that were correctly identified. Specificity, on the other hand, evaluates the model’s ability to correctly identify negative examples or the proportion of negative examples that were correctly identified. Accuracy is a measure of the overall performance of the model, which is determined by the percentage of correct predictions made. Precision assesses the model’s ability to accurately identify positive examples among all the examples it identified as positive, while the F1-score considers both precision and recall to evaluate the model’s performance, calculated as the harmonic mean of these two metrics.
5.2 Results Obtained Table 1 describes the evaluation metrics used for the CNN model and the results obtained for each of them.
6 Conclusion and Future Work In conclusion, the proposed framework for COVID-19 detection using lung X-ray scans has shown promising results in our experiments. This is especially important in the context of a pandemic, where accurate detection of the virus is vital in curbing its
156 Table 1 Obtained results for CNN model
A. Shrivastav et al.
Evaluation metric
Result obtained
Recall
0.98
Specificity
0.99
Accuracy
0.99
Precision
0.96
F1-score
0.97
spread. By aggregating the data and models from available datasets, our approach was able to achieve an accuracy of 0.99 on the test set, outperforming several baseline models. Overall, the proposed deep learning framework for COVID-19 detection using Lung X-ray scans is a promising approach that balances the need for accurate and timely detection of COVID-19. Further studies are needed to fully evaluate the effectiveness of this approach in a real-world setting with a higher accuracy and to explore its potential for use in other medical domains.
References 1. Liu Y et al (2019) Detecting diseases by human-physiological-parameter-based deep learning. IEEE Access 7: 22002–22010. https://doi.org/10.1109/ACCESS.2019.2893877. Author F, Author S (2016) Title of a proceedings paper. In: Editor F, Editor S (eds) Conference 2016, LNCS, vol 9999, pp 1–13. Springer, Heidelberg 2. Panwar H et al (2020) Application of deep learning for fast detection of COVID-19 in X-Rays using nCOVnet, Chaos, Solitons & Fractals. vol138, p 109944, ISSN 0960-0779. https://doi. org/10.1016/j.chaos.2020.109944 3. Islam M, Reza MT, Kaosar M et al (2022) Effectiveness of federated learning and CNN ensemble architectures for identifying brain tumors using MRI images. Neural Process Lett. https://doi.org/ 10.1007/s11063-022-11014-1. LNCS Homepage, http://www.springer.com/lncs. Last accessed 21 Sept 2016 4. Shanmuga Sundari M, Sudha Rani M, Ram KB (2023) Acute Leukemia classification and prediction in blood cells using convolution neural network. In: Gupta D, Khanna A, Bhattacharyya S, Hassanien AE, Anand S, Jaiswal A (eds) International conference on innovative computing and communications. Lecture Notes in Networks and Systems, vol 473. Springer, Singapore. https:/ /doi.org/10.1007/978-981-19-2821-5-11 5. Feki I, Ammar S, Kessentini Y, Muhammad K (2021) Federated learning for COVID-19 screening from Chest X-ray images. Appl Soft Comput 106:107330. https://doi.org/10.1016/ j.asoc.2021.107330. Epub 2021 Mar 20. PMID: 33776607; PMCID: PMC7979273 6. Asghar U et al (2022) An improved COVID-19 detection using GAN-based data augmentation and novel QuNet-based classification. BioMed Res Int vol 2022, Article ID 8925930, 9 pages. https://doi.org/10.1155/2022/8925930 7. Rahman T et al (2021) Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Comput Biol Med 1322021:104319, ISSN 0010-4825 8. Chowdhury MEH et al (2020) Can AI help in screening viral and COVID-19 pneumonia? IEEE Access 8:132665–132676. https://doi.org/10.1109/ACCESS.2020.3010287
Text-to-Image Synthesis using BERT Embeddings and Multi-Stage GAN Poonam Rani, Devender Kumar, Nupur Sudhakar, Deepak Prakash, and Shubham
Abstract A technique called text-to-image creation is used to create visuals that correspond to provided written descriptions. In this study, we have proposed a MultiStage GAN model using BERT Embeddings which consists of three layers of generators and discriminators. Based on a provided text description, the Stage-I GAN creates low-resolution drawings (64 × 64) of a scene’s basic structure and colors. High-resolution pictures (128 × 128) with photorealistic details are produced by the Stage-II GAN using the text description and Stage-I findings as inputs. Further the Stage-III GAN takes Stage-II images as input and adds enticing details and corrects flaws in Stage-II outcomes through refining and also increases the resolution to 256 × 256. For the purpose of creating photorealistic pictures based on text descriptions, the model is evaluated using the Oxford-102 dataset and the CUB dataset and Inception Score as the metric. The IS of Multi-Stage GAN is 3.27 on Oxford-102 dataset and 4.05 on CUB dataset. Keywords Text-to-image synthesis · Multi-Stage GAN · BERT embeddings · Oxford-102 dataset · CUB dataset
P. Rani · N. Sudhakar (B) · D. Prakash Department of Computer Engineering, Netaji Subhas University of Technology, New Delhi, India e-mail: [email protected] P. Rani e-mail: [email protected] D. Prakash e-mail: [email protected] D. Kumar · Shubham Department of Information Technology, Netaji Subhas University of Technology, New Delhi, India e-mail: [email protected] Shubham e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_14
157
158
P. Rani et al.
1 Introduction The goal of text-to-image synthesis involves training a model to create pictures from a given written description. This task has many applications, such as generating personalised images for social media, creating illustrations for documents and stories, and improving the accessibility of visual media by providing alternative text descriptions for images. A two-stage method of text-to-image synthesis using GANs is to employ a text encoder to convert the written description into a latent code, followed by a generator network to create a picture based on the latent code. The discriminator network is taught to distinguish between actual and produced images, while the generator network is trained to make images that are indistinguishable from real photos using a modified version of the GAN aim. GAN-based text-to-image synthesis is a growing field of study with a wide range of possible applications. To enhance the capabilities and quality of these models, researchers are continually investigating novel strategies and methodologies. BERT (Bidirectional Encoder Representations from Transformers), a popular technique to natural language processing, is used for tasks including text classification, question answering, and language translation. Embeddings, which are fixedlength vector representations of the text that include part of its semantic meaning, are created for text using BERT. In this research, we aim to use BERT embeddings as part of the text encoding process in the StackGAN++ model [13] to create a Multi-Stage GAN for text-toimage synthesis on Oxford-102 and CUB dataset. There are a few potential benefits to using BERT embeddings in the StackGAN++ model. First, BERT embeddings are pre-trained on a large dataset and have been shown to capture a wide range of semantic information, which help the model to better understand the content of the text description. Second, BERT embeddings are relatively low-dimensional, which make it more efficient to use them as part of the text encoding process. Overall, using BERT embeddings in the StackGAN++ model potentially improve the performance of the model on the text-to-image synthesis task. To fully understand the advantages and drawbacks of employing BERT embeddings in this situation, further study is required. For example, the model should be tested on bigger datasets like the Flickr30k and MS-COCO dataset. The process of Multi-Stage GAN can be broken down into the following steps: 1. A text embedding that captures the meaning of the text in a fixed-length vector is created using the text description as input to a BERT model. 2. The Multi-Stage GAN model, which has three stages—Stage I generator, Stage II generator, and Stage III generator—is then given the text embedding. 3. The Stage-I generator creates a low-resolution picture using the text embedding. 4. The Stage-II generator then receives the low-resolution picture and the text embedding to create a mid-resolution image. 5. The Stage-III generator then uses the mid-resolution picture and the text embedding to create a high-resolution image.
Text-to-Image Synthesis using BERT Embeddings and Multi-Stage GAN
159
6. The Multi-Stage GAN model’s high-resolution picture, which is a synthesis of the input text description, is its ultimate output. Following is how the remaining sections of the paper are organised: Sect. 2 examines pertinent research done in this approach. The proposed model is introduced in Sect. 3. Section 4 presents the experiment’s results. Section 5 summarises the article’s future directions before it is concluded.
2 Related Work This section offers a summary of the key discoveries and problems surrounding textto-image synthesis. Recently, excellent results have been obtained using Generative Adversarial Networks (GANs) to produce crisper pictures [3]. However, due to the training instability, it is difficult for GANs to create high-resolution (e.g., 256 × 256) images. To stabilise the training and enhance the image characteristics, several works have been offered. One notable example is the StackGAN model [12], which was able to generate realistic images of birds and flowers based on text descriptions. Other approaches have focused on generating a wide range of image types, including portraits, product images, and landscapes. On the other hand, the latter stages of the suggested StackGANs both add information and fix flaws to low-resolution pictures created by the early stages. Recently, a number of techniques for creating graphics from unstructured information have been created. In order to build an AlignDRAW model, Mansimov et al. [7] learned to predict the alignment of the text with the producing canvas. In order to create pictures with text descriptions and object position restrictions, Reed et al. [9] employed conditional PixelCNN. Nguyen et al. [8] generated visuals conditioned on text using a rough Langevin sampling method. Their sampling strategy, meanwhile, necessitates a laborious iterative optimization procedure. Numerous studies have suggested using multiple GANs to enhance sample quality due to the challenges involved in simulating the finer aspects of genuine pictures. A structure GAN and a style GAN are used by Wang and Gupta [10] to create pictures of interior situations. With the use of layered recursive GANs, Yang et al. [11] factorised picture generation into foreground and background generation. To reconstitute the multi-level representations of a pre-trained discriminative model, He et al. [4] incorporated numerous GANs. However, they were unable to produce photos with fine details at high resolution. To improve the likelihood that the generator would get useful input, Durugkar et al. [2] combined many discriminators with one generator. The StackGANs instantaneously created high resolution pictures that are conditioned on their low-resolution inputs rather than creating a residual image. Concurrently with the work presented here, Karras et al. [6] gradually increased the number of layers in the generator and discriminator for high resolution picture generation. They adopted a more conservative upsampling approach, increasing their picture
160
P. Rani et al.
resolution between successive image production stages by a factor of 2, beginning with 4 × 4 pixels, which is the key difference in terms of experimental setup. Our Multi-Stage GAN, which uses an encoder-decoder network before the upsampling layers, may be able to correct incoherent artefacts or faults in low resolution results, whereas StackGANs, LAPGANs, and Progressive GANs are all focused on enhancing finer details in higher resolution photographs. The study “StackGAN++ : Realistic Image Synthesis with Stacked Generative Adversarial Networks” introduces the StackGAN++ model which is a development of the original StackGAN model. This model used Glove embeddings in the text encoding process. Instead of employing glove embedding, we utilize the BERT model to turn input strings into text embeddings. Hui and Xuchang [5] employed a two stage StackGAN network that was upgraded and deep separable convolutionally optimised to attain an inception score of 3.94 on the CUB dataset. Our three stage GAN model improved upon this result.
3 Proposed Method This section of study presents a Multi-Stage GAN network (Fig. 1) consisting of three-stage generator for image generation along with conditional augmentation for the text input. The three-stage generator allows for more fine-grained control over the resolution of the generated images, and can produce more realistic images.
Fig. 1 Architecture of Multi-Stage GAN
Text-to-Image Synthesis using BERT Embeddings and Multi-Stage GAN
161
3.1 Components A collection of photos and their related text descriptions are used to train the MultiStage GAN so that it can produce images that correlate to the text descriptions. The model consists of conditional augmentation block, generator block, discriminator block and image generation block.
3.1.1
Conditional Augmentation
In this process, the text is converted into a computer understandable numerical format. Word embeddings are used to extract features from the text, which are then passed to the model for further processing. Before passing it to the model, a conditioning augmentation is added to solve the problem of discontinuity in the latent data manifold. The mean μ(φ t ) and diagonal covariance matrix ∑(φ t ) of the independent Normal distribution N(μ(φ t ), ∑(φ t )) are functions of the text embedding φ t , and we sample the latent text variable from this distribution instead of using fixed conditioning text. To extract features from the text, BERT embeddings are used. BERT is a model released by Google AI [1] which was created by layering many encoders of the transformer architecture on top of each other. The model provides great ability for fine-tuning and can be used in a variety of applications.
3.1.2
Stage-1 Generator
As shown in Fig. 2, the Stage-1 Generator is used to generate an image from text and random noise. The image is of low resolution and usually contains less detail. The architecture takes two inputs: one is the sampled text embedding and the other is the random noise. After passing them through a layer of batch normalization and a GLU (Gated Linear Units) layer, it is followed by three upsampling blocks, each of which scales the image by a factor of 2. Each upsampling block contains an upsampling layer, a batch normalising layer after a 3 × 3 convolution layer and a GLU layer. The output of the Stage-1 Generator is a 64 × 64 × 64 vector, which can either be passed as an input to the next generators or transformed into an image by a simple convolution operation with tanh as the activation function.
3.1.3
Stage-2 and Above Generators
All of the generators above Stage-1 (Fig. 3) function similarly; their job is to improve the image’s quality and add additional information. The generator produces a highresolution picture vector of size 2n × 2n × 2n from an input vector of size nxnxn, which is a concealed representation of the image produced by the prior generator block. This image can either be passed on to the next blocks or converted to an image
162
P. Rani et al.
Fig. 2 Architecture of stage-1 generator
by a simple convolution operation. We have used 2 blocks of this kind to upscale an image of 64 × 64 to 256 × 256. The image vector from the previous block and the sampled embeddings are the model’s two inputs. The text embeddings are repeated to form dimensions similar to that of the image, which are then concatenated with the image depth-wise. It is then passed through further operations, which include a convolution operation followed by batch normalization and a GLU. It is then passed through 2 residual blocks, where each residual block contains a convolution operation, batch normalization, and a ReLU. This is used to extract details and make additions to the image. The image is then upsampled to a higher resolution using an upsampling block similar to that in the Stage-1 Generator.
3.1.4
Discriminator
There are two embeddings of text and images as input. The discriminator (Fig. 4) predicts the likelihood that the specified image and text will be aligned in addition to determining whether a particular image is true or fake. With the exception
Text-to-Image Synthesis using BERT Embeddings and Multi-Stage GAN
163
Fig. 3 Architecture of stage-2 and above generators
of the amount of downsampling layers, all stages use the same discriminator. The discriminator repeats the text embeddings as input to create a vector of the image’s dimensions. The decision score is then created by concatenating this with the picture depth and passing it through layers of downsampling and a fully connected layer with a single node. We have used Joint Conditional and Unconditional distribution [13]. The objective of the discriminator is to minimize both the unconditional loss and the conditional loss. [ [ ] ] L Di = −Exi ∼ pdatai log Di (xi ) − Esi ∼ pGi log(1 − Di (si ) + [ [ ] ] − Exi ∼ pdatai log Di (xi , c) − Esi ∼ pGi log(1 − Di (si , c)
(1)
Loss function of generator is according to this loss [ [ ] ] LG i = −Esi ∼ pGi log Di (si ) + −Esi ∼ pGi log(Di (si , c)
(2)
164
P. Rani et al.
Fig. 4 Architecture of discriminator
4 Experimental Result Reconstruction loss is utilised in the Multi-Stage GAN’s conditional augmentation layer to make sure that the output pictures match the input text description. By contrasting the embeddings of the produced pictures with the embeddings of the input text description, the reconstruction loss is estimated. The L1 or L2 distances between the embeddings of the produced pictures and the input text description are added to determine the reconstruction loss in the conditional augmentation layer. This loss is subsequently incorporated as a factor in the model’s overall loss function, which is optimised during training to raise the calibre of the pictures that are produced. The BERT encodings used in the conditional augmentation worked better than the Glove embeddings. The encodings were tested by reconstruction loss during model training. The BERT embeddings showed less reconstruction loss (Fig. 5). BERT embeddings show faster convergence and offer lower reconstruction and KL divergence loss. An indicator of the calibre of generative models’ produced pictures is the Inception Score (IS). By assessing the variety and quality of the produced pictures, it compares them to actual ones. 1000 epochs per stage are used to train the model using the
Text-to-Image Synthesis using BERT Embeddings and Multi-Stage GAN
165
Fig. 5 Reconstruction loss versus epochs
Oxford-102 dataset and the CUB dataset. The Oxford-102 dataset includes 8189 pictures of 102 different kinds of flowers. There are ten language explanations for each flower picture. 200 different bird species are included in the 11,788 pictures in the CUB collection. There are ten language descriptions for each bird image. The Multi-Stage GAN architecture using BERT Embeddings produced good quality images in comparison to the other models (Fig. 6). The inception score on Oxford102 dataset for StackGAN v1 is 3.2, StackGAN v2 is 3.25 and Multi-Stage GAN is 3.27 and on CUB dataset for StackGAN v1 is 3.7, StackGAN v2 is 4 and Multi-Stage GAN is 4.05. Hence, the IS of Multi-Stage GAN is higher than that of StackGAN and StackGAN++ as BERT embeddings are able to capture more of the meaning of the text description, resulting in more accurate image generation. It is significant to note that various metrics and human review may be utilised to assess the model’s performance in addition to the Inception Score when assessing the quality of produced pictures.
166
P. Rani et al.
Fig. 6 Comparison of different models
5 Conclusion In order to provide a more accurate image that corresponds to text description, we have suggested a Multi-Stage GAN model in this research. We have used StackGAN++ as our base architecture and then introduced modifications in it to improve its performance. Multi-Stage GAN architecture has three layers of generator and discriminator for better resolution. By increasing the generator block, we were able to reconstruct the image. We have used the BERT embeddings instead of Glove embeddings to get more precise details about input text. All this helped us to achieve a higher inception score and lower reconstruction loss. In comparison to other models, our text-to-image generative model creates images with a greater quality, more photorealistic features, and a wider variety. As we have trained this model on datasets like CUB and Oxford, we are looking for training on bigger datasets like Flickr30k and MS-COCO as future work. This will give more robustness to the model and the model will be able to generate more diverse images and the will generate images from more generalized input text. Also, layers in the Multi-Stage GAN network could be increased so as to add more detailing to the output and have a better resolution.
References 1. Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv.org. https://arxiv.org/abs/1810.04805 2. Durugkar I, Gemp I, Mahadevan S (2017) Generative multi-adversarial networks. ArXiv: 1611.01673 [Cs]. https://arxiv.org/abs/1611.01673
Text-to-Image Synthesis using BERT Embeddings and Multi-Stage GAN
167
3. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. https://arxiv.org/pdf/1406.2661.pdf 4. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition (CVPR 2016). http://hep.kisti.re.kr/2021/13.resnet_lee_060321.pdf 5. Hui L, Xuchang Y (2022) Image generation method of bird text based on improved StackGAN. IEEE Xplore. https://doi.org/10.1109/ICIVC55077.2022.9886347 6. Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of GANs for improved quality, stability, and variation. ArXiv.org. https://arxiv.org/abs/1710.10196 7. Mansimov E, Parisotto E, Ba JL, Salakhutdinov R (2016. Generating images from captions with attention. ArXiv: 1511.02793 [Cs]. https://arxiv.org/abs/1511.02793 8. Nguyen A, Clune J, Bengio Y, Dosovitskiy A, Yosinski J (2017) Plug & Play generative networks: conditional iterative generation of images in latent space. Openaccess.thecvf.com. https://openaccess.thecvf.com/content_cvpr_2017/html/Nguyen_Plug__Play_CVPR_2017_p aper.html 9. Reed S, Akata Z, Yan X, Logeswaran L, Schiele B, Lee H (2016) Generative adversarial text to image synthesis. Proceedings.mlr.press; PMLR. http://proceedings.mlr.press/v48/reed16.html 10. Wang X, Gupta A (2016) Generative image modeling using style and structure adversarial networks. In: Computer vision—ECCV 2016, pp 318–335. https://doi.org/10.1007/978-3-31946493-0_20 11. Yang J, Kannan A, Batra D, Parikh D (2017) LR-GAN: layered recursive generative adversarial networks for image generation. ArXiv: 1703.01560 [Cs]. https://arxiv.org/abs/1703.01560 12. Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN (2017. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. Openaccess.thecvf.com. https://openaccess.thecvf.com/content_iccv_2017/html/Zhang_StackGAN_ Text_to_ICCV_2017_paper.html 13. Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN (2019) StackGAN++: realistic image synthesis with stacked generative adversarial networks. IEEE Trans Pattern Anal Mach Intell 41(8):1947–1962. https://doi.org/10.1109/tpami.2018.2856256
Application of Convoluted Brainwaves for Efficient Identification of Eating Disorder Shipra Swati
and Mukesh Kumar
Abstract Recent trends of research related to brain health have a noticeable inclination toward Electroencephalography (EEG), which effectively captures the electrical activities of the brain. They present valuable insights related to traits of any neurodegenerative diseases that have the potential to disrupt the emotional as well as somatic state of human beings. It helps health professionals to dilute the severity of psychological conditions by developing a relevant treatment plan in advance. However, some disasters like COVID-19 pandemic or fear of world war put huge consequences on mankind in the form of cognitive load, which may result in some psychiatric disorders. This paper investigates the impact on human brain because of one such negative emotional state that provokes unhealthy eating patterns. Here, deep learning approaches distinguish healthy individuals and people affected by eating disorders (ED). The experimental score shows efficient identification of potential subjects by including EEG data. The proposed analysis may provide novel insights to clinicians and healthcare providers for diagnosing different disorders of brain. Keywords Eating disorder · EEG · Deep learning · CNN
1 Introduction The life of human beings gets unnecessarily complicated due to different health conditions including somatic and cognitive diseases. While physical illness has a visible impact, mental health conditions go unnoticed resulting in severe consequences. According to World Health Organization (WHO), until 2019, approximately 970 million people were living with a mental disorder worldwide. Most common of S. Swati (B) · M. Kumar Department of CSE, National Institute of Technology Patna, Patna, India e-mail: [email protected] URL: https://www.nitp.ac.in/cse M. Kumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_15
169
170
S. Swati and M. Kumar
them are anxiety and depressive disorders, while Bipolar Disorder, Schizophrenia, Post-Traumatic Stress Disorder (PTSD), eating disorders, and others, also disturb normal behavior and thinking patterns. The statistics of the year 2020 witnessed a significant spike in the number of anxiety and major depressive disorders, where the estimated increase was recorded to be 26% and 28%, respectively [1]. This drastic escalation is the repercussion of the pandemic COVID-19 and its preventive measure, i.e., lockdown. People felt negative emotions like sad, afraid, anxiety, stress, bored, or loneliness during the time of isolation. It led to a conscious effort to keep high morals using fitness and food, which in turn invited harmful eating habits. When food becomes a primary coping mechanism to deal with emotions, it is called emotional eating and prompts people to eat even when they are not really hungry [2]. Individuals with eating disorders (ED) are at high risk of psychological distress and major physical complications. According to experts, Binge Eating Disorder (BED), Bulimia Nervosa (BN), and Anorexia Nervosa (AN) are the three classes of ED. Among EDs, AN is less common but very serious, whereas BED is the most common and consequently affects more individuals than AN and BN combined. The identified cases of BED suffer from life-threatening obesity along with different mental disorders like major depressive disorder (MDD), Substance Use Disorders (SUD), Attention-Deficit/Hyperactivity Disorder (ADHD), anxiety, and others. So, it is an important public health problem worldwide for which cognitive-based therapies are used with the help of family persons. These treatment frameworks model the interactions between behavior patterns, emotions, and cognitions of candidate influenced by EDs [3]. But, the first and foremost step of treating any illness is to detect it accurately. Also, the BED assessment is challenging because of the wide spectrum of people suffering from EDs and the lack of a quantitative approach to distinguish the potential candidates. Generally, the clinicians depend on self-reports or well-documented investigator-based interviews, which obviously have discrepancies in assessment outcomes among themselves. Here comes the role of neuroimaging modalities, such as functional Magnetic Resonance Imaging (fMRI), Electroencephalogram (EEG), Positron Emission Technology (PET), etc., to explore the functional changes in the brain and body of the affected subject [4]. Apart from somatic unrest, brain activities also indicate serious mental disorders including BED. In neurophysiological research, EEG is found to be one of the most supportive tools because of operational ease, cost-effectiveness, and portability [5]. It records the neural activity of the brain as multi-channel time series data and helps experts to evaluate human mental health or to identify any particular disorder. The spatial resolution of the EEG signal is directly proportional to the number of channels available in the recording devices. However, practically they are comfortable to wear with minimum technical expertise in electrode handling and installation [6]. The healthcare sector has been expanded by integrating Internet-based platforms for diagnosing and treatment of various diseases, leading to the availability of huge new datasets. High computational efforts are required for long-time monitoring of such repositories containing huge amounts of data. So, the application of machine learning (ML) and deep learning (DL) is evident to analyze these repositories for
Application of Convoluted Brainwaves for Efficient Identification of Eating Disorder
171
predicting human ailment in advance to initiate timely medication. DL architectures behave like a black box approach to extract higher-level features from the raw EEG recording even without handcrafted features to interpret, analyze, and visualize the automated engineered features. This paper presents an innovative approach using DL that make use of subjective measurement utilizing feedback provided by the BED-specific questionnaire and the physiological measurements peculiar to eating disorder. The contributions of this paper can be summarized as • In a unique approach, the presented work uses EEG and the Three-Factor Eating Questionnaire for identifying the potential of getting affected with ED. • The EEG data points are encoded using 1D convolutional neural network (CNN). • The proposed approach tests applicability of convolutional neural network (CNN) and long short-term memory (LSTM) for the classification task. • The comparative analysis of the presented framework with existing solution proves its effectiveness for distinguishing individuals having traits of BED. The rest of this paper is organized as follows: First an overview of research related to EEG is provided followed by the discussion on proposed methodology. It includes information related to dataset and the deep learning approach used for the considered problem. Afterward outcome of the presented architecture is analyzed and compared with state-of-the-art approach. Finally, the paper concludes with the limitation and future scope of this work.
2 Related Work In recent years, an extensive amount of research has been conducted to explore the mental state and psychological well-being of humans. A review analysis presented by Antora et al. explores different EEG-based techniques to detect a very common mental illness, depression [7]. The taxonomy of machine learning and deep learning found to perform better than others are SVM and CNN, respectively. Hong et al. developed a gradient-boosting framework, LightGBM to identify the mental state of a driving person using EEG. It may be helpful to prevent fatigue driving, which causes accidents bringing great harm to individuals and families [8]. One clinical neurological case results in abnormal electrical activity occurrences, commonly known as Epilepsy. Tahereh et al. have analyzed the performance of long short-term memory (LSTM) for classifying the EEG samples of normal and epileptic subjects [9]. The classification accuracy of around 96%, achieved by this LSTM-based neural network architecture encourages using EEG for detecting other clinical mental illnesses. In another research related to diagnosing Alzheimer’s Disease (AD), the effective application of EEG is validated for promising diagnosis and prognosis [10]. A deep learning approach has been proposed to address alcoholic addiction using brain signals by Nandini et al. [11]. Their experimental results considered the CNN
172
S. Swati and M. Kumar
and LSTM models along with a hybrid version comprising both. The classification accuracies for these three models are recorded as 92.77%, 89%, and 91%, respectively. EEG has also been utilized to detect different neurological disorders (NDs) like Epilepsy, Alzheimer’s, Parkinson’s, and so on [12]. But very few researches related to the automatic detection of BED are found during the literature survey [13]. Instead of early identification of this disorder, Marie et al. have examined the feasibility and effectiveness of neurological treatment in the reduction of binge eating disorder [14]. Recognizing the lack of state-of-the-art approaches, the result of this proposed framework is compared with the result of one existing methodology, which is implemented on the same dataset.
3 Proposed Framework This part of the paper covers the framework proposed to address the issue discussed in the introductory section. It includes the overview of training two configurations of the deep learning models for predicting the BED label of the EEG recordings. The architecture is shown in Fig. 1, where each phase of the presented methodology is represented for better illustration. It starts with data collection of different somatic and psychological parameters in order to provide a complete overview of participants’ health status. From the available data repository, only two parameters (EEG and FEV) are selected for further investigation of the arguments presented in this article. The outcome of applying DL for find-
Fig. 1 Proposed architecture
Application of Convoluted Brainwaves for Efficient Identification of Eating Disorder
173
ing the influence of selective parameters of TFEQ on EEG is found to deliver improved performance over the existing approach that uses ML technique. Deep learning models require a large sample size for better training. So, the proposed architecture uses sliding window approach for segmenting the multi-channel EEG signals. The model is evaluated for non-overlapping n-window sizes, resulting in (n × f s ) × ch, where f s is the sampling rate and ch is the number of channels. The EEG injective mapped with FEV is encoded using 1D CNN model and fed to another 1D CNN or LSTM to predict the label (0: non-BED, 1: BED). The dataset provided by Max Planck Institute Leipzig with the name ‘LEMON’ is found to be most suitable for this work [15]. This publicly available repository consists of psychological, physiological, and neuroimaging measures that may be helpful in the complete assessment of overall health status of human body. Here, only two parameters are used to evaluate the proposed hypothesis, EEG and Three-Factor Eating Questionnaire (TFEQ) [16].
4 Results and Discussion The LEMON dataset has a total of 203 EEG recordings and 202 psychological assessments related to eating behavior. The EEG data were recorded using 62 electrodes, placed on the scalp according to the 10–10 system, which is an extension of the international standard 10–20 localization system. The sampling frequency of the signal is 2500 Hz, and the frequency range is in between 0.015 Hz and 1 kHz. The raw EEG data is downsampled 250 Hz and band-pass filter between 0.5 Hz 50 Hz. Non-EEG channels are dropped before fetching 10 s epochs from the pre-processed signal. Figure 2 represents the epoch plot of a single EEG recording with data points of 25 s. Total number of EEG channels is 61 after all the pre-processing steps are applied. Another parameter Three-Factor Eating Questionnaire (TFEQ) covers cognitive evaluation related to food addiction, indicating self-control for eating patterns in terms of three factors: (1) cognitive restraint of eating, (2) disinhibition, and (3) hunger.
Fig. 2 Effect of basic pre-processing steps on EEG signal of specific time duration
174
S. Swati and M. Kumar
These three dimensions of human eating behavior are encoded as factor 1, factor 2, and factor 3, respectively. They are quantified by assigning unit points to each of the questionnaire’s 51 queries. Among them, 21 questions belong to factor 1, 16 to factor 2, and the remaining 14 determine factor 3. The LEMON dataset uses ‘Fragebogen zum Essverhalten’ (FEV), which is the German version of the Three-Factor Eating Questionnaire (TFEQ) [17]. From the earlier inspection of TFEQ, it is known that the maximum score for all these three factors are 21, 16, and 14, respectively, while the minimum score for each factor is 0. For separating the participants into healthy (non-BED) and unhealthy (BED), scores of the ‘factor 2,’ i.e., ‘disinhibition’ is further processed. The urge of having equal class distribution in the sample fed to the proposed model resulted in the following rule for relating the disinhibition scores with 1 or 0: ⎧ ⎪ ⎨0 if 0 ≤ FEV_STOERi ≤ 2 SUBi = 1 if 8 ≤ FEV_STOERi ≤ 16 ⎪ ⎩ NULL if 2 < FEV_STOERi < 8
(1)
FEV_STOER in Eq. (1) depicts the score of ‘disinhibition’ (factor 2) in German version of TFEQ, fetched from LEMON dataset. SUBi represents the equivalent score of ‘factor 2’ as per this equation, where i is in the range of (1, N ), N being the total number of subjects. Application of BED-label allocation rule on the ‘FEV factor 2’ scores led to 36 non-BED and 27 BED participants within the dataset. They act as target variables for all EEG epochs, which are generated using the electroencephalogram recordings of the respective subjects. The pre-processed data is fed to this model integrating two sub-models for end-to-end training using Adam optimizer and an initial learning rate of 1e−3 . The internal validity of both models is quite descent due to the rigorous k-fold cross-validation. The performance of output label prediction is compared for both models included here, namely CNN-CNN and CNN-LSTM. The result of both these models for classification accuracy is given in Fig. 3. It is visibly obvious that the LSTM-based model does not perform well compared to CNN-based model. This may be addressed by extensive tuning of hyper-parameters like the batch size, optimizer, and others to have better performance scores. However, Table 1 intuitively explains the effectiveness of the proposed CNN-based model over the state-of-the-art (SOTA) [13] method.
5 Conclusion and Future Scope This research article presents an argument that EEG recording of a participant may be used to find whether the subject is affected by BED or not. For identifying the EEG features related to ED traits, the methodology employs end-to-end deep learning approaches. The performance measures of the proposed framework exhibits improve-
Application of Convoluted Brainwaves for Efficient Identification of Eating Disorder
175
Fig. 3 Performance analysis of both DL models (CNN-CNN and CNN-LSTM). a Plot of loss and accuracy during training. b Plot of loss and accuracy during testing Table 1 Comparative analysis with state-of-the-art work Evaluation CNN LSTM Metrics Value (%) Std. Dev. Value (%) Std. Dev. Accuracy Positive predicted value True positive rate True negative rate F1-score FPR FNR AUC
SOTA [13] Value Std. Dev.
88.68 87.83
0.05 0.05
65.43 43.78
0.03 0.13
81.25 % 84.18 %
0.04 0.08
86.04 91.04 86.79 89.52 13.95 88.55
0.09 0.04 0.06 0.04 0.09 0.05
63.45 66.43 50.92 33.56 36.54 62.66
0.02 0.03 0.11 0.03 0.02 0.04
84.99 % 76.00 % – – – –
0.07 0.13 – – – –
ment over the standard approach with a significant increase in the classification accuracy. The biggest limitation of this work is lacking of enough neuroimaging data of BED-affected subjects. This restricts the presented classification model to an early detection system for predicting behavioral pattern related to ‘disinhibition’ of TFEQ. Also, lack of acceptance among medical practitioners to avail the benefits of this approach may put another hindrance. So, they must be fully convinced with the benefits of such intelligent systems.
176
S. Swati and M. Kumar
References 1. Mental disorders. https://www.who.int/news-room/fact-sheets/detail/mental-disorders (2022), Last accessed 13 June 2022 2. Cooper M, Reilly EE, Siegel JA, Coniglio K, Sadeh-Sharvit S, Pisetsky EM, Anderson LM (2022) Eating disorders during the covid-19 pandemic and quarantine: an overview of risks and recommendations for treatment and early intervention. Eat Disord 30(1):54–76 3. Mulkens S, Waller G (2021) New developments in cognitive-behavioural therapy for eating disorders (cbt-ed). Current Opin Psychiatry 34(6):576 4. Donnelly B, Touyz S, Hay P, Burton A, Russell J, Caterson I (2018) Neuroimaging in bulimia nervosa and binge eating disorder: a systematic review. J Eat Disord 6(1):1–24 5. Swati S, Kumar M (2022) Performance evaluation of machine learning classifiers for memory assessment using EEG signal. In: Industrial internet of things. CRC Press, pp 189–204 6. Ali A, Afridi R, Soomro TA, Khan SA, Khan MYA, Chowdhry BS (2022) A single-channel wireless EEG headset enabled neural activities analysis for mental healthcare applications. Wireless Pers Commun 1–15. https://doi.org/10.1007/s11277-022-09731-w 7. Dev A, Roy N, Islam MK, Biswas C, Ahmed HU, Amin MA, Sarker F, Vaidyanathan R, Mamun KA (2022) Exploration of EEG-based depression biomarkers identification techniques and their applications: a systematic review. IEEE Access 8. Zeng H, Yang C, Zhang H, Wu Z, Zhang J, Dai G, Babiloni F, Kong W (2019) A lightgbm-based EEG analysis method for driver mental states classification. Comput Intell Neurosci 2019 9. Najafi T, Jaafar R, Remli R, Wan Zaidi WA (2022) A classification model of EEG signals based on rnn-lstm for diagnosing focal and generalized epilepsy. Sensors 22(19):7269 10. Fouladi S, Safaei AA, Mammone N, Ghaderi F, Ebadi M (2022) Efficient deep neural networks for classification of alzheimer’s disease and mild cognitive impairment from scalp EEG recordings. Cognitive Comput 1–22 11. Kumari N, Anwar S, Bhattacharjee V (2022) A deep learning-based approach for accurate diagnosis of alcohol usage severity using EEG signals. IETE J Res 1–15 12. Rivera MJ, Teruel MA, Mate A, Trujillo J (2022) Diagnosis and prognosis of mental disorders by means of EEG and deep learning: a systematic mapping study. Artif Intell Rev 55(2):1209– 1251 13. Raab D, Baumgartl H, Buettner R (2020) Machine learning based diagnosis of binge eating disorder using EEG recordings. In: PACIS. p 97 14. Blume M, Schmidt R, Schmidt J, Martin A, Hilbert A (2022) EEG neurofeedback in the treatment of adults with binge-eating disorder: a randomized controlled pilot study. Neurotherapeutics 19(1):352–365 15. Babayan A, Erbey M, Kumral D, Reinelt JD, Reiter AM, Röbbig J, Schaare HL, Uhlig M, Anwander A, Bazin PL et al (2019) A mind-brain-body dataset of mri, EEG, cognition, emotion, and peripheral physiology in young and old adults. Sci Data 6(1):1–21 16. Godet A, Fortier A, Bannier E, Coquery N, Val-Laillet D (2022) Interactions between emotions and eating behaviors: main issues, neuroimaging contributions, and innovative preventive or corrective strategies. Rev Endocr Metab Disord 1–25 17. Löffler A, Luck T, Then FS, Luppa M, Sikorski C, Kovacs P, Tönjes A, Böttcher Y, Breitfeld J, Horstmann A et al (2015) Age-and gender-specific norms for the German version of the three-factor eating-questionnaire (tfeq). Appetite 91:241–247
A New Adaptive Digital Signal Processing Algorithm Shiv Ram Meena and Chandra Shekhar Rai
Abstract In this paper, a new gradient-based adaptive digital signal processing algorithm is proposed that works on the principle of steepest descent method. A new cost function is synthesized with aim to combine best part of LMS and RLS. The cost function is based on weighted least mean square error. The convergence of the adaptive digital signal processing algorithm is proved mathematically. Proposed algorithm was implemented successfully on noise cancellation to validate the convergence. The proposed weighted least mean square algorithm is easy to implement and fast converging. Performance of the algorithm was compared with least mean square algorithm in noise cancellation and found that proposed algorithm is faster converging. The proposed algorithm converges within 0.1 s, while LMS takes 0.6 s to converge. Further, the proposed algorithm is also implemented for 50 Hz powerline humming noise reduction from ECG signal and found robust performance. If the initial value of filter weights in the proposed algorithm is kept nonzero, then performance is much better. As previous errors contribute to error updating, the filter taps quickly reach their optimum value. The proposed algorithm convergence is validated by simulation results. This algorithm has robust and better performance than least mean square algorithm. Keywords Adaptive noise cancellation · LMS · New adaptive digital signal processing algorithm · Cost function · Adaptive filtering · Convergence of algorithm · ECG
S. R. Meena (B) · C. S. Rai University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, New Delhi 110078, India e-mail: [email protected] C. S. Rai e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_16
177
178
S. R. Meena and C. S. Rai
1 Introduction Adaptive digital signal processing algorithms are well known in the research community and have various applications in different disciplines of science and technology, such as telecommunication, biomedical, multimedia entertainment, radar and sonar signal processing, and control engineering [1–8]. System identification, noise cancellation, channel equalization, etc. are popular applications since the inception of adaptive signal processing theory [2, 3, 9–11]. Practical implementation attracts more researchers to investigate adaptive algorithms. Therefore, as adaptive signal processing theory matures, these algorithms become more refined and sophisticated. For the best utilization of this improvement in adaptive algorithms, one must understand comprehensive details of how they really work instead of simply applying the algorithms, so more and more new algorithms are synthesized, and as a result, the number of adaptive digital signal processing algorithms has increased enormously. The least mean square (LMS) algorithm [3, 6, 12] is most popular due to its simplicity and is considered a benchmark for all researchers in the field of adaptive digital signal processing. Easy practical implementation is the cynosure of the LMS algorithm, which is a gradient-based algorithm and works on the principle of the steepest descent method. Its performance is good for stationary environments as well as slowly varying or non-stationary signals [13]. The main drawback of the LMS algorithm is slow convergence, and its performance depends on the selection of step size [14]. So, to overcome this problem, several adaptive digital signal processing algorithms are synthesized [1, 8, 15, 16]. A lot of the work is focused to improve the LMS performance such as mitigating the problem of step size and increasing convergence rate [14] normalized least mean square algorithm and several versions of variable step size least mean square algorithms formulated. There are a few other parameters that are considered while synthesizing new adaptive algorithms: minimum mean square error (MMSE), misadjustment, steady-state error, nature of signals being used as input to the system, and cost function. Several modifications and improvements in the LMS algorithms are available in the literature; all these are associated to mean square error-based cost functions with slight modification [16–18]. Still, different LMS-based algorithms are being developed, which make the large family of LMS algorithms. Another major family of adaptive algorithms is associated with the recursive least square (RLS) algorithm, which is derived from weighted square error [8]. RLS is faster than LMS, but its computational complexity is very high and it also suffers from roundoff noise. When any algorithm is implemented on hardware, then its structure and computational complexity play an important role, so RLS is not preferred for realtime applications, that has led to transforming the properties of the RLS algorithm into a new algorithm that can be similar to the LMS algorithm. This idea is being implemented for developing the new algorithm proposed in this paper where a new cost function is prepared as mean of weighted least square error and then steepest descent method is applied on this cost function. Valid assumptions are being made at different stages of derivation, which are mentioned in the next section of the paper.
A New Adaptive Digital Signal Processing Algorithm
179
Convergence is also mathematically demonstrated for the suggested algorithm, and simulation work is used to further validate it.
2 Proposed Algorithm Adaptive digital filters have the ability to maintain the desired performance over the changing input conditions by modifying their internal characteristics by altering the filter weights. The adjustment of filter weights is based on some adaptive algorithms. Generally, these algorithms work on the principle of minimizing some cost function, and this cost function is based on the error signal. The starting of adaptive filters is based on a transversal filter using the least mean square (LMS) algorithm. Various variations of LMS and RLS have been developed in the past few decades. There does not exist any algorithm which is performing well for all the applications. So, a new adaptive algorithm is introduced to fulfill the aim. A fundamental block diagram for any adaptive filtering algorithm is given in Fig. 1. The basic assumptions of an adaptive filter operating in the discrete-time domain are input sequence x(n), the reference signal d(n) or desired signal is statistically stationary as well as non-stationary signals, and y(n) is the output of the digital filter. In general, x[n] and y[n] can be stochastic or deterministic, or multidimensional sequences. Here, transversal filter (filter memory length equal to N = Number of coefficients in FIR filter) structure is considered for discussion whose output is given by the discrete-time convolution. y[n] =
N −1
h[k]x[n − k] or y[n] =
k=0
N −1
x[k]h[n − k].
(1)
k=0
FIR filter has N coefficients and the state is represented by W (n) at any instant of time n. The impulse response of the filter can be given by y(n) = W T (n)X (n),
(2)
and the error signal is defined as difference between d(n) and y(n). Fig. 1 Basic block diagram of an adaptive digital signal processing algorithm responsible for updating the adaptive filter weights
x(n)
Programmable Digital Filter
y(n) + e(n)
Adaptive algorithm for coefficient updating
d(n)
180
S. R. Meena and C. S. Rai
e(n) = d(n) − y(n).
(3)
This error signal is used by the adaptive algorithms to make a cost function; therefore, it is responsible for updating the adaptive filter coefficient vector w(n) according to some performance criteria. In general, the main aim of the adaptation process is to minimize some cost functions made up of error signals and force the output of the adaptive digital filter to approach the reference signal in a statistical sense. The proposed algorithm is based on the new cost function, which is defined as a weighted mean square error given by J (n) =
n
λ(n−k) |E[e(k)]|2 .
(4)
k=1
In this higher weightage is given to the latest average error. The average of the ensemble average is equal to the average of the random variable, so this will give an average mean square error for a statistically stationary signal as well as for statistically non-stationary signals. The cost function J(n) in the equation is quadratic in the parameters {wi (n)}, and the cost function is also differentiable; therefore, there exists a global minimum. Thus, we can use the results from classical optimization theory to determine the point of minima Wopt (n), i.e., at the point of minima on the smooth error surface of cost function, the derivative of the cost function with respect to the chosen parameter is zero. Thus, Wopt (n) can be found from the solution to the system of equations. ∂ J (n) = 0 for 0 < i < N − 1. ∂wi (n)
(5)
Taking derivatives of J (k) with respect to wi (k), where k represents iteration number. k ∂ J (k) ∂e(n) = λk−n E e(n) ∂wi (k) ∂ Wi (n) n=0 k ∂ J (k) ∂e(n) =2 . λk−n E e(n) ∂wi (k) ∂ Wi (n) n=0
(6)
But error can be expressed as e(n) = d(n) − y(n) = d(n) − w T x(n).
(7)
By defining the autocorrelation matrix Rx x (n) and cross-correlation matrix Pd x (n) as
A New Adaptive Digital Signal Processing Algorithm
181
Rx x (n) = E X (n)X T (n) and Pd x (n) = E{d(n)X (n)},
(8)
respectively, the gradient can be written as k ∂ J (k) = −2 λk−n (Pxd (n) − Rx x (n)W (n)). ∂wi (k) n−0
(9)
If the matrix Rx x (n) is invertible, then optimum weights can be calculated by equating gradient equal to zero. −2
k
λk−n (Pxd (n) − Rx x (n)W (n)) = 0.
n−0
As λ is positive values, so Wopt = Rx−1 x (n)Pd x (n)
(10)
which is similar to Wiener solutions, an inverse of the autocorrelation matrix is required to find optimal filter weights, and it is not desirable. So, the method of steepest descent is chosen to avoid this matrix inversion. According to this method, the weights of the filter assume a time-varying form, and their values are adjusted in an iterative fashion along the error surface with the aim of moving them progressively toward the optimum solution [19]. Successive adjustments applied to the tap weights of the filter will be in the direction of steepest descent of the error surface, i.e., in a direction, opposite to the gradient vector whose elements are defined by ∇w J . ∇W J (n) = −2
n
λn−k E X (k) d(k) − W T X (k) .
(11)
k=0
Now, substituting the gradient vector value in the weight update equation given by steepest descent method that is Wˆ (n + 1) = Wˆ (n) + μ(−∇W (J (n)), W (n + 1) = W (n) + 2μ
n
(12)
λn−k E(X (k)d(k)) − E X T (k)X (k)W
k=0
W (n + 1) = W (n) + 2μ
n
λn−k [Pxd − Rx x W ].
(13)
k=0
Now, autocorrelation function and cross-correlations function are defined as ensemble averages given by Eq. (8). But, if the physical processes responsible for the
182
S. R. Meena and C. S. Rai
generation of the input signals and desirable signals are jointly ergodic [12], then it is justified that time average can be substituted instead of ensemble averages. Although the statistical average does not change with the time for statistically stationary signals but if the signal is having nonlinearity then ensemble average may change slightly, then this change can be included in the update by giving more weight to the latest ensemble average. As per ergodicity any, collection of the random sample from a process must represent the average statistical properties of the entire process [16]. However, this procedure depends on the statistical quantities, i.e., expected value E{d(n)x(n)} and E{x(n)x(n)} defined in Pd x (n) and Rx x (n), respectively. In practice, we only have measurements of both d(n) and x(n) to be used within the adaptation procedure. While suitable estimates of the statistical quantities could be determined from the signals x(n) and d(n), we instead develop an approximate version that depends on the signal values themselves. When the filter is operated in an unknown environment, these correlation functions are not available; in such case, we are forced to use estimates in their place [19]. Instantaneous estimates of the autocorrelation function and cross-correlation function can be deduced directly from the definition of autocorrelation function and crosscorrelation function as follows: Rˆ x x (n) = X (n)X T (n) and Pˆd x (n) = d(n)X (n).
(14)
These estimated definitions of autocorrelation matrix Rˆ x x (n) and cross-correlation matrix Pˆd x (n) have been generalized to include a non-stationary environment, in which all the input signals and desired responses are assumed to be time-varying forms too. So, by substituting these estimated autocorrelation matrices and crosscorrelation matrix, we can rewrite the estimates of update for the weights as follows: Wˆ (n + 1) = Wˆ (n) + 2μ
n
λn−k d(k) − X T (k)W X (k)
k=0
Wˆ (n + 1) = Wˆ (n) + 2μ
n
λn−k e(k)X (k).
(15)
k=0
For unknown environment, the weight vector Wˆ (n) is representing the “estimate” of W (n). So, the above equation is the final update equation for the designed algorithm.
3 Proof of Convergence for Proposed Algorithm There are two popular distinct aspects of the convergence of the adaptive digital signal processing algorithms [8]. First is convergence in the mean and second is convergence in the mean square.
A New Adaptive Digital Signal Processing Algorithm
183
Let us define the optimum filter weight as Wopt (n) = w1∗ (n) + w2∗ (n) + w3∗ (n) + · · · + w ∗N (n).
(16)
T Desired signal is then given by d(n) = Wopt (n)X (n)
(17)
say for any time instant n 0 , w(n) = Wopt (n) the algorithm converges completely, i.e., e(n) = 0, for n ≥ n 0 . As the desired response and input to the adaptive algorithm are random process, so convergence should reach in mean. For appropriate choice of step size, we have E[W (n)] → Wopt as n → ∞ [18]. This requires an assumption that x(n) and w(n) are statistically independent. Also desired signal is assumed to be deterministic signal being generated by practical LTI system on deterministic input [12]. In this case, convergence of any adaptive digital signal processing algorithm can be investigated by proving the mean value of filter weights after infinite iteration should approach optimum filter weights. Now, define a weight error vector. w (n) = W (n) − Wopt .
(18)
The filter weight update equation for proposed algorithm can be written as W (n + 1) = W (n) + 2μ
n
λn−k e(k)X (k)
k=0
W (n + 1) − Wopt = W (n) − Wopt + 2μ
n
λn−k d(n) − X T (k) W (k) − Wopt + Wopt X (k).
k=0
Using Eqs. (2) and (4) w (n + 1) =w (n) + 2μ
n
λn−k d(k)X (k)
k=0
− 2μ
n
λn−k w (k)X T (k)X (k)
k=0
− 2μ
n k=0
λn−k opt (k)X T (k)X (k)
184
S. R. Meena and C. S. Rai
w (n + 1) =w (n) + 2μ
n
λn−k d(n) − Wopt (k)X T (k) X (k)
k=0
− 2μ
n
λn−k w (k)X T (k)X (k)
k=0
w (n + 1) =w (n) + 2μ
n
λn−k e(n)opt X (k)
k=0
− 2μ
n
λn−k w (k)X T (k)X (k).
(19)
k=0
Now, define the mean of weight deviation vector v = E[w (n)].
(20)
Taking expected value on both sides of Eq. (19) and by using Eq. (20) v(n + 1) = v(n) + 2μ
n
λn−k E e(n)opt X (k)
k=0
− 2μ
n
λn−k v(k)E X T (k)X (k) .
(21)
k=0
By assuming that weight error vector and input signal are independent, expectation value is applied. Once the mean weight deviation vector is constant, the proposed algorithm will converge and v(n + 1) − v(n) = 0. Then, proposed algorithm will surely converge if 2μ
n
λn−k E e(n)opt X (k) − v(k)E X T (k)X (k) = 0.
k=0
This is called the condition of convergence for the proposed algorithm. Now, Eq. (8) can be used in this equation for further analysis also assuming that optimum error is zero. n k=0
λn−k v(k)Rx x (k) = 0.
(22)
A New Adaptive Digital Signal Processing Algorithm
185
From this equation, we can conclude that if the mean weight deviation vector is constant, then the autocorrelation vector can be positive and negative such that the resultant effect is nullified. Therefore, λ = Rx1x(0) or forgetting factor should be inverse of spectral power of input signal at the initial iteration for convergence of the proposed algorithm. To verify the convergence of the proposed algorithm, it was implemented for noise cancellation using MATLAB. The simulation is done to prove that proposed adaptive digital signal processing algorithm converges.
4 Basic Theory of Noise Cancellation The problem of noise is becoming more tedious to handle in today’s world due to the Industrial Revolution. Because we are constantly surrounded by various industrial equipment such as engines, transformers, compressors, blowers, AC, and various processors that are now an inseparable part of daily life, we are subjected to various types of noise [10]. The basic task of noise reduction techniques is to extract useful information from a noisy input signal that has been corrupted. All practical signals are contaminated by surrounding noisy signals. So, it can be assumed that signals are carriers of useful information that may be contaminated by unwanted signals, i.e., noise [17]. The proposed algorithm is tested for noise cancellation using the two-sensor model [13]. Figure 2 shows the basic structure of adaptive noise cancellation [13]. A signal x(n) is an important information signal fed to the sensor. This sensor also detects noise n(n) that is uncorrelated with the input signal x(n). The signal and the noise combine to form P(n) = x(n) + n(n), which serves as one input to the noise canceller system. The secondary input, n1(n), from another sensor is uncorrelated with the original signal x(n), but it may be correlated with the noise added to the primary signal, n(n). This secondary input is used as a reference input in the adaptive noise cancellation system. The secondary signal is used to produce the output of the adaptive filter y(n), which is close to a replica of the noise n(n) (added in the primary signal). This output y(n) of the adaptive filter is subtracted from the primary input p(n) = x(n) + n(n) to produce the output (e(n) = p(n)-y(n)) of the adaptive noise cancellation system. The system output serves as the error signal for the adaptive process. This error signal is used as a feedback signal by the noise cancellation system during the adaptation process to adjust the filter weights through a chosen adaptive digital signal processing algorithm. Prior knowledge of the signal x(n) or the noises n(n), n1(n) would be necessary before the filter could be designed to produce the noise-cancelling signal y(n). So, assume that x, n, n1, and y are statistically stationary and have zero means. Now, output z is z = x + n − y,
(23)
186
S. R. Meena and C. S. Rai
Fig. 2 Block diagram of noise cancellation [6]
Signal Source
Noise Source
z=x+n-y
Primary Input
x+n
n1 Reference Input
+
Σ
-
Adaptive Filter
z System Output
Filter y Output
e Error
z 2 = s 2 + (n − y)2 + 2x(n − y).
(24)
Taking expectations on both sides of above equation E z 2 = E x 2 + E (n − y)2 + 2E[x(n − y)].
(25)
If x is uncorrelated with n, then output y will also be uncorrelated with x; therefore, E[x(n − y)] = 0. E z 2 = E x 2 + E (n − y)2 .
(26)
The input signal power E x 2 will be unaffected because the filter is adjusted to minimize E z 2 . Therefore, minimum output power will be min E z 2 = E x 2 + min E (n − y)2 .
(27)
The filter weights are adjusted to minimize E z 2 ; as a result, E (n − y)2 is also minimized. If the filter output y is best estimates of the noise present in the primary signal, then E (n − y)2 is minimized. And as z − x = n − y, therefore E (z − x)2 is also minimized. The output z is therefore the best estimate of the signal x when the filter is modified to minimize the total output power [1]. In addition to the original signal x, some residual noise will be present in the output z. The residual noise will be given by (n − y). As power corresponding to x in the output power will remain constant, so minimizing total output power means maximizing signal-to the output noise ratio (SNR). The smallest possible output power is E z 2 ]= E[x 2 only when E (n − y)2 = 0. This is possible if y = n and z = x, that is ideal condition. Minimizing the output power leads the output signal to be perfectly noise free in noise reduction process. For comparison, MATLAB simulation was also done for least mean square algorithm as well as proposed algorithm for same noise cancellation.
A New Adaptive Digital Signal Processing Algorithm
187
5 Simulation Results The proposed algorithm is implemented for noise cancellation system having two sensory inputs. The simulation is aimed to prove the convergence and verification of the mathematical analysis for proposed algorithm. Through, the convergence is validated by simulation and performance of the proposed algorithm is compared with the LMS for same setup of adaptive noise canceller. Both the algorithms are using same length of the adaptive FIR filter and input parameters kept similar. A low-frequency sinusoidal signal of 1.4 s was considered as information signal for the system. A high-frequency signal random noise signal is added in the sinusoidal signal to produce primary signal. Secondary input was fad with a signal that is correlated with the noise used in primary signal. Simulation was done by making MATLAB code for each algorithm. It was observed from Fig. 3 for LMS that initially output signal is same as noisy input and slowly tends toward original signal. And after 0.6 s output becomes recognizable replica of the information signal, but still there is some residual noise present in the output of the system. Simulated results for new algorithms are shown in Fig. 4, and this algorithm converses very fast because of cumulative effect of weighted errors. It can be observed from Fig. 4, that within 0.1 s, the system produces output, which is replica of the information signal (sinusoidal signal in this case) with very less residual error. The changing fluctuation in the error or deviation from the original signal means the solutions of this algorithm are vibration near optimum solution. This vibrations’ effect may be reduced by taking proper value of weighting factor present in the algorithm or by bounding the contribution of the previous terms to the present
Fig. 3 Simulated results of LMS algorithm for noise cancellation
188
S. R. Meena and C. S. Rai
Fig. 4 Simulated results of proposed algorithms for noise cancellation
term. Further another analysis was also done that if filter length increases via keeping all other parameters same, then performance of LMS further deteriorates, while the new algorithm gives the same performance irrespective of the filter length. Further, the proposed algorithm is also implemented for removing the 50 Hz powerline noise from ECG signal. The simulated results are very good in this case also. Figure 5 shows that proposed algorithm is working well for noise reduction in ECG signal. About 50 Hz powerline humming noise [7] is mixed with ECG signal, and then, mixed noisy ECG signal is passed through noise reduction system. Simulated results in Figs. 5, 6, and 7 show that this algorithm is working well to remove powerline noise from ECG signal. The residual error in the output is of the order of 10−7 in proposed algorithm, while LMS has residual error of order 10−4 . So, it can be said that proposed algorithm is better than LMS.
A New Adaptive Digital Signal Processing Algorithm
189
Fig. 5 Simulation results of 50 Hz powerline noise reduction from ECG signal using proposed new adaptive digital signal processing algorithms
Fig. 6 Simulation results of 50 Hz powerline noise reduction from ECG signal using proposed algorithm and LMS
190
S. R. Meena and C. S. Rai
Fig. 7 Residual error in denoised output of noise reduction system used for 50 Hz powerline noise reduction from ECG signal using proposed algorithm and LMS
6 Conclusion A new adaptive digital signal processing algorithm weighted least mean square (WLMS) is proposed. And, convergence in mean has been proved by showing that the mean of the weight deviation error vector can be minimized, if forgetting factor is equal to inverse of power spectrum of input signal at the initial iterations. The convergence is validated by simulation results for noise cancellation. If the initial value of filter weights in the proposed algorithm is kept nonzero, then performance is much better. Because previous errors contribute to error updating, the filter taps quickly reach their optimum value. The ECG results explain that this algorithm give robust and better performance than LMS. It can be concluded that the proposed algorithm will converge very fast. This simulation is only for understanding the behavior of the proposed algorithm. Further, proposed algorithm also reduces the tension of enhancing the length of the filter, as performance is unaltered even after enhancing the adaptive filter length. So, the proposed algorithm can be more useful where fast convergence is required. Further, future scope of work is analysis of convergence in variance for the proposed algorithm and the residual error or steady-state error present in the output can be minimized using a blocked version of the proposed algorithm.
References 1. Aslam MS, Raja MAZ (2015) A new adaptive strategy to improve online secondary path modeling in active noise control systems using fractional signal processing approach. Sign Process 107:433–443 2. Widow B (1975) Adaptive noise canceling: principles and applications. Proc IEEE 63:1692– 1716 3. Widrow B, Stearns SD (2007) Adaptive signal processing. Pearson Education Asia
A New Adaptive Digital Signal Processing Algorithm
191
4. Jain D, Beniwal P (2022) Review paper on noise cancellation using adaptive filters. Int J Eng Res Technol (IJERT) 11(01) 5. Tuvshinbayar K, Ehrmann G, Ehrmann A (2022) 50/60 Hz power grid noise as a skin contact measure of textile ECG electrodes. Textiles 2(2):265–274. https://doi.org/10.3390/textiles2 020014 6. Bellanger MG (2001) Adaptive digital filters. Revised and Expanded, 2nd edn. Marcel dekker inc. New York 7. Nabil M, Mursi A, Nerma MHM (2015) Reduction of power line humming and high frequency noise from electrocardiogram signals. Int J Sci Technol Res 4(06) 8. Haykin S (2014) Adaptive filter theory. Pearson Education Asia LPE, 5th edn. 9. Lampl T (2020) Implementation of adaptive filtering algorithms for noise cancellation (Dissertation). Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-33277 10. Sugadev M, Kaushik M, Vijayakumar V, Ilayaraaja KT (2022) Performance analysis of adaptive filter algorithms on different noise sources. In: 2022 International conference on computer communication and informatics (ICCCI), Coimbatore, India, pp 1–5 11. Tanji AK Jr, de Brito MAG, Alves MG, Garcia RC, Chen G-L, Ama NRN (2021) Improved noise cancelling algorithm for electrocardiogram based on moving average adaptive filter. Electronics 10(19):2366 12. Chakraborty M, Sakai H (2005) Convergence analysis of a complex LMS algorithm with tonal reference signals. IEEE Trans Speech Audio Process 13(2) 13. Meena SR, Rai CS (2020) Effect of eigenvalue spread in noise cancellation of two sensory systems using adaptive algorithms. J Stat Manag Syst (Taylor and Francis) 23(1):157–169 14. Huang F, Zhang J, Zhang S (2016) Combined-step-size affine projection sign algorithm for robust adaptive filtering in impulsive interference environments. IEEE Trans Circ Syst II 63:493–497 15. Bermudez JCM, Bershad NJ, Tourneret JY (2011) Stochastic analysis of an error power ratio scheme applied to the affine combination of two LMS adaptive filters. Sign Process (Elsevier) 91:2615–2622 16. Parreira WD, Costa MH, Bermudez JCM (2018) Stochastic behavior analysis of the Gaussian KLMS algorithm for a correlated input signal. Sign Process 152:286–329 17. Dixit S, Nagaria D (2017) Design and analysis of cascaded LMS adaptive filters for noise cancellation. Circ Syst Sign Process 36:742–766. https://doi.org/10.1080/09720510.2020.172 1634. https://doi.org/10.1109/ICCCI54379.2022.9740807 18. Sakai H, Yang JM, Oka T (2007) Exact convergence analysis of adaptive filter algorithms without the persistently exciting condition. IEEE Trans Sign Process 55(5):2077–2083 19. Satpathy D, Nayak AP (2015) Noise cancellation using NCLMS adaptive algorithms. Int J Eng Technol 3(25). https://doi.org/10.3390/electronics10192366
Building Domain-Specific Sentiment Lexicon Using Random Walk-Based Model on Common-Sense Semantic Network Minni Jain, Rajni Jindal, and Amita Jain
Abstract Sentiment lexicons are an essential resource for research in the sentiment analysis field. Many existing sentiment lexicons like SentiWordNet, SenticNet, etc. are available, but they do not cover words of every domain with their polarity. They contain only a finite number of words. Thus, the automatic construction of domainspecific sentiment lexicons becomes an important task. In this paper, we present a framework to build a domain-specific sentiment lexicon based on common-sense knowledge automatically. The proposed work is the first work, that provides sentiment values to words, word phrases, foreign words, and out-of-vocabulary (OOV) words. For lexicon generation, the sentiment value of the seed words is propagated using a random walk model over the network constructed from ConceptNet. In this work, we provide a sentiment weight to every relation of ConceptNet based on the assumption that concepts propagate their sentiment values to their neighbors depending on the type of relationships connecting them. In the experimental results, we show that the proposed strategy can improve the polarity accuracy of the sentences having word phrases, foreign words, and out-of-vocabulary (OOV) words. Keywords Sentiment analysis · ConceptNet · Machine learning · Lexicon-based approach
M. Jain (B) · R. Jindal Computer Science and Engineering, Delhi Technological University, Delhi, India e-mail: [email protected] A. Jain Computer Science and Engineering, Netaji Subhas University of Technology, Delhi, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_17
193
194
M. Jain et al.
1 Introduction Sentiment analysis handles the automatic finding and evaluation of users’ sentiments or opinions from the data available on the web such as reviews, Twitter, blogs [1]. This area has gained significant interest in recent times because of its practical usage and application in today’s environment. The expansion of Web 2.0 tools and knowledge has directed the great advancement of opinion data, i.e., available online. It has now become a valued source for opinion mining. Presently, many opinion-mining approaches are using lexicon-based methods. In this work, we propose a method to build a domain-specific sentiment lexicon automatically. In this work, to build the lexicon, a common-sense semantic network is generated using ConceptNet to incorporate semantic and common-sense knowledge. The use of ConceptNet further enhances the quality of the sentiment lexicon by involving common sense, a large set of relations (38 relations), word phrases, and foreign words, thus making it more powerful. The proposed work is the first work, that provides sentiment values to words, word phrases, foreign words, and out-of-vocabulary (OOV) words. Common-sense knowledge refers to “the millions of basic facts and understanding possessed by human beings”. For instance, if a person says I bought a new phone, the second person will easily understand that he used the money to buy a new phone. Humans can easily find the relationships between terms/words using commonsense knowledge. Providing the same common-sense knowledge to machines is very important. ConceptNet provides this common-sense knowledge as a semantic network. ConceptNet is a “semantic network consisting of nodes linked by edges. The nodes signify the concepts and edges signify the semantic relationships between two concepts” [2]. In literature, existing methods to construct sentiment lexicons are manual, lexiconbased [3], and corpus-based methods [4]. The first method, i.e., manual method requires human annotators to assign the polarity to words as positive, negative, or neutral. Second lexicon-based methods use different semantic relations like “antonyms” and “synonyms” to grow the polarity words based on a small set of seed (labeled) words. The last is a method based on a corpus that utilizes the patterns (co-occurrence patterns) and grammar rules to create lexicons. To build a domainspecific sentiment lexicon, the proposed work uses the positives of both lexicon-based and corpus-based methods. Today a large number of opinionated texts are available on the Internet in the form of reviews, blogs, discussion forums, posts, etc. Therefore, sentiment classification and opinion mining can be potentially useful for my many applications such as search engines, recommendation systems, and market analysis. Today approaches for sentiment analysis have evolved from words to concept based. Concept-based techniques use semantic knowledge bases which help the system to identify the effective and conceptual information associated with opinions [5–7]. The main objective of the
Building Domain-Specific Sentiment Lexicon Using Random …
195
proposed work is to construct a domain-specific sentiment lexicon using a commonsense knowledge base (ConceptNet). This is the first work, that provides sentiment values to words, word phrases, foreign words, and out-of-vocabulary (OOV) words. This work uses a set of seed words with assigned polarity. The sentiment value of the seed words is propagated using a random walk model over the network constructed from ConceptNet. In this work, we provide a sentiment weight to every relation of ConceptNet based on the assumption that concepts propagate their sentiment values to their neighbors depending on the type of relationships connecting them. This method uses a machine learning approach, i.e., gradient descent to provide weightage to the relations. The sample lexicon for the hate speech domain is shown in Table 1. The remaining sections are organized as follows. Section 2 reviews some prior related work on the automatic generation of sentiment lexicon. Section 3 presents the proposed method for a common-sense-based domain-specific sentiment lexicon. Section 4 describes the experiments and results. Section 5 finally presents the conclusion of the work.
2 Related Work This section reviews some prior work, available for constructing sentiment lexicons, mainly connected to our proposed work. Researchers in existing work used various resources like WordNet and ConceptNet to generate sentiment lexicon. The proposed work is the first work to generate a domain-specific sentiment lexicon based on common-sense knowledge. Some prior works are: Hu and Liu [8] proposed an effective and simple approach by exploiting the relations, antonym, and adjective synonyms of resource WordNet to find the polarity of adjectives. They manually constructed a set of seed words including common adjectives only. The main idea was that if the word with unknown polarity is a synonym of a positive word, its polarity also will be positive. Otherwise, it will be negative. They produced a lexicon of around 6800 words called “Opinion Lexicon”. Kamps et al. [9] offered a method to find out the sentiment orientation of words by using a semantic network, i.e., WordNet. They constructed graph using two adjectives as seed words. Then, they calculated shortest between words and seed words. The core notion is comparable to the [10] work. Tsai et al. [10] described a way to build concept-level sentiment dictionary based on commonsense knowledge. In this work, author generated all-purpose (general) sentiment lexicon using ConceptNet. To propagate the values, they used random walk model with iterative regression. Tai et al. [11] and Huag et al. [12] proposed domain-specific sentiment lexicon using label propagation.
196
M. Jain et al.
Table 1 Sample of the proposed lexicon generated for domain “hate speech” Type
S. No.
Word/phrase
English words
1
Best
Word phrases
Foreign words
Polarity 0.648
2
Affection
3
Sarcoma
−0.362
0.519
4
Wedding
0.529
5
War
−0.54
6
Steal
−0.323
7
Immorality
−0.491 −0.476
8
Abhorrent
9
Atone
10
Irreverence
0.383 −0.491
1
Cruel and unusual punishment
−0.202
2
Racial discrimination
−0.325
3
Learning something
0.788
4
Forgive someone
0.654
5
Voice an opinion
0.247
6
You commit a crime
−0.672
7
Watching a TV show
0.575
8
Buy something for a loved one
0.351
9
Have a nervous breakdown
−0.15
10
Tasting something sweet
1
Aborrecimiento (language—Portuguese, meaning—abhorrence)
0.18
2
Redimir (language—Spanish, meaning—to redeem)
3
Halveksia (language—Finnish, meaning—despise)
−0.743
4
Gaizkinahi (language—Basque, meaning—bad guys)
−0.797
5
Aikamoinen (language—Finnish, meaning—quite a)
6
Afligit (language—Catalan, meaning—afflicted)
7
Harridura (language—Basque, meaning—exclamation)
0.165
8
Injusticia (language—Spanish, meaning—injustice)
−0.583
9
Expier (language—French, meaning—to make amends)
−0.736 0.383
0.297 −0.272
0.383 (continued)
Building Domain-Specific Sentiment Lexicon Using Random …
197
Table 1 (continued) Type
S. No.
Word/phrase
Polarity
10
Loukata (language—Finnish, meaning—violate)
−0.303
3 Proposed Work The proposed approach is based on semi-supervised learning as it uses a few seed words with known polarities to enlarge the sentiment lexicon to thousands of words. The popular random walk algorithm is used and modified for this purpose. The proposed approach broadly focuses on (as shown in Fig. 2): (1) collecting standard domain-specific seed words with calculated polarities, (2) extracting ConceptNet relations for network construction and initializing their weights, (3) collecting standard domain-specific seed words with calculated polarities, (4) extracting ConceptNet relations for network construction and initializing their weights, (5) training the weights using a standard gradient descent and a random walk algorithm, (6) Finally apply a random walk algorithm on the graph with trained weights for classification, and (7) plotting a graph between Average Polarity Error and weights of ConceptNet relations. The proposed approach first constructs a network where two nodes are linked if a semantic relation exists between them. This work uses ConceptNet for semantic relations; it also accounts for common-sense, foreign language words and phrases, i.e., multiple words for single node. The resulting graph G is a graph (C, E), where C is a set of concepts in ConceptNet. E is the set of edges connecting concepts/nodes in the graph. Graph G is generated using a domain-specific list of seed words (initial nodes) having known polarity. To get prior sentiment knowledge of seed words, SenticNet is used. Next graph is extended to level L by adding related concepts from ConceptNet. The ConceptNet graph contains labeled (green nodes with known polarity) as well as unlabeled (blue nodes with unknown polarity) which is shown in Fig. 1. Now, the weights of relations in graph G are trained using standard gradient descent and random walk algorithm. Sentiment weights are provided to all relations of ConceptNet. Assigning weights to relations is based on the assumption that concepts pass their sentiment value to their neighbor concepts in different ways depending on the relations connecting them. All the relations cannot have same sentiment weightage. So, we use machine learning approach, gradient descent to provide weightage to relations (shown in Table 2). Step 1: Initialize the relations’ weights with random values and calculate cost. We initialized the weight of 0.1 to all the direct relations and −0.1 to all inverse relation. Inverse or negative relations are tend to change the polarity of connected nodes. In our work, cost is Average Polarity Error (APE), i.e., difference between the classified sentiment value and the reference sentiment value for all the classified words. Reference sentiment value is SenticNet value of node. Step 2: Calculate the gradient, i.e., change in APE when the weights of relations are changed by a very small value from their original randomly initialized value. This helps us to move the values of
198
Fig. 1 Snippet of ConceptNet graph for seed word of domain “Hate speech”
Fig. 2 Overall framework of proposed approach
M. Jain et al.
0.11
0.07
0.11
0.13
0.11
0.13
0.07
0.09
0.11
0.11
0.11
0.09
0.11
0.11
0.11
0.11
0.09
0.11
0.11
0.11
0.09
0.11
0.11
0.09
0.11
0.11
0.09
0.09
FormOf
RelatedTo
Causes
Synonym
HasProperty
DefinedAs
Entails
TranslationOf
MotivatedByGoal
Desires
MadeOf
InstanceOf
HasA
SymbolOf
CausesDesire
AtLocation
DerivedFrom
HasSubevent
0.09
0.09
0.11
0.11
0.11
0.07
0.11
0.15
0.07
0.11
0.09
0.11
5
CapableOf
1
IsA
Relations
Iterations
0.08
0.08
0.10
0.06
0.02
0.10
0.12
0.10
0.08
0.12
0.10
0.06
0.12
0.16
0.12
0.06
0.08
0.12
0.14
0.06
10
Table 2 Weights of direct relations during successive iterations
0.07
0.09
0.09
0.03
0.03
0.07
0.11
0.13
0.09
0.13
0.07
0.05
0.13
0.19
0.09
0.03
0.05
0.11
0.17
0.01
15
0.04
0.08
0.08
0.02
0.04
0.08
0.10
0.14
0.06
0.16
0.10
0.06
0.10
0.16
0.10
0.08
0.04
0.1
0.16
0.02
20
0.05
0.07
0.07
0.05
0.05
0.03
0.11
0.13
0.07
0.13
0.09
0.05
0.13
0.19
0.13
0.07
0.03
0.11
0.19
0.01
25
0.06
0.10
0.06
0.04
0.06
0.04
0.14
0.14
0.08
0.10
0.06
0.04
0.14
0.18
0.10
0.08
0.06
0.10
0.14
0.02
30
0.05
0.13
0.09
0.03
0.05
0.03
0.19
0.13
0.03
0.13
0.05
0.07
0.13
0.15
0.11
0.07
0.03
0.07
0.13
0.01
35
0.02
0.16
0.10
0.01
0.02
0.04
0.18
0.14
0.06
0.16
0.02
0.10
0.14
0.16
0.12
0.08
0.02
0.06
0.14
0.02
40
0+ (continued)
0.2 0+
0.08
0.02
0.1
0.02
0.14
0.14
0.08
0.12
0.02
0.08
0.18
0.16
0.04
0.15
0.11
0.01
0.07
0.01
0.19
0.15
0.07
0.15
0.03
0.11
0.13
0.17
0.07
0.12
0.01 0.11
0.08
0+
0.08
0.02
50
0.07
0.11
0.01
45
Building Domain-Specific Sentiment Lexicon Using Random … 199
−0.11
−0.07
−0.07
−0.11
−0.13
−0.05
−0.13
−0.09
0.11
0.09
0.11
0.09
0.09
−0.11
−0.09
−0.09
−0.11
−0.11
−0.09
−0.11
−0.09
Part of
ReceivesAction
EtymologicallyRelatedTo
CreatedBy
HasFirstSubevent
Antonym
NotCapableOf
NotHasA
NotHasProperty
NotIsA
NotDesires
NotMadeOf
DistinctFrom
77.6
0.05
0.09
UsedFor
Accuracy
0.11
0.09
HasContext
0.45
0.09
0.11
SimilarTo
Average Polarity Error
0.11
0.11
79.2
0.43
0.09
0.13
0.07
0.09
0.13
0.09
0.11
5
HasPrerequisite
1
HasLastSubevent
Relations
Iterations
Table 2 (continued)
78.3
0.43
−0.06
−0.12
−0.06
−0.08
−0.12
−0.06
−0.08
−0.08
0.06
0.06
0.14
0.10
0.14
0.10
0.06
0.12
0.10
0.08
10
77.9
0.41
−0.07
−0.11
−0.07
−0.07
−0.15
−0.09
−0.11
−0.07
0.05
0.05
0.11
0.09
0.13
0.15
0.11
0.11
0.09
0.09
15
78.5
0.39
−0.08
−0.14
−0.02
−0.06
−0.18
−0.06
−0.12
−0.02
0.08
0.06
0.12
0.06
0.12
0.16
0.08
0.10
0.08
0.08
20
80.9
0.38
−0.09
−0.11
−0.03
−0.07
−0.23
−0.03
−0.13
0−
0.07
0.09
0.11
0.07
0.09
0.13
0.13
0.09
0.07
0.07
25
81.3
0.38
−0.08
−0.16
−0.06
−0.10
−0.24
−0.04
−0.18
0−
0.12
0.10
0.10
0.08
0.08
0.12
0.12
0.08
0.10
0.04
30
82.2
0.37
−0.11
−0.17
−0.07
−0.07
−0.25
−0.07
−0.19
−0.01
0.11
0.09
0.09
0.07
0.07
0.13
0.13
0.09
0.11
0.07
35
81.4
0.37
−0.12
−0.22
−0.10
−0.08
−0.26
−0.06
−0.20
0−
0.12
0.08
0.10
0.08
0.08
0.16
0.14
0.10
0.10
0.08
40
80.7
0.38
−0.11
−0.23
−0.09
−0.07
−0.23
−0.09
−0.21
0−
0.177
0.07
0.11
0.07
0.09
0.13
0.13
0.09
0.13
0.07
45
83.4
0.36
-0.12
−0.20
−0.04
−0.04
−0.22
−0.10
−0.24
0−
0.12
0.06
0.12
0.08
0.08
0.16
0.10
0.06
0.12
0.08
50
200 M. Jain et al.
Building Domain-Specific Sentiment Lexicon Using Random …
201
relations in the direction in which APE is minimized. Weights of direct relations are increased by a small value, i.e., 0.01 and weights of inverse relations are also decreased by 0.01. Step 3: Adjust the relation weights with the gradients to reach the optimal values where APE is minimized. Step 4: Use the new trained relation weights for sentiment lexicon generation and to calculate the new APE. Step 5: Repeat steps 2 and 3 till further adjustments to weights do not significantly reduce the APE.
3.1 Random Walk Model Over Weighted ConceptNet Network After the construction of concept graph with weighted relations, next is generating a sentiment lexicon by propagating known values using random walk model. An improvised random walk algorithm is applied to propagate the sentiments from labeled nodes to unlabeled nodes. A random walk starts from a blue node and may end up on a green node, thus affecting the polarity of the blue node by propagating sentiments from the green node. The walk begins from an unlabeled word which may be related to other nodes in the graph with weighted edges of given magnitudes. For this, we defined transition probability Pt+1|t ( j|i ) from node 1 to node j by normalizing the weights of the edges of node i. Ci j , k=1 (C ik )
Pt+1|t ( j|i ) = n
where k represents all nodes in the neighborhood of i, and Pt+1|t ( j|i ) denotes the transition probability from node I at step t to node j at time step t + 1. The length of the random walk is restricted to a length β, and count of random walk is restricted to α. The walk ends if a unlabeled node is reached to labeled node. At the end of the process, unlabeled word is labeled with a polarity magnitude directly which is proportional to the polarity magnitude of the labeled word reached at the end of the walk and to the average weight encountered in its journey. Each relation with it has an associated weight, that is if there is a directed edge of a relation between two words i and j with weight w then, if polarity of i is Pi then polarity of j, i.e., pj is, P j = Pi ∗ (1 + w). If the words i and j are connected by multiple (say n) edges, then the Geometric Mean weight, wGM , can be calculated by taking the nth root of the absolute value of the product of the weights of intermediate edges. For e.g., if there are n intermediates edges, W i for i = 1 to n from node(word) i to node(word) j, then WGM = |Wi , for i = 1 to n|1/n . Further, W Gm is multiplied by −1 if there is an odd number of inverse relations in the n edges.
202
M. Jain et al.
The initialized weight could have been anything though it has to be chosen wisely so that the convergence does not get stuck in the local minima. The weights of inverse relations were avoided to be taken −0.1 due to sum of probabilities restrictions.
4 Experimental Results Astonishing results were obtained just from using the seed [‘good’, 0.664], [‘bad’, −0.36], [‘Love’, 0.655], [‘hate’, −0.83], [‘right’, 0.67], [‘wrong’, −0.76] related to hate speech domain taken from a previously generated sentiment dictionary, i.e., SenticNet and related words going as deep as two layers. These six words gave more words with their predicted polarity (around 930), of which around 470 words were present in the SenticNet (which could be used for getting the Average Polarity Error needed for weights training). The dictionary size increased from 6 to 930 using a small amount of training data where the polarities are accurate for 78.801% words. The process was repeated for many iterations for training the weights with the objective of reducing the Average Polarity Error. Initially, the Average Polarity Error was around 0.482 and it started to decrease as the iterations succeeded. In the results, the classification accuracy may not go hand in hand with Average Polarity Error in the short run because of the way the accuracy is defined. Accuracy is defined as percentage of positive and negative words correctly predicted with polarity greater and less than zero, respectively. This is the reason Average Polarity Error was chosen as the objective function instead of classification accuracy. But in the long run, an inverse relation between Average Polarity Error and classification accuracy was observed. There has been related work done in which some of the ConceptNet relations were used but what we here do is use all the relations and whatever relation should be chosen are automatically extracted when their weights are trained (a relation with trained weight near zero will not affect the prediction). In the weights training it is seen that some weights moved increased from their initialized weights while others reduced (both for direct and inverse relations). Here, the role played by the weights is that they increase or decrease polarities with a percentage indicated by their magnitude. For example, if words A and B are having weight of ‘IsA’ equal to 0.1, then if polarity of A is Pa then polarity of B, Pb = 1.1*Pa It was noticed (shown in Fig. 3) that some weights increased or decreased rapidly while others steadily. On increasing the number of iterations from 21 until they converge (which means that on an average the weights of corresponding relations do not increase), more accurate results can be observed. Although our results were taken by setting the depth as 2 for computational simplicity, depth can be extended to say 10 and thus form a dictionary containing more than 80,000 words. The Average Polarity Error in the above graph is defined by the formula Average Polarity Error = Difference between the classified sentiment value and the reference sentiment value for all the classified words. The Average Polarity Error starts decreasing with the increasing number of iterations. At the first iteration, the Average Polarity Error is around 0.48 which starts decreasing, and at the 50th iteration, it becomes 0.415 with a little bit of fluctuation in the between.
Building Domain-Specific Sentiment Lexicon Using Random …
203
Accuracy = correctly classified words/total classified words (correct + incorrect classifications). The mean accuracy (shown in Fig. 4) in the whole process increases with respect to the initial accuracy with initial weights.
Fig. 3 Graph showing the calculated Average Polarity Error at each iteration
Fig. 4 Graph showing the calculated accuracy at each iteration
204
M. Jain et al.
5 Conclusion and Future Scope In the proposed work, we build a domain-specific sentiment analysis using commonsense knowledge. This method utilizes a common-sense knowledge base, i.e., ConceptNet to generate a semantic graph. Based on the graphs, label nodes propagate their value to unlabeled nodes. The final results after conducting experiments show that the proposed method of generating a domain-specific lexicon provides the most accurate polarity. In the future work, to strengthen the method’s reliability, many more resources like WordNet and BabelNet can be utilized. A more efficient label propagation method can be used in future for better propagation. In addition, fuzzy logic or any other soft computing method can be used for providing weightage to relations of the network in future also. The fuzzification of ConceptNet will provide better results.
References 1. Pang B, Lee L (2008) Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval 2(1–2) 2. Havasi C, Speer R, Alonso J (2007) ConceptNet 3: a flexible, multilingual semantic network for common sense knowledge. In: Recent advances in natural language processing, pp 27–29. John Benjamins, Philadelphia, PA 3. Sebastiani AEF (2006) SENTIWORDNET: a publicly available lexical resource for opinion mining. In: Proceedings of the 5th conference on language resources and evaluation (LREC), pp 417–422 4. Hatzivassiloglou V, McKeown KR (1997) Predicting the semantic orientation of adjectives. In: 35th Annual meeting of the association for computational linguistics and the 8th conference of the European chapter of the association for computational linguistics, proceedings of the conference, pp 174–181 5. Sharma SS, Dutta G (2021) SentiDraw: using star ratings of reviews to develop domain specific sentiment lexicon for polarity determination. Inf Process Manage 58(1):102412 6. Chauhan GS, Meena, YK (2020) Domsent: domain-specific aspect term extraction in aspectbased sentiment analysis. In: Smart systems and IoT: innovations in computing, pp 103–109. Springer, Singapore 7. Abulaish M, Fazil M, Zaki MJ (2022) Domain-specific keyword extraction using joint modeling of local and global contextual semantics. ACM Trans Knowl Discov Data (TKDD) 16(4):1–30 8. Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Presented at the Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, Seattle, WA, USA 9. Kamps J, Marx M, Mokken RJ, Rijke MD (2004) Using wordnet to measure semantic orientation of adjectives. In National Institute, pp 1115–1118 10. Tsai ACR, Wu CE, Tsai RTH, Jen Hsu JY (2013) Building a concept-level sentiment dictionary based on commonsense knowledge. IEEE Int Syst 28(2):22–30 48 11. Tai YJ, Kao HY (2013) Automatic domain-specific sentiment lexicon generation with label propagation. In: iiWAS, ACM 53–62 49 12. Huang S, Niu Z, Shi C (2014) Automatic construction of domain-specific sentiment lexicon based on constrained label propagation. Knowl Based Syst 56:191–20050
An Optimized Path Selection Algorithm for the Minimum Number of Turns in Path Planning Using a Modified A-Star Algorithm Narayan Kumar and Amit Kumar
Abstract Robots are becoming more prevalent in everyday life. The most difficult aspect of using a robot is defining its path. Path planning describes the movement of the robot from the starting point to the goal point. This research provides an improved A-star method for a mobile robot’s path-planning finding capabilities in challenging maps. The improved A-star algorithm makes use of the A-star algorithm’s properties. First, the grid surface model is built, and the modified A-star algorithm’s evaluation function is calculated. Second, the acquired multiple least-distance paths were further analyzed using a global minimum number of turns approach, which can speed up convergence and smoothen the global path. In the combinatorial selection of the optimum path from the set of the shortest paths, the path with the minimum diversion of the robot from the initial to the final position is used. It enables the robot to travel the greatest distance possible in a single spin. The straight path enables the robot to jump and cover the most amount of ground in a single leap while keeping a consistent pace. MATLAB was used to run the experiments on a specific grid area. Keywords A-star algorithm · Path planning · Combinatorial selection · Minimum diversion
1 Introduction Robots play an important role in our daily lives. It makes a variety of tasks easier and more precise to execute. Robots can function in both static and dynamic environments by mapping their work areas. Airports, hospitals, post offices, and train platforms are just a few of the places where robots can be used. Companies use robots to do N. Kumar Department of Mechanical Engineering, Muzaffarpur Institute of Technology, Muzaffarpur, India e-mail: [email protected] A. Kumar (B) Department of Mechanical Engineering, National Institute of Technology, Patna, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_18
205
206
N. Kumar and A. Kumar
specific tasks such as welding, material handling, joining machine parts, and so on. Handling potentially hazardous substances, fires, and toxic gaseous environments is just a few examples of extraordinary scenarios where robots are the only viable solution. Robots are being used in research to investigate harsh environments and the availability of rare minerals on asteroids, deep beneath the oceans, satellites, and in space. Path planning is a typical problem that involves determining the shortest path between two vertices, or start and finish points, in a given space. The route depicts the cheapest path from the starting location to the desired condition. Finding the shortest path between two points, on the other hand, is frequently insufficient. People may need to identify the shortest path between two or more points. This type of problem includes the traveling salesman problem [1], in which the path must pass through every vertex in the graph, vehicle routing problems [2], in which the vehicle delivers items to several predetermined places, and path planning for tourist sites [3]. These kinds of problems are classified as multi-point path-planning problems. Depending on the available environmental parameters, path-planning algorithms are further divided into global path planning and local path planning. Global path planning, which seeks the optimum path given a significant amount of environmental data, works effectively when the environment is static and the robot fully comprehends it. As a result, static path planning is also known as global path planning. Local path planning, often referred to as dynamic path planning, is more frequently used in unforeseen or uncertain situations. In static environments such as storage and logistics, path planning requires the robot to be able to see its surroundings. Based on an understanding of the barriers present in the static environment, a modified A-star algorithm for shortest path evaluation was offered. An effective variant of the A-star method is achieved in this research. It makes use of the A-star algorithm’s evaluation function to increase the algorithm’s heuristic information, which speeds up the convergence speed during the search. Simultaneously, the bending suppression algorithm is introduced to increase heuristic information, with the goal of optimizing path smoothness. For the planner surface, a surveillance camera-assisted obstacle identifier approach could be used. The surveillance system’s job is to monitor and detect both stationary and moving obstacles. We evaluate the effectiveness of the proposed algorithm technique in terms of calculations, memory utility, and area coverage in this study. The contribution of this paper is as: • Fast calculations as it calculates from the start and end nodes simultaneously. • Finding the path with a minimum number of turns among the paths that have the same minimum distance. • The robot turns 45° angle in the optimum path instead of taking 90° turn. This paper is organized as follows: Sect. 2 discusses previous work in the field of algorithms. The altered Sect. 3 presents A-star algorithm for path planning. Section 4 discusses the algorithm for a minimum number of turns. Section 5 goes over the experiments and results under various map settings. Section 6 closes and explores potential future research avenues.
An Optimized Path Selection Algorithm for the Minimum Number …
207
2 Related Work Path planning is a critical subject in mobile robot development. Its primary goal is to find an optimal, collision-free, and smooth path from the beginning node to the target node in an obstacle-filled environment [4]. Traditional path planning and intelligent path planning are two types of mobile robot path planning based on the level of intelligence in the path-planning process. Traditional path-planning algorithms include simulated annealing [5], potential function theory [6], fuzzy logic [7], and others. However, traditional methods cannot be improved in terms of path search efficiency and path optimization. Intelligent path-planning algorithms including Ant Colony Optimization [8], genetic algorithm [9], neural network [10], particle swarm algorithm [11], and so on have been studied. An upgraded variant of the traditional A-star method for mobile robot path planning that employs the three tactics of expansion distance, bidirectional search, and smoothing [12]. In terms of path smoothness and path-planning efficiency, this algorithm was able to overcome the constraints of the traditional A-star approach. To ensure that there is room between the path and the obstacles, the distance is originally determined for each obstacle on the map. Second, to speed up path design, a bidirectional searching algorithm is added to the traditional A-star method. Third, the course’s angle turns have been streamlined and improved. Utilizing a model of rectangular impediments and node selection methods, the turning point on the path is reduced, the shortest path is chosen, and interior robot operation is made as simple as possible. The algorithm is practical, according to the outcomes of the simulation. There are still some improvements that might be made, such as using the object’s corner instead of a rectangular one. Incorrect corner selection for intricate objects, on the other hand, will result in the introduction of extra inflection points [13]. Mobile robot path-planning technology is one of the most exciting fields of artificial intelligence. The ultimate goal is to determine the optimal path for mobile robots to take when navigating obstacles from one site to another. Finally, the effectiveness of the A-star algorithm is evaluated using a MATLAB simulation experiment [14]. The heuristic function directs the A-star algorithm while it seeks the shortest path. A-star algorithm uses the heuristic function [15] to assess the cost of the path before planning it. The A-star algorithm searches ten times faster than the Bee algorithm in a space with no barriers. In a hard scenario with impediments, the Bee algorithm outperforms the A-star algorithm significantly. The Bee algorithm [16] has an advantage over the A-star algorithm in that it can search swiftly both locally and internationally. Because the disclosed cost map strategies are orthogonal to one another, they can all be used independently. They also apply to planners other than lattice planners [17]. Lattice-based cost maps are simple to implement, inexpensive, and straightforward to maintain. The A-star algorithm must design the shortest path on the grid, despite the fact that it must traverse around path nodes and choose the least expensive choice. As a result, the approach requires more calculations [18] and takes longer, and its effectiveness decreases as the map’s scale increases.
208
N. Kumar and A. Kumar
Numerous research successes in relation to the A-star algorithm have been produced, with an emphasis on the method’s effectiveness and robustness. The efficiency of the A-star algorithm has been the topic of extensive academic research. The efficiency issue of the algorithm is addressed utilizing an A-star optimization method with two modifications. The evaluation function is initially weighted to boost the credibility of the heuristic function. Second, a node set in the maps that is focused on a certain point is constructed. When there are barriers in the node set, the node is labeled as a “Avoidable node” [19] and is not searched. These enhancements are made to the A-star algorithm in order to maximize its effectiveness; nevertheless, the technique also increases the algorithm’s processing demands. In a practical application, it optimizes the selection of various criteria and numerous conditions for the Dijkstra algorithm. The optimization technique combines numerous influence elements to generate a new weight matrix [20] as a new requirement for Dijkstra’s path selection, increasing search efficiency and meeting people’s diverse path selection needs. An innovative method for combining fuzzy algorithms and cell decomposition. Fuzzy algorithms [21] are widely used in path planning, which uses sensors to find barriers in front of the robot. This program searches for and detects obstacles in the cell deconstruction process. The robot can reach the desired point by avoiding obstacles with the algorithm, but it takes some time. As a result, a second algorithm is needed to solve the problem. Using a mix of these techniques, the robot can travel more quickly to the destination point by avoiding obstacles in changing environmental conditions.
3 A-Star Algorithm and Modifications On the basis of the Dijkstra algorithm, A-star method is designed to eliminate blind search and enhance search efficiency. The algorithm is utilized as heuristic information for path searching to improve the algorithm’s convergence speed and obtain a better path. The estimated function f (n) expresses the heuristic cost of the A-star algorithm, and it is as follows: f (n) = g(n) + h(n),
(1)
where g(n) is the shortest distance between the source and current nodes and h(n) is the shortest distance between the current and destination nodes (Fig. 1). g(n) = h(n) =
/ (n x − sx )2 + (n y − s y )2 , /
(dx − n x )2 + (d y − n y )2 ,
where Start node(sx , s y ), Current node(n x , n y ), Destination node(dx , d y ).
(2) (3)
An Optimized Path Selection Algorithm for the Minimum Number …
209
Initial start node ‘n’ and put it on open list
Calculate cost function f(n) = g(n) + h(n)
Remove from open list and put on closed and save the index of the node ‘n’ which has the smallest ‘f’
If ‘n’ is the target sector
No
Yes Terminate, and use the pointers of index to get the optimal path.
Detect all the successor sectors of ‘x’ which not exist on closed list.
Calculate cost function ‘f’ for each sector
Fig. 1 Flow chart of A-star algorithm
This method is often used in path planning. It uses the best search to discover the cheapest route from a given start node to a given destination node. The pointer traverses the map in a priority queue of contiguous track sectors, adhering to the track with the lowest known heuristic cost [22]. This strategy allows the robot to explore a different path after reaching a dead end and avoiding paths that lead there. For this, two lists are created, one closed and one open [23]. A closed list and an open list are the fundamental building pieces of the A-star algorithm. In contrast to an open list, which is used to record adjacent sectors to those already calculated, calculate the distances moved from the initial sector with distances to the target sector, and also save the parent sector of each sector, a closed list is used to write and save tested and evaluated sectors. The final phase of the algorithm uses these parents to plan the route to the objective. In this paper, the modified A-star algorithm has proposed in which there is no change in the calculation of g(n) and h(n), but the distance of h(n) is viewed from the destination node. In this process, single node movement from the start node and destination node is calculated alternatively. The possible movement is shown in
210
N. Kumar and A. Kumar
Fig. 2 All possible movements
Fig. 2. Two wavefronts, one from the start node and the other from the destination node, are formed and matched somewhere in between the start and destination nodes. The advantage of the modified A-star algorithm is clearly distinguished in finding the minimum distance paths having the minimum number of turns.
4 Algorithm for Minimum Number of Turns Once the wavefront from the start node and the destination node is matched, the multiple same minimum path lengths can be traced. Out of those paths, the single optimized path is the critical issue that is solved by this algorithm. This algorithm returns the best path that has minimum distance and the minimum number of turns. This work with the heuristic information to reduce bending times and cumulative bending angle (Fig. 3). The up-front that starts from the start node and the down-front that start from the destination node match in between. The last nodes of wavefronts are still in the open list. The sum of the open list cell values gives the indication of the number of minimum paths with minimum distance. Further, if the open up-front and downfront match diagonally, then the movement of the pointer for the next node in both directions is diagonal until the possibility lies. If the open up-front and down-front match horizontally or vertically, then the movement of the pointer for the next node in both directions is horizontal or vertical, respectively, until the possibility lies.
Fig. 3 a Initial path-planning problem b up-front and down-front c best path in arrow d colors for different purposes
An Optimized Path Selection Algorithm for the Minimum Number …
211
Fig. 4 a Initial path-planning problem b up-front and down-front c colors for different purposes
5 Experimental Result Figure 4 shows a 15*15 grid work floor with obstacles, a start node, and a goal node. The original position of the robot is used as the starting point and marked as an open cell. Using the modified A-star algorithm, these open cells begin to travel to the next location. Because the earlier cells are marked as closed, there is no way to move backward. This procedure is repeated until the goal is not met. After computing all of the cells from start to finish, the algorithm begins looking for the shortest path with the fewest turns. When there are numerous shortest pathways, determining which one is the best is exceedingly challenging for a robot. The experiment is run on MATLAB software to determine the shortest path. It returns the five shortest pathways in the previously provided working floor grid. Figures 5a–d depict the five shortest pathways (e). The algorithm assists the robot in determining the best course. The grid is counted as distance in this procedure. A horizontal or vertical movement equals one unit distance, while a diagonal movement equals the square root of two units. Each shortest path has a unique orientation and set of obstacles. Now, the algorithm instructs the robot to select the path with the fewest number of turns. Another criterion is that if two such pathways have the same minimal number of turns, the algorithm chooses the one that is computed first. Figure 4b is the optimal course for the robot in this case.
6 Conclusion and Future Scope The up-front and down-front are generated simultaneously, so the pre-installed webcams are needed. This can be seen as a limitation or an advantage. Limitation, if the system does not have webcams for the surveillance system, then it should be installed, or an advantage, if the system has pre-installed webcams for the surveillance, then no extra webcams are needed. The path with the fewest turnings helps the robot to maintain its trip continuity. Robots should use the path that has the
212
N. Kumar and A. Kumar
Fig. 5 All possible shortest paths with the number of turns
fewest turns. With less turning, the robot may lower base motor activity, saving time and energy. In path-planning issues, the algorithm returns a large number of pathways that travel the shortest distance. This paper chooses the best ideal path among those with the same minimum distance between the start and goal points. The essential criterion for the solution is the least number of bots turns. This paper contributes significantly to the advancement of the A-star algorithm in complicated maps for mobile robots, particularly in terms of shortest path length and overall least number of turns. As a heuristic function, the estimated function of the modified A-star algorithm is employed to increase search efficiency and path smoothness. In future research, the proposed approach could be combined with various path-planning algorithms in a dynamic context. Service robots, rescue robots, and industrial robots are just some of the applications that can be introduced.
References 1. Larranaga P, Kuijpers CMH, Murga RH, Inza I, Dizdarevic S (1999) Genetic algorithms for the travelling salesman problem: a review of representations and operators. Artif Intell Rev 13:129–170 2. Wassan NA, Nagy G (2014) Vehicle routing problem with deliveries and pickups: modelling issues and meta-heuristics solution approaches. Int J Transp 2(1):95–110 3. Damos MA, Zhu J, Li W, Hassan A, Khalifa E (2021) A novel urban tourism path planning approach based on a multiobjective genetic algorithm. ISPRS Int J Geo-Inf 10:530 4. Zhou Z, Nie Y, Min G (2013) Enhanced ant colony optimization algorithm for global path planning of mobile robots. In: 5th International conference on computational and information sciences, pp 698–701 5. Miao H, Tian YC (2013) Dynamic robot path planning using an enhanced simulated annealing approach. Appl Math Comput, pp 420–437
An Optimized Path Selection Algorithm for the Minimum Number …
213
6. Cetin O, Yilmaz G (2014) Sigmoid limiting functions and potential field based autonomous air refueling path planning for UAVs. J Intell Robot Syst, pp 797–810 7. Bakdi A, Hentout A, Boutami H, Maoudj A, Hachour O, Bouzouia B (2016) Optimal path planning and execution for mobile robots using genetic algorithm and adaptive fuzzy-logic control. Robot Auton Syst, pp 95–109 8. Wang P, Lin HT, Wang TS (2016) An improved ant colony system algorithm 307 for solving the IP traceback problem. Elsevier Science Inc., pp 172–187 9. Lin D, Shen B, Liu Y, Alsaadi FE, Alsaedi A, Cheng H (2017) Genetic algorithm-based compliant robot path planning: an improved Bi-RRT-based initialization method. Assembly Autom, pp 261–270 10. He W, Chen Y, Yin Z (2016) Adaptive neural network control of an uncertain robot with full-stat e constraints. IEEE Trans Cybern, pp 620–629 11. Song B, Wang Z, Zou L (2016) On global smooth path planning for mobile robots using a novel multimodal delayed PSO algorithm. Cogn Comput, pp 5–17 12. Wang H et al (2022) The EBS-A* algorithm: an improved A* algorithm for path planning 13. Sun Y et al (2013) Research on path planning algorithm of indoor mobile robot research on path planning algorithm of indoor mobile robot. In: International conference on mechatronic sciences, electric engineering and computer (MEC) 14. Zhi C et al (2019) Research on path planning of mobile robot based on A* algorithm. Int J Eng Res Technol (IJERT) 15. Yang JM et al (2015) Path planning on satellite images for unmanned surface vehicles. Int J Nav Archit Ocean Eng, pp 87–99 16. Sabri AN (2018) A study on bee algorithm and A* algorithm for pathfinding in games. Universiti Teknologi Malaysia, Skudai, Johor (IEEE) 17. Ferguson D, Likhachev M (2008) Efficiently using cost maps for planning complex Maneuver. Intel Research Pittsburgh 18. Wang ZQ, Hu XG, Li X, Du ZQ (2018) Overview of global path planning algorithms for mobile robots. Comput Sci, pp 9–29 19. Qingji G, Yongsheng Y, Dandan H (2005) Feasible path search and optimization based on an improved A* algorithm. China Civ Aviat Coll J, pp 42–44 20. Zhou M, Gao N (2019) Research on optimal path based on Dijkstra algorithms. In: 3rd International conference on mechatronics engineering and information technology (ICMEIT). 21. Iswanto I, Wahyunggoro O, Cahyadi AI (2016) Quadrotor path planning based on modified fuzzy cell decomposition algorithm. TELKOMNIKA, pp 655–664 22. Long S, Gong D, Dai X, Zhang Z (2019) Mobile robot path planning based on ant colony algorithm with A* heuristic method. Front Neurorobot, pp 13–15 23. Zidane I, Ibrahim KAK (2018) Wavefront and A-star algorithms for mobile robot path planning. Adv Intell Syst Comput, p 973
Predicting Corresponding Ratings from Goodreads Book Reviews Abhigya Verma, Nandini Baliyan, Pooja Gera, and Shweta Singhal
Abstract With the enhancement of the online world, people tend to trust and mould their opinions with respect to the text provided on commercial and social websites. The product or service reviews provided over the internet enable a new customer to form an opinion regarding the quality, and at the same time, the numeric rating provides a foundation for a quick decision on whether to accept the product or not. In this paper, we have proposed the state-of-the-art classification and Recurrent Neural Network (RNN) models along with the TF-IDF vectorizer, that predicts the numeric rating on a scale of 0–5 associated with a text review of a book on the Goodreads website. We have used Decision Tree, K-nearest neighbours, Logistic Regression, Gradient Boosting and Random Forest classifier and analysed their performance through accuracy, F1-score, recall and precision. The accuracies for the models were obtained as 37.29% for Decision Tree, 47.72% for Gradient Boosting, 46.05% for Random Forest, 23.41% for KNN with the best accuracy obtained as 49.50% from the Logistic Regression model. A more better result is obtained by implementation of the Recurrent Neural Network model on the set with the LSTM layer with an accuracy of 53.61%. The implementation result is helpful for book readers to determine their opinion about essence of a book. Keywords TF-IDF vectorizer · Recurrent Neural Network · Gradient Boosting · K Neighbours’ Regressor · RNN · Logistic Regression
IGDTUW A. Verma (B) · N. Baliyan · P. Gera · S. Singhal Indira Gandhi Delhi Technical University, New Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_19
215
216
A. Verma et al.
1 Introduction The enhancement of the digital world along with the availability of diversified resources often baffles a user in making a choice whether to book a hotel, buy a product, or invest their time in reading a particular book. Product and service details with essential information such as the description, seller, dimensions, raw-material, price are available on various cataloguing and commercial websites such as the Amazon, MakeMyTrip, Goodreads, and many more. Apart from these basic details from suppliers, there are reviews and ratings from the users who have availed and used the product before and share their experiences and personal opinion about the product. A new user/customer relies on these reviews to have an impression of the quality of the product and decides whether to go for it or discard it. For example, a review by a previous reader on a book helps a new buyer in forming an opinion about how is the book, which genre it focuses if it is better than its prequel, etc. There are times when a book may have hundreds of reviews and many of them too descriptive, that a person may not have time to go through them. Therefore, a single-digit numeric rating provides quick brief info regarding the product quality or if the book is ‘worth reading’ or a ‘waste of time’. In this work, we emphasised predicting the numeric rating on a scale of 0–5 through the textual reviews of books provided by users on the Goodreads website. Goodreads is a popular book cataloguing website which provides a reader with necessary details about a book from the author, publisher, and price to other readers’ reviews and ratings. We have focused on analysing and characterising review statements to get a final opinion in the form of a rating which could help a future reader in quick judgement and also suggests the owner an improvement. The idea of opinion mining and recommender systems is not new and has been employed since the 1990s [8]. The most favoured algorithm for the purpose is the Naive Bayes classifier. Improved embedding and vectorization techniques are being added to algorithms to increase the algorithm’s efficiency. Deep learning methodologies are also taking shape in this problem statement since the last decade. We use five classification algorithms: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting and K-nearest neighbours along with a Neural Network-Recurrent Neural Networks. The TF-IDF vectorization is used for data extraction to convert the word corpus into a numeric representation. Through this study, we propose the following novel contributions: • We worked on a selected subset of 8997 records from a massive real 9,00,000 record dataset available to us. The study is itself unique on a book dataset representing a real-world problem of extracting a rating for consumer benefit. • Our study explores various state-of-the-art algorithms and presents the existing issue of review rating as a classification problem. • We present a parallel between state-of-the-art algorithms and Recurrent Neural Network model revealing relatively better performance of RNN on textual dataset.
Predicting Corresponding Ratings from Goodreads Book Reviews
217
The rest of the paper is organised in the following sequence: In the first section, we highlight the literature study on previous works of textual review categorization into numeric ratings and similar methodologies as implemented by us. In the following section, we have preprocessed the reviews to obtain clean text, free of punctuation marks, and stopwords. We applied the TF-IDF vectorization for data extraction to convert the word corpus into a numeric representation. Finally, we implemented supervised classification algorithms for prediction and compared their results to get the best accuracy of 49.5% for Logistic Regression. RNN, a class of Artificial Neural Networks, is also implemented with its accuracy and loss histories.
2 Literature Review Hossain et al. [9] proposed in 2021, the prediction of online products’ rating from customer textual reviews using supervised machine learning algorithms with TFIDF vectorizer. Out of XGBoost, Random Forest Classifier, and Logistic Regression algorithms, they received promising results with the Random Forest algorithm with an accuracy of 94% and F1-scores, recall, and precision all equal to 0.94 only. The most common algorithm used for rating prediction in the last two decades has been the Naive Bayes classifier because of its quick handling of multi-class predictions with relatively less training data. Haji et al. [7] proposed a method of combining the machine learning approach and lexicon-based sentiment analysis on text reviews of a restaurant dataset. First, they used the Naive Bayes algorithm for text classification and then employed the Lexicon approach that utilises a verbal dictionary with scores assigned to words, to the textual reviews to compute the polarity of reviews and decide if the review is negative or positive. With this, they could increase the accuracy of Naive Bayes by 5–10%. Various models employing the use of Recurrent Neural Networks (RNNs) for review sentiment analysis and prediction have been proposed with good results as by Verma et al. [18]. Their work computes vector representation of product reviews using a Gated Recurrent Neural Network (GRNN), that further classifies the sentiment label in a ranking range of 1–5. They uses Long Short-Term Memory (LSTM), a variant of RNN to process the textual sentences to a fixed length vector, and GRNN to capture the interdependencies existing between sentences of a review. Gezici et al. [5] have also proposed an RNN model to automatically analyse the reviews and predict ratings with a twofold contribution. They presented a deep learning model to predict user ratings to outright inconsistent and insufficient ratings given by users, followed by implementation and evaluation of model with an accuracy score of 87.61% on metric data of mobile OSS applications from Google Play market. Bartosh [2] explains the use of machine learning algorithms in automated text categorization (Table 1).
218
A. Verma et al.
Table 1 Comparative analysis between prior research works Paper Description
Remarks
[20]
They propose automated text grouping and categorisation machine learning models to group digitally available papers
This study is relevant to ours in terms of categorising a textual data into different categories
[1]
They approach review rating as multi-class classification on Yelp hotel dataset
We follow the same approach but on a book review dataset
[13]
They used the bag-of-opinions model to generalise the cumulative linear offset model (CLO) followed by regression technique to predict numeric rating from the text of a user’s product review
Our work approaches the problem as the implementation of the classification models and the Recurrent Neural Network technique to predict the relevant rating score
[15]
They studied automatic text summaries to correctly predict ratings associated with a movie review. Their motive was to evaluate review summaries up to compression of 10–50% and predict the correct rating instead of utilising the complete text
We also focused on predicting ratings, but on vectorised review texts instead of compressed texts
[12]
The authors provide an extensive review of 150 deep learning-based models for classification of text over 40 popular datasets with their relative performances
We look to this paper for understanding the use of neural models in text classification
[6]
This work presents both regression and classification models We also used the same to predict ratings automatically for given aspects of a textual approach of classification review, while each evaluated aspect is assigned a numerical towards our problem score
[19]
Their approach for rating prediction involves an auxiliary LSTM layer (Long Short-Term Memory) that learns from an auxiliary representation from the classification setting and for the regression setting, joins this auxiliary representation into the main LSTM layer
[16]
They used multinomial Naive Bayes classification algorithm We refer to this research with combination of TF-IDF method and opinion analysis to paper to study and mine data of online book reviews analyse the data preparation and extraction methodology
In our work, we focused on designing machine learning models to automatically predict the user’s rating based on the review text
(continued)
Predicting Corresponding Ratings from Goodreads Book Reviews
219
Table 1 (continued) Paper Description
Remarks
[4]
They focused on predicting a net rating corresponding to a product review on the basis of user opinion about different features of product calculated in review text on a hotel booking dataset. Based on the polarity of the sentences associated with a salient feature of product, their model computes a vector of feature intensities, which is then fed to machine learning algorithms for rating prediction
We refer to this research paper for analysing the methodology for text extraction
[17]
Their work assigned a score to each review using weighted Our work addresses the textual feature method and implemented Sequential Minimal same problem statement Optimization regression on tourism dataset of predicting user ratings from a textual feature. However, we employed classification and RNN algorithms for prediction
[9]
They presented supervised machine learning models of XGBoost, Logistic Regression, and Random Forest classifier with TF-IDF vectorizer for predicting a rating based on user textual reviews
We refer to this paper as their problem statement and methodology is similar to ours
[14]
Their aim was to eliminate influence of reviews with mismatched product ratings by sentiment analysis using deep learning methodology
In contrast, we do not focus on mismatched ratings. Instead, we developed simple models for getting numeric ratings from a review text
[3]
Their research work predicts rating polarity (negative, neutral, or positive) through review texts using machine learning classification models
In our work, we are not concerned about the polarity of reviews but focus on predicting the rating itself
[10]
They comprehensively reviewed deep learning-based rating prediction approaches and presented a systematic taxonomy of deep learning models with articulation of the state-of-the-art
We use this review paper to study about deep learning and Recurrent Neural Networks for implementation
[11]
They proposed an Artificial Neural Network (ANN) model for predicting net rating of books through review count, rating count book popularity and voting on a Goodreads book dataset
Our work holds similarity to theirs in terms of predicting numeric ratings of books. However, we work on textual reviews for prediction
220
A. Verma et al.
3 Methodology We categorizse this section into two subsections; first, we give a brief about our dataset and preprocessing procedure, and in the second subsection, we implement and evaluate machine learning algorithms to predict the numeric ratings from textual reviews. Figure 1 represents our methodology flow.
3.1 Description of Dataset The first step of a study is to understand the data so that the preprocessing and modelling can be done accordingly (Fig. 2). Dataset properties for this study, we used the Book Reviews dataset provided by Goodreads, through Kaggle. The dataset consists of 9 lakh textual reviews on 25,474 different books with details mapping across 11 attributes as depicted in Table 2. We have only used ‘review text’ and ‘ratings’ for our analysis.
Fig. 1 Flow diagram representing our methodology Fig. 2 Rating distribution of dataset depicted through pie chart
Predicting Corresponding Ratings from Goodreads Book Reviews
221
Table 2 Original dataset description Attribute
Description
Data type
user_id
Unique Identification character sequence of a user on Goodreads
String
book_id
Unique number assigned to a book for identification
Integer
review_id
Character sequence assigned to each review for identification
String
rating
Numeric rating from 0 to 5 provided by a user
Integer
review_text
Natural English language review text of a book by a user
Integer
date_added
Date and time when the review was added on Goodreads website by a String user
date_updated Date and time when any user updated a review or rating
String
read_at
Date and time when a user last read a book
String
started_at
Date and time when a user started reading a book
String
n_votes
Number of votes by other users to a review text
n_comments Number of further comments by other users on a review text
Integer Integer
Review text consists of the textual reviews of books as given by users in the English language and rating ranges from integers 1 to 5. We have selected a limited set of reviews from this massive dataset for efficient and fast model building. We took one-hundredth part of each rating set for analysis with a total of 8997 reviews. Data analysis and preprocessing and data analysis are crucial steps for a machine learning algorithm to achieve relatively better accuracy. We have done a basic analysis of the data assembled in previous steps to get the most occurring words and their frequency in the review texts of data. This lets us know the sentiment flow of the dataset as depicted in the bar plot of Fig. 3. Also, we made a word cloud (Fig. 4) to represent the most frequent words in the whole dataset.
Fig. 3 Bar plot of top ten most frequent terms in the review texts of dataset
222
A. Verma et al.
Fig. 4 Wordcloud of most frequent words of review text in dataset
Preprocessing involves the steps to convert the normal English grammatical language of reviews to a selected list of important adjectives and adverbs which are necessary for prediction. We performed some tasks to clean the review texts and convert them into a clean text which is suitable for working with machine learning algorithms. The sequential process depicted involves the basic steps of punctuation marks removal, conversion of whole text to lowercase, and removal of stop words and any mentions or hashtags. The effect of preprocessing steps can be analysed from Table 3. The next most crucial step was feature extraction which we implemented with TFIDF vectorization. Term frequency-inverse document frequency (TF-IDF) describes the importance of a word in a series or corpus to a text. It is the most common algorithm that transforms natural language text into a meaningful numerical representation for machine learning algorithms. TF refers to the frequency of a word in a document which is numerically equal to the ratio of the frequency of occurrence of a word to the total number of words in the document. IDF specifies significance of a word by assigning a weight to it based on its frequency in a corpus. The TF-IDF values depend positively on the frequency of occurrence of a word in the document while negatively on the number of documents in the corpus that contains the word.
3.2 Model Construction In this subsection, we discuss different models constructed by us for prediction.
Predicting Corresponding Ratings from Goodreads Book Reviews
223
Table 3 Sample review texts depicting the effect of preprocessing Before preprocessing
After preprocessing
Ick. Ickity Ick Ick. NOT a fan of this one
Ick ickity ick ick fan one
This book is not for me: 1. So slowwwwww……Lots of words with nothing happening 2. Supposed to be a crime noir within a cyberpunk book. But after reading the first 5 chapters all I saw was cliched crime writing and nothing scifi that seemed original or infused with a sense of wonder. If there is something cool later I’m not willing to slog through these problems to get to it
Book me slowwwwww lots words nothing happening supposed crime noir within cyberpunk book reading first 5 chapters saw cliched crime writing nothing scifi seemed original infused sense wonder something cool later willing slog problems get it
The Shahrzad in the original Arabian Nights was a force to be reckoned with, truly cunning, manipulative and fearless. This Shahrzad was nowhere as impressive. The writing was beautiful but maybe I’m just not a fan of overly Romantic stories. It just failed to impress me
Shahrzad original arabian nights force reckoned with truly cunning manipulative fearless shahrzad nowhere impressive writing beautiful maybe fan overly romantic stories failed impress me
Goes without saying. Incredible
Goes without saying incredible
I mean it’s a solid three star book and it did keep me turning the pages but at the same time there were some problems. idk, maybe i’ll write a review. Guess i shall see how the job applications go today. Until then have a gif of momo holding her breath and waiting for something to happen
Mean solid three-star book keep turning pages time problems maybe write review guess shall see job applications go today gif momo holding breath waiting something happen
Decision Tree The Decision Tree (DT) is a supervised non-parametric learning algorithm which can be implemented for solving both regression and classification problems. In our context, as we need to categorise text into six categories, it is a suitable model. It forms a tree-structured classifier where each internal node represents a column attribute. The edges represent the decision that was taken and the nodes the final predicted value. It starts from the root node which is equivalent to the essential attribute and moves down with splitting and inferring rules from each feature node to the final value. The Gini criterion is used in our model with a max depth of 2 to get the best accuracy of 37%. We got average precision, recall, and F1-score as 0.37, 0.57, and 0.44, respectively. Gradient Boosting (GB) Gradient Boosting is a machine learning algorithm that is known for its prediction speed and accuracy. It works in a level-wise fashion by optimising the result of the previous level at every level. Initially, a model is fitted on the training data, then a second model is fitted to rectify the errors of the previous model. The net result is the combination of all models where each model corrects the output of its predecessor. We obtained the best accuracy of 47.72% with a learning rate of 0.1, maximum depth of 3, and number of estimators set to 100. The model gave 0.42 precision, 0.30 recall score, and 0.31 F1-score.
224
A. Verma et al.
Random Forest (RF) Random Forest is essentially a supervised machine learning algorithm which is based on ensemble learning that can be implemented for regression as well as classification problems. It works on the ‘divide and conquer’ strategy by implementing the Decision Tree algorithm on subsets of data and taking prediction classes on the basis of majority votes. The accuracy score is directly proportional to the number of trees implemented, and therefore, we used 100 estimators and ‘log2’ maximum features for the best split of data. The average recall, precision, and F1-score obtained from the model are 0.28, 0.52, and 0.28, respectively. Logistic Regression Logistic Regression is stated as a supervised classification as well as a statistical model that estimates the probability of occurring of an event based on independent variables. It is helpful in predicting a categorical dependent variable through relationships between continuous independent variables. We used a ‘newton-cg’ solver with the penalty ‘l2’ for modelling our vectorised data and obtained 0.40 precision, 0.31 recall, and 0.32 F1-score. K-nearest neighbours (KNN) K-nearest neighbours is a simple supervised classification and regression algorithm that uses proximity to assign a class to an object of concern. An object is compared with the stored data for similarity and classified on basis of the average distance from a specified ‘k’ number of neighbours that are nearest to it. The ‘manhattan’ metric with five nearest neighbours and distance weights gave us an accuracy of 25%. The model gave precision of 0.19, 0.20 recall, and F1-score of 0.16. Recurrent Neural Network (RNN) RNN is a type of Artificial Neural Network (ANN) commonly used in Natural language Processing (NLP). They are recognised by their memory or more precisely internal states. They are capable of influencing a current input and output from the information taken from a prior output. To implement this model, in addition to the preprocessing used above, we tokenized the review text and padded the text to a maximum length of 100 words, wherein larger texts are truncated to give a training vector of shape 8997 by 100. We implemented a simple RNN model with one embedding, one LSTM and two dense layers. The total trainable parameters were 5,343,204 and unique words extracted were 41,591 in the curated dataset. Figure 5a shows how the model accuracy increases with increase in number of epochs up to a level. A significant increase in accuracy can also be observed with an increase in subset records.
4 Conclusion and Future Scope We have focused on the problem of addressing a numeric rating on a scale of 0–5 through natural language textual reviews to enable a new user to make quick and easy decisions about a book. After preprocessing and feature extraction, we proposed five classification models and an Artificial Neural Network technique. The accuracy of each model is summarised in Table 4. Our work showcases a real problem but with a less accuracy score. This less accuracy can be reasoned with the use of a reduced dataset. We also see that most people have written positive reviews and
Predicting Corresponding Ratings from Goodreads Book Reviews
225
Fig. 5 Learning curve of RNN model depicting accuracy and loss history over the epoch of training and testing dataset
Table 4 Accuracy scores of applied algorithms
Algorithm name
Accuracy score (%)
Logistic Regression
49.50
Decision Tree
37.29
Gradient Boosting
47.72
Random Forest
46.05
KNN
23.41
RNN
53.61
ratings 5 and 4 are dominant. The best accuracy score of 49.50% is obtained by Logistic Regression among state-of-the-art algorithms while the Recurrent Neural Network (RNN) shows better results with 53.61% accuracy. This work can be further implemented with larger datasets to get more accuracy. We present a real-world issue to provide aid to readers for better decisions in book selection. Future studies can refer to it to understand the better performance of neural models in comparison to classification models on textual datasets. The study to predict numeric ratings from reviews can be extended for different products and customer services. This overall rating from user feedback reviews may influence a provider or writer to improve the quality of their product in view of achieving a better rating. Acknowledgements Our deepest gratitude goes out to our research supervisor, Professor Dr Shweta Singhal, for her patient supervision, passionate support, and helpful criticism of this study.
References 1. Asghar N (2016) Yelp dataset challenge: review rating prediction. arXiv preprint arXiv:1605. 05362 2. Bartosh V (2019) Machine learning in automated text categorization
226
A. Verma et al.
3. Budhi GS, Chiong R, Pranata I, Hu Z (2017) Predicting rating polarity through automatic classification of review texts. In: 2017 IEEE conference on big data and analytics (ICBDA). pp 19–24. IEEE 4. De Albornoz JC, Plaza L, Gervás P, Díaz A (2011) A joint model of feature mining and sentiment analysis for product review rating. In: European conference on information retrieval. pp 55–66. Springer 5. Gezici B, Bölücü N, Tarhan A, Can B (2019) Neural sentiment analysis of user reviews to predict user ratings. In: 2019 4th International conference on computer science and engineering (UBMK). pp 629–634. IEEE 6. Gupta N, Di Fabbrizio G, Haffner P (2010) Capturing the stars: predicting ratings for service and product reviews. In: Proceedings of the NAACL HLT 2010 workshop on semantic search, pp 36–43 7. Haji R, Daanyaal K, Deval G, Rushikesh G (2019) Rating prediction based on textual review: machine learning approach lexicon approach and the combined approach. Int Res J Eng Technol (IRJET) 6(3):5437–5443 8. Hasanzadeh S, Fakhrahmad S, Taheri M (2022) Based recommender systems: a proposed rating prediction scheme using word embedding representation of reviews. Comput J 65(2):345–354 9. Hossain MI, Rahman M, Ahmed T, Islam AZMT (2021) Forecast the rating of online products from customer text review based on machine learning algorithms. In: 2021 International conference on information and communication technology for sustainable development (ICICT4SD), pp. 6–10. https://doi.org/10.1109/ICICT4SD50815.2021.9396822 10. Khan ZY, Niu Z, Sandiwarno S, Prince R (2021) Deep learning techniques for rating prediction: a survey of the state-of-the-art. Artif Intell Rev 54(1):95–135 11. Maghari AM, Al-Najjar IA, Al-Laqtah SJ (2021) Books’ rating prediction using just neural network 12. Minaee S, Kalchbrenner N, Cambria E, Nikzad N, Chenaghlu M, Gao J (2021) Deep learning– based text classification: a comprehensive review. ACM Comput Surv (CSUR) 54(3):1–40 13. Qu L, Ifrim G, Weikum G (2010) The bag-of-opinions method for review rating pre diction from sparse text patterns. In: Proceedings of the 23rd international conference on computational linguistics (Coling 2010), pp 913–921 14. Reddy NCS, Subhashini V, Rai D, Vittal B, Ganesh S et al (2021) Product rating estimation using machine learning. In: 2021 6th International conference on communication and electronics systems (ICCES), pp 1366–1369. IEEE 15. Saggion H, Lloret E, Palomar M (2012) Can text summaries help predict ratings? A case study of movie reviews. In: International conference on application of natural language to information systems, pp 271–276. Springer 16. Soni D, Madan S et al (2015) An efficient approach to book review mining using data classification. In: Emerging ICT for Bridging the Future-Proceedings of the 49th Annual Convention of the Computer Society of India CSI Volume 2, pp 629–636. Springer 17. Venugopalan M, Nalayini G, Radhakrishnan G, Gupta D (2018) Rating prediction model for reviews using a novel weighted textual feature method. In: Recent findings in intelligent computing techniques, pp 177–190. Springer 18. Verma S, Saini M, Sharan A (2017) Deep sequential model for review rating prediction. In: 2017 Tenth international conference on contemporary computing (IC3), pp 1–6. IEEE 19. Xu J, Yin H, Zhang L, Li S, Zhou G (2017) Review rating with joint classification and regression model. In: National CCF conference on natural language processing and Chinese computing, pp 529–540. Springer 20. Yadav BP, Ghate S, Harshavardhan A, Jhansi G, Kumar KS, Sudarshan E (2020) Text categorization performance examination using machine learning algorithms. In: IOP Conference series: materials science and engineering, vol 981, p 022044. IOP Publishing
A Precise Smart Parking Model with Applied Wireless Sensor Network for Urban Setting Ishu Kumar, Sejal Sahu, Rebanta Chakraborty, Sushruta Mishra, and Vikas Chaudhary
Abstract Nowadays, due to the mishandling of the current parking system, even a basic action like parking now appears to be a time-consuming and laborious operation. Existing parking systems need a large amount of staff to manage and force customers to look for a parking place floor by floor. This paper proposes a smart parking energy management system for a structured environment such as a multistory corporate parking space. The system proposes the use of Internet of Things (IoT) technology in conjunction with modern sensors and controllers to establish a systematic parking system for users. Unoccupied parking spaces are highlighted by lights, and users are steered to them to avoid the need to look for an empty parking place. A central system that accesses occupied parking spots in the cloud directs an approaching automobile to an empty parking place. The entire system is fully automatic, which leads to a reduction in the required manpower and improves the aesthetics of parking lot lighting. The proposed model makes the parking system more time-efficient and provides user comfort. It is accomplished by increasing the mean accuracy of car allotment and providing reduction in average time for the user. Keywords Wireless sensor network · Multi-agent systems · Smart city · Internet of Things · Cloud computing
1 Introduction As demographics and industrialization have grown substantially in recent years, many towns have resorted to advanced technology and networks to assist them manage resource restrictions. Towns, for instance, may increasingly rely on smart I. Kumar · S. Sahu · R. Chakraborty · S. Mishra (B) Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India e-mail: [email protected] V. Chaudhary AI & DS Department, GNIOT, Greater Noida, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_20
227
228
I. Kumar et al.
city solutions, a component of the IoT. As a result, it is vital to define IoT and its applications in smart cities [1]. The Internet of Things (IoT) is a network of physical things, gadgets, cars, and buildings that are implanted with semiconductors, applications, IR sensors, and internetworking in order to gather and share data [2]. The Internet of Things allows things to be sensed and controlled remotely using existing infrastructure, enabling further seamless integration of the physical world into computing systems as well as ensuring efficiency, accuracy, and economic gain. Smart cities use IoT devices such as linked sensors, lighting, and meters to gather and analyze data. This information may then be used by authorities to enhance facilities, utility services, and operations, among many other things. The notion of constructing a smart city has become a reality with the growth of the IoT. Smart cities handle two major issues: parking infrastructure and traffic control systems. Finding a parking place in today’s towns is always challenging for drivers, and it is only becoming more problematic as the number of individual automobile users increases. This condition presents an opportunity for smart cities to improve the effectiveness of their parking resources, resulting in faster search times, reduced congestion problems, and fewer road accidents [3]. Parking issues and traffic congestion problems can be avoided if drivers are advised ahead of time about the accessibility of parking spots at and near their destination. Such systems need the installation of effective sensors in parking lots to measure occupancy, as well as speedy data processing units to derive useful insights from data received from multiple sources. Parking system technologies use a wireless sensor network (WSN) for the identification and communication process. Such equipment has shown to be more efficient than technologies such as video monitoring devices, pneumatic tubes, and so on. Existing WSNs in parking systems are primarily concerned with the reservation and distribution of parking spots, as well as their connection with digital payment portals via IoT-based platforms. This analysis’ technique proposes a solution for a considerably larger parking area, taking into account all conceivable factors all the way up to the level of execution. Another key issue with parking spots is security, which is handled with IoT systems by delivering real-time input on the state of the asset being watched, in this instance the user’s car. During our sophomore year in college, we worked as interns in a startup where the company used a shared space in a 14 story building, which had a common parking lot for employees of all companies who rented space in the building. Due to poor parking management, the employees of the startup wasted a lot of billable hours in the process of finding a proper parking space. The manager could not blame them as it was a general issue that seemed to have no solution. During the morning hours, when employees of all the companies usually arrive, it was really difficult for the existing parking management to find empty slots for parking due to congestion. As a result, vehicles had to wait for long hours, and thus work was eventually delayed. This made us realize that even a minute issue like parking may lead to loss for a company and that parking is becoming a big problem in today’s world as the population is increasing every day. Then we thought, if cities are getting smarter, then why not parking? So we decided to propose some methodology to solve this real-world issue.
A Precise Smart Parking Model with Applied Wireless Sensor Network …
229
Main Highlights of the paper are as follows: • The purpose of this paper is to present a model that will improve the efficiency of the parking system while taking users’ comfort and time into consideration. • The proposed concept is a smart parking system that is IoT-based and cloudintegrated which makes it possible for people in far-off places to use the mobile application which offers real-time information on parking space availability in a parking area. • Implementing this model can save up to 46% of the user’s time, which is often spent looking for the proper parking spot. It also offers 14% more accuracy when compared to the conventional model.
2 Related Works The authors in [4] employed an ultrasonic detector-based smart parking system (sps) architecture to arrive at the conclusion that ultrasonic sensors may be used to identify both inappropriate parking and parking spaces. The suggested architecture for a parking detecting system would speed up the process of finding open spots and lessen the likelihood of single vehicles parking incorrectly in two spots at once. In [5], researchers have used a parking occupancy tracker model for smart cities to reach at the destination. The established technique can assist in finding a parking space, especially in densely populated cities or areas where sporting or cultural events are scheduled [6]. The author, S. Wijayaratna has used impacts of on-street parking on road capacity to reach the outcome—investigation on the impact of various on-street parking densities on the four-lane split urban roadways’ capacity for traffic. The study will examine if the impact on capacity is unaffected by changes in traffic characteristics by looking at two distinct mid-sized Indian cities [7]. The authors, G. Yan et al. have used an authentic and intelligent parking prototype to reach the outcome— smart parking is an innovative security/privacy-conscious infrastructure and a parking service application that uses the NOTICE idea and architecture [8]. The authors, J. K. Suhr and H. G. Jung have used sensor fusion-based vacant parking slot detection and tracking to reach at the outcome—a system for detecting and tracking unoccupied parking spaces that combines the sensors of an AVM system and an ultrasonic sensorbased automated parking system [9]. The authors, Khaoula Hassoune, Wafaa Dachry, and Fouad Moutaouakkil have used smart parking systems—a systematic survey to reach at the outcome—a study of several parking schemes which was applied by numerous researchers to alleviate the rising problem of traffic congestion [10]. The authors, Abhirup Khanna, Rishi Anand have used IoT-based smart parking system to reach the outcome—on-site implementation of an IoT module; cloud-based, integrated smart parking system [11]. The authors, M. Karthi and P. Harris, employed smart parking with reservations in a cloud-based environment to achieve the goal of real-time supply of parking spots [12]. The authors, T. N. Pham, M.-F. Tsai, D. B. Nguyen, C.-R. Dow, and D.-J. Deng, employed an Internet of Things network architecture and a cloud-based smart parking system to achieve their goal of creating
230
I. Kumar et al.
a unique algorithm that improves the performance of the existing system [13]. The authors, G. Cookson and B. Pishue, used The Impact of Parking Pain in the US, UK, and Germany to arrive at their conclusion, which is a rich and fertile picture of urban mobility that allows research to produce beneficial and useful insights for policy makers, transportation experts, automakers, and drivers.
3 Proposed Model for Smart Parking We’ve created a concept for a smart parking system in this part, where a user uses a mobile application that shows open spots in green and occupied spaces in red. After selecting a space, the user can complete any necessary payments in accordance with the time for which the space is reserved. The elements in the proposed model as seen in Fig. 1 are discussed below. Database: In a smart parking database, the parking lot needs to run smoothly and conveniently. Each of the networks—internal and external—represented by the application are connected through the database. The database contains information on each automobile reservation, such as the car’s registration number, color, and name of the driver, as well as when the parking space was most recently rented and how many hours were allocated. To act as a backup in case of data loss or destruction, the database must also have a reference copy of the data. This method is used for all vehicles with reserved parking and also involves payment. When someone enters the parking lot, they may access the database individually if they wish to see their personal information [14].
Fig. 1 Proposed model for smart parking
A Precise Smart Parking Model with Applied Wireless Sensor Network …
231
Sensing: Sensors installed in smart parking to track the entrance and exit of vehicles. A computerized system made up of several communicating intelligent agents is referred to as a multi-agent system. The sensors are networked and need a network to function. Devices in parking lots or exits receive the signal from the sensor and transmit it to the database. The data on vacant and occupied positions is updated on the display. Green denotes an open area or position in the parking lot, whereas Red denotes an occupied position. With an Arduino component called Node mcu, sensors may be connected to one another electronically using Arduino. It assists in Wi-Fi connection and maintains a connection to a real-time clock so the user can view the time reserved and the hour of departure from places. This is related to the program that displays a warning when a reserved position is about to expire on a screen displaying open and reserved positions. Then, it is connected to Arduino so that it may monitor the locations’ state and constantly update them. Cloud: Data processing and archiving for the parking service are handled via the cloud. It keeps a lot of data regarding the parking that is available and occupied, as well as the entry and leave times. Additionally, it reveals the position of the parking and updates the cloud when a new car is parked or when one is towed out. Server and Database: When someone wishes to book a position for a specific time period, the server uses the database to access the cloud for details about open positions and positions that have already been filled. This information is then displayed to the customer, who can then look for a position that is right for him. The server establishes a connection with the database of all customers who have made mobile phone reservations, logs their entrance and exit timings, and then sends a notification to the cloud stating that this client has chosen a position and established the entry and leave times. The location is then reserved by the server by sending a signal to the screen. Application: One may use the mobile application to reserve a parking spot or look for openings near to the destination. Most programs require the Internet to function properly, and users may reserve parking spaces using a computer, laptop, or mobile phone after researching the finest places. The user receives a notification from the program informing them of the position’s location, parking restrictions, and payment details.
4 Results and Analysis The simulator creates an agent-based model that represents the number of cars and various attributes like average distance, time, and accuracy, showing the comparison between the above proposed model and existing traditional model [15]. The technical requirements for creating a simulation for the smart parking system are as follows—Operating System: Windows 7/8/10, The front end employs HTML5, JavaScript, jQuery, AJAX, and CSS, while the application server is Apache Tomcat
232
I. Kumar et al.
Table 1 Simulation outcome of the proposed model Number of cars
Average distance (m)
Time (min)
Proposed
Proposed
Traditional
Accuracy (%) Traditional
Proposed
Traditional
50
57
46
7
10
99.2
98.4
100
133
192
30
33
88.3
86.6
150
239
314
44
65
67
62
200
255
447
49
91
61
44
250
351
632
53
121
59
31
300
407
782
61
138
53
19
71.25
56.83
Mean
7.0.67 running on Eclipse with the STS Plugin for IDE. APIs and frameworks: RESTful web services, such as MySQL, Spring and Hibernate, Internet Explorer, Firefox, or Chrome as the browser, an Intel Core 2 Duo or higher processor, and 2 GB of RAM [16–18]. The suggested model has a greater mean accuracy than the conventional model, as shown in Table 1. The process being automated allows better accuracy even if the number of cars are increased. Additionally, since real-time updates to the data and the user is shown the most temporally and spatially efficient slot, the average time per vehicle is decreased. In the traditional model, empty parking spaces had to be manually located, a time-consuming and error-prone process, which accounts for the disparity between the two models. On the other hand, in the above proposed model, empty slots are detected by the sensors and are updated automatically in the database which is shown to the clients simultaneously. Consequently, accuracy will improve and time and effort will be saved. Moreover, the application reduces the average distance covered by displaying the nearest open space to the incoming user in advance. Further online payment prevents long queues and waiting time at the toll booth. Figure 2 shows the computational resource needs in terms of memory storage and energy required to perform processing. It is observed that the proposed smart parking model outperforms the existing conventional approach. A memory and energy need of 7.8 and 8.4 units are the outcome obtained.
5 Benefits and Future Scope of the Model Some primary advantages offered by the model are as follows: • Friction is decreased and the experience is enhanced when a customer or visitor can immediately identify a location. • It offers rich data sets that may be utilized to spot trends, peak periods, and other metrics for forecasting and reporting.
A Precise Smart Parking Model with Applied Wireless Sensor Network …
233
16 14
13.6 11.5
Metric values
12 10
8.4
7.8
8
Traditional Model (units)
6
Proposed Model (units)
4 2 0 ENERGY NEED
MEMORY NEED Resource
Fig. 2 Resource requirement analysis of the proposed model
• Knowing exactly where to go helps drivers save wasted travel and idle time, which improves traffic flow in congested locations. • Automation reduces expenditures in parking meters or parking inspectors. The future challenges of Smart Parking System • Improved infrastructure and public transportation flow are required to enable the cloud-based parking environment. • The absence of dynamic mechanisms to direct traffic to its end destination, parking.
6 Limitations of the Proposed Study The limitations of the proposed system are as follows: • Construction and installation costs are comparatively greater. • The organization must do various routine maintenance tasks. • A certain amount of construction expertise is required.
7 Conclusion We have examined the topic of parking in this analysis and have provided a cloudintegrated, IoT-based smart system for parking. The proposed approach provides real-time information about the number of available parking spots in a parking area.
234
I. Kumar et al.
The mobile application might be used by remote users to book a parking place for them. The purpose of this research is to enhance a city’s parking alternatives and, as a result, its citizens’ level of living. The model’s implementation can help users save up to 46% of the time they typically spend looking for the right parking spot. Additionally, it gives 14% greater accuracy than the traditional model. This will lead to an increase in accuracy and a reduction in time and effort.
References 1. Mishra S, Jena L, Tripathy HK, Gaber T (2022) Prioritized and predictive intelligence of things enabled waste management model in smart and sustainable environment. PLoS ONE 17(8):e0272383 2. Tripathy HK, Mishra S, Suman S, Nayyar A, Sahoo KS (2022) Smart COVID-shield: an IoT driven reliable and automated prototype model for COVID-19 symptoms tracking. Computing 1–22 3. Suman S, Mishra S, Sahoo KS, Nayyar A (2022) Vision navigator: a smart and intelligent obstacle recognition model for visually impaired users. Mobile Inf Syst 4. Cookson G, Pishue B (2017) The impact of parking pain in the US, UK and Germany. Hg. v. INRIX Research, vol 21. Online verfugbar Unter http://inrix.com/research/parking-pain/zul etztgeprüftam 5. Mahmud K, Gope K, Chowdhury SMR (2012) Possible causes & solutions of traffic jam and their impact on the economy of Dhaka city. J Manag Sustain 2:112 6. Silar J, Ruzicka J, Belinova Z, Langr M, Hluboka K (2018) Smart parking in the smart city application. In: 2018 Smart city symposium Prague (SCSP), pp 1–5 7. Lookmuang R, Nambut K, Usanavasin S (2018) Smart parking using IoT technology. In: 2018 5th International conference on business and industrial research (ICBIR), pp 1–6 8. Singh P, Gupta K (2016) Intelligent parking management system using RFID. In: Advances in intelligent systems and computing. Proceedings of fifth international conference on soft computing for problem solving, p 497505 9. Bagula A, Castelli L, Zennaro M (2015) On the design of smart parking networks in the smart cities: an optimal sensor placement model. Sensors 15(7):15443–15467 10. Sonar R, Nahata P, Ajmera T, Saitwal N, Jain S (2017) Automatic underground car parking system. Int J Mod Trends Eng Res 4(5):6468 11. Kianpi Sheh A, Mustaffa N, Limtrairut P, Keikhosrokiani P (2012) Smart parking system (sps) architecture using ultrasonic detector. Int J Softw Eng Appl 6(3):55–58 12. Grodi R, Rawat DB, Rios-Gutierrez F (2016) Smart parking: parking occupancy monitoring and visualization system for smart cities. In: SoutheastCon, pp 1–5 13. Wijayaratna S (2015) Impacts of on-street parking on road capacity. Australasian transport research forum, pp 1–15 14. Sivani T, Mishra S (2022) Wearable devices: evolution and usage in remote patient monitoring system. In: Connected e-health. Springer, Cham, pp 311–332 15. Mohapatra SK, Mishra S, Tripathy HK, Alkhayyat A (2022) A sustainable data-driven energy consumption assessment model for building infrastructures in resource constraint environment. Sustain Energy Technol Assess 53:102697 16. Sahoo PK, Mishra S, Panigrahi R, Bhoi AK, Barsocchi P (2022) An improvised deeplearning-based mask R-CNN model for laryngeal cancer detection using CT images. Sensors 22(22):8834 17. Mishra S, Thakkar HK, Singh P, Sharma G (2022) A decisive metaheuristic attribute selector enabled combined unsupervised-supervised model for chronic disease risk assessment. Comput Intell Neurosci
A Precise Smart Parking Model with Applied Wireless Sensor Network …
235
18. Mohanty A, Mishra S (2022) A comprehensive study of explainable artificial intelligence in healthcare. In: Augmented intelligence in healthcare: a pragmatic and integrated analysis. Springer, Singapore, pp 475–502
An Ensemble Learning Approach for Detection of COVID-19 Using Chest X-Ray Aritra Nandi, Shivam Yadav, Asmita Hobisyashi, Arghyadeep Ghosh, Sushruta Mishra, and Vikas Chaudhary
Abstract COVID-19 has led to unwanted and serious consequences from the very time of its spread. Economical India has faced a lot since COVID started spreading, and it has persisted to have terrible outcomes affecting human life all over the world. People have lost lives due to a lack of proper methods and their inefficiency in detecting whether a person is affected by the deadly virus or not with utmost accuracy often leading to mistakes in results. It has become a necessity to find out a quick and dependable means to diagnose the presence of COVID-19 virus in patients in order to deliver better and more prompt treatment amenities and to strongly fight the unwanted disease transmission. To avoid this and to find out a more effective method, COVID can be detected with the help of chest X-ray images. One of the most useful methods to accomplish this is through radiometric evaluation, a chest X-ray being the easily available and cheapest-priced alternative. New technologies hold a very crucial role in the detection and prevention of COVID-19 with Data Science being a crucial one to help with chest X-rays. This paper deals with an ensemble approach to COVID-19 detection. Multiple deep learning techniques, ResNetV2, VGG 16, and InceptionV3 were combined, and using the boosting process, the results were finalized to detect COVID through the chest X-ray images, and finally, the one providing the best-suited accuracy was considered. Looking at the results, we can claim that this is one of the best techniques for the detection of COVID. Further research in this approach can really bring sustainable outcomes and help humanity. Keywords COVID-19 · Healthcare · Machine learning · Ensemble learning · Deep learning
A. Nandi · S. Yadav · A. Hobisyashi · A. Ghosh · S. Mishra (B) Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneshwar, India e-mail: [email protected] V. Chaudhary AI & DS Department, GNIOT, Greater Noida, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_21
237
238
A. Nandi et al.
1 Introduction Coronavirus disease has fully changed the ecosystem in this two years. The world has been affected a lot. Everyone has seen major changes in the past years starting from lockdown and economic crisis. The disease is Zoonotic which means that it came from animals and has affected humans. The sickness was first seen in a town called Wuhan which is in China. USA was the first country to hit by this coronavirus and has seen very harsh phases in the entire past two years [1]. India is also one of the countries which hit badly by the COVID. The first case was found on December 2019, and since then, it has been a really tough phase for India. In the entire span of COVID, India has seen major COVID waves with a large volume of people getting trapped. India being a not so developed country had poor hospital facilities and medical systems. There were number of cases where peoples were not able to get oxygen cylinders and hospital beds. Not only physically but also economically India was challenged a several times by this wave. These numbers are something to fear [2]. In 2020, the richest 1% of the population would own 42.5% of total income, while the poorest 50% would hold only 2.5%. India’s poor are anticipated to more than triple after the outbreak, while the middle class is anticipated to have a one-third reduction. The largest private survey conducted in India, the “Consumer Pyramids Household Survey” (CPHS), shows that per capita spending decreased by an average of GDP and did not return to pre-social isolation levels during these times. All schools and other educational institutions, restaurants, hotels, stores, theaters, gyms, fitness centers, and houses of worship have been shuttered in India, despite slight regional variations in the level of lockdown restrictions based on the total COVID-19 events in that area. They noted in a recent letter piece that school closures, in particular, may have been extremely harmful to young people, emphasizing the critical need to treat mental health concerns in Indian adolescents. To our knowledge, no such deliberate attempts have been made. We present fresh data from a small cohort of Indian young people. Figure 1 displays the number of COVID-19 cases by age group in India. The major test that was carried out and still getting carried out is the RT-PCR test. By RT-PCR test, people were able to confirm about COVID [3]. There were also times when RT-PCR kit was not available because the volume of person getting infected by COVID was huge in number. This test can also be used with individual anterior nasal swab specimens. The nose or throat is the most common places on the body where the COVID-19 virus can be found. A variety of chemical processes are used to treat the sample, removing components like proteins and lipids and extracting only the RNA that is present. This extracted RNA includes the person’s genetic material as well as, if any viral RNA is present [4].
An Ensemble Learning Approach for Detection of COVID-19 Using …
239
Fig. 1 COVID-19 cases in India, as per age
2 Literature Survey Jain et al. [5] have approached the most effective deep learning method, which offers insightful analysis for examining many chest X-ray pictures that is crucial for COVID-19 screening. In his study, both with the COVID-19 and healthy people had their chest X-ray scans seen from the PA perspective. Alazab et al. [6] have examined the global incidence of COVID-19 distribution and also 4 described a deep convolutional neural network (CNN)-based artificial intelligence method to recognize COVID-19 sufferer in actual datasets. To find these patients, his algorithm looks at X-ray images of the chest. Yang et al. [7] have used the best models to meet the problem of binary classification for COVID-19 CT-scan, he used four strong pre-trained CNN models: VGG16, DenseNet121, ResNet50, and ResNet152. The binary and multi-class classification of X-ray picture problems were finished with the help of the improved VGG16 deep transfer learning architecture. The upgraded VGG16 detected COVID-19 and pneumonia X-ray pictures with best accuracy of 99%. Dong et al. [8] observed the imaging properties and computational models used for the management of COVID-19. For identification, treatment, and follow-up, doctors have used lung ultrasonography, CT, MRI, positron emission tomography/ CT, and computed tomography (CT). Artificial intelligence (AI)-based quantitative imaging data analysis is also researched. Ni et al. [9] have observed by drawing blood from COVID-19 sufferer who had just been declared uninfected and released. In eight individuals just released patients, we were able to recognize SARS-CoV-2-specific humoral and cellular immunity investigation of the second cohort of six individuals in future. Additionally, significant titers of immunoglobulin G (IgG) antibodies were found 2 weeks after discharge. Alhayani et al. [10] mentioned that in order to evaluate
240
A. Nandi et al.
Fig. 2 Normal chest X-ray versus COVID infected X-ray
and prioritize COVID-19 cases, machine learning can be included in programs and strategies for health providers. With a testing accuracy of 92.9%, supervised learning outperformed alternative unsupervised learning algorithms.
3 Data Preparation The dataset which was used to train the model was of X-ray of lungs. Our database contained a total of 125,000 (approx.) images of which around 60,000 were COVID positive X-rays, and 65,000 were of normal X-rays. 75% of the total data was used in training the model, and from the other 25% remaining, 20% was used as a test set and 5% for validating the model. The image size was generalized and reduced to the dimension of 224 × 224 pixels using the OpenCV library. This was done in order to reduce the computational resources required for processing and to make the model more robust and less susceptible to overfitting. As the model is not unduly focused on minute details in the original image, adopting a lower resolution image also enables the model to generalize to new images better. In the end, this enhances the model’s overall functionality and accuracy. Figure 2 shows the normal chest X-ray and COVID affected X-ray.
4 Proposed Work Using Ensemble Learning Figure 3 depicts the proposed ensemble learning for COVID detection. Ensemble learning approaches aggregate inputs from different learning models to make more accurate and better conclusions [11]. The main reasons for errors in learning models are randomness, variation, and bias. Ensemble approaches serve to minimize the elements that might lead to errors, assuring the reliability and accuracy of machine learning (ML) techniques [12]. Further, ensemble learning uses involve providing a level of confidence [13]. It emphasizes the categorization of ensemble learning
An Ensemble Learning Approach for Detection of COVID-19 Using …
241
applications. We can further divide this ensemble learning into two approaches, namely boosting and bagging [14]. As a result, numerous models are combined, which minimizes variation since the average forecast given by different models is far more dependable and resilient than a single component or decision tree. Boosting is a continuous ensemble strategy that modifies an observation’s weight depending on its most recent categorization [15, 16]. If an observation is mistakenly categorized, “boosting” raises the weight of the data and vice versa. Boosting techniques provide improved prediction models by reducing bias mistakes. Data engineers use the boosting ensemble approach to train the very first boosting algorithm on a complete dataset and then create successive methods by modeling residuals from the initial boosting algorithm, giving more weight to observations that the prior model predicted incorrectly [17]. Figure 4 shows the architecture diagram of ensemble learning.
Fig. 3 Proposed work using ensemble learning
Fig. 4 Ensemble learning
242
A. Nandi et al.
5 Explanation and Results The final stage is outcomes; at this point, our model is deemed ready for use in realworld scenarios [18]. After completion, the model is able to reach its own judgment based on the datasets and training it received. This stage is crucial since it provides the final outcome of the entire model. The dataset used consists of both COVID and non-COVID images. The accuracy obtained by the models VGG, InceptionV3, ResNet50, and Xception is 94%, 97%, 83%, and 92%, respectively. InceptionV3 is chosen as the final model due to its highest accuracy and lowest in the top 5 and 1% error. InceptionV3 has a higher efficiency and is computationally less intensive. Due to its deeper network and high speed, it is the best model suited for this purpose. Table 1 shows error rate of different models in ensemble learning prototype. Figure 5 is a sample output of the test set images using InceptionV3 model. This model gives a 97.56% chance of COVID on the left image of the chest X-ray and a 96.66% chance of non-COVID for the right image of the X-ray. Due to high accuracy, this model is most reliable to predict whether a person is having COVID by its X-ray images. Figure 5 shows the sample output, i.e., COVID or non-COVID from test set images. Table 1 Comparing error rates of different model
Model
Top-5 error (in %)
Top-1 error (in %)
ResNet 50
25.5
6.8
VGG 16
22.5
5.9
InceptionV3
17.78
4.2
Xception
19.33
5.3
Fig. 5 Sample output of test set images
An Ensemble Learning Approach for Detection of COVID-19 Using …
243
• Ensemble learning gave a new lead to the system model and benefited in multiple ways. • The model has the capability of giving the best output based on the decision parameters it takes. • As the decision-making process has a predictability rate above 96% in the case of normal and 97% in the case of COVID, the prototype can be successfully operated in public.
6 Conclusion The functioning prototype of the entire concept may be developed further and carried through to a successful operational system. The model was created applying the most powerful ensemble learning techniques. Looking at the findings, we may conclude that InceptionV3 correctly predicts the outcome. In a later stage of this concept, a reliable picture extraction technique that is extremely unique to the type of image data captured can be included. It can be extremely useful when validating data and arranging it in the best possible format prior to modeling. It will eventually improve prototype accuracy. The goal of this research is to provide a quick and reliable means to diagnose the presence of the COVID-19 virus in patients, to deliver better and more prompt treatment amenities, and to help prevent the spread of the disease. Given that RT-PCR is the most widely accepted method of COVID detection, this methodology with such a high success rate in predicting outcomes will help to advance the technology. It has the potential to reduce the enormous amount of RT-PCR kits required on a daily basis. Once the functioning prototype of the idea is live and accessible via a Web application, it will be usable and accessible to everyone. The suggested methodology offers a quick and accurate way to diagnose COVID-19 in patients, which has the potential to have a large influence on the healthcare sector. This may result in quicker access to treatment resources and ultimately aid in halting the spread of the illness. The suggested paradigm might potentially be applied in related sectors, like remote patient monitoring and the early diagnosis of other viral diseases. Additionally, the suggested approach might be incorporated into current hospital systems and may lessen the requirement for RT-PCR testing, aiding in resource conservation.
References 1. Sahoo PK, Mishra S, Panigrahi R, Bhoi AK, Barsocchi P (2022) An improvised deeplearning-based mask R-CNN model for laryngeal cancer detection using CT images. Sensors 22(22):8834 2. Mishra S, Thakkar HK, Singh P, Sharma G (2022) A decisive metaheuristic attribute selector enabled combined unsupervised-supervised model for chronic disease risk assessment. Comput Intell Neurosci
244
A. Nandi et al.
3. Mohanty A, Mishra S (2022) A comprehensive study of explainable artificial intelligence in healthcare. In: Augmented intelligence in healthcare: a pragmatic and integrated analysis. Springer, Singapore, pp 475–502 4. Yang W, Yan F (2020) Patients with RT-PCR-confirmed COVID-19 and normal chest CT. Radiology 295(2):E3–E3 5. Jain R, Gupta M, Taneja S, Hemanth DJ (2021) Deep learning based detection and analysis of COVID-19 on chest X-ray images. Appl Intell 51(3):1690–1700 6. Alazab M, Awajan A, Mesleh A, Abraham A, Jatana V, Alhyari S (2020) COVID-19 prediction and detection using deep learning. Int J Comput Inf Syst Ind Manag Appl 12(June):168–181 7. Yang D, Martinez C, Visuña L, Khandhar H, Bhatt C, Carretero J (2021) Detection and analysis of COVID-19 in medical images using deep learning techniques. Sci Rep 11(1):1–13 8. Dong D, Tang Z, Wang S, Hui H, Gong L, Lu Y et al (2020) The role of imaging in the detection and management of COVID-19: a review. IEEE Rev Biomed Eng 14:16–29 9. Ni L, Ye F, Cheng ML, Feng Y, Deng YQ, Zhao H et al (2020) Detection of SARS-CoV2-specific humoral and cellular immunity in COVID-19 convalescent individuals. Immunity 52(6), 971–977 10. Kwekha-Rashid AS, Abduljabbar HN, Alhayani B (2021) Coronavirus disease (COVID-19) cases analysis using machine-learning applications. Appl Nanosci 1–13 11. Yadav M, Perumal M, Srinivas M (2020) Analysis on novel coronavirus (COVID-19) using machine learning methods. Chaos Solitons Fractals 139:110050 12. Syeda HB, Syed M, Sexton KW, Syed S, Begum S, Syed F et al (2021) Role of machine learning techniques to tackle the COVID-19 crisis: systematic review. JMIR Med Inform 9(1), e23811 13. Alqahtani AY, Rajkhan AA (2020) E-learning critical success factors during the covid-19 pandemic: a comprehensive analysis of e-learning managerial perspectives. Educ Sci 10(9):216 14. Sivani T, Mishra S (2022) Wearable devices: evolution and usage in remote patient monitoring system. In: Connected e-health. Springer, Cham, pp 311–332 15. Mohapatra SK, Mishra S, Tripathy HK, Alkhayyat A (2022) A sustainable data-driven energy consumption assessment model for building infrastructures in resource constraint environment. Sustain Energy Technol Assess 53:102697 16. Nayak SR, Nayak DR, Sinha U, Arora V, Pachori RB (2021) Application of deep learning techniques for detection of COVID-19 cases using chest X-ray images: a comprehensive study. Biomed Signal Process Control 64:102365 17. Mishra S, Jena L, Tripathy HK, Gaber T (2022) Prioritized and predictive intelligence of things enabled waste management model in smart and sustainable environment. PLoS ONE 17(8):e0272383 18. Tripathy HK, Mishra S, Suman S, Nayyar A, Sahoo KS (2022) Smart COVID-shield: an IoT driven reliable and automated prototype model for COVID-19 symptoms tracking. Computing 1–22
ECG-Based Cardiac Abnormalities Analysis Using Adaptive Artificial Neural Network Prapti Patra, Vishisht Ved, Sourav Chakraborty, Sushruta Mishra, and Vikas Chaudhary
Abstract Electrocardiogram (ECG) is generally used to detect cardiac risks due to its lucidity and non-invasiveness. During the past decades, several studies have helped to create an automatic ECG-based heartbeat classifier. Our paper will provide an overview of current effective methods for ECG-based automatic heartbeat classification for abnormality detection, by methods like ECG signal preprocessing and beat segmentation practices. Also, the detection of cardiac abnormalities is primarily based on the method used to detect ECG patterns, namely the detection of ECG signal patterns using artificial neural networks (ANNs). In this paper, we also compare the results of several experiments and provide a brief analysis on them and get an accuracy of using ANN which is 91.52%. Finally, we discuss some of the assets and liabilities of the methods, and we also present a conclusion which summarizes our overall work. Keywords Heartbeat classification · Heartbeat segmentation · ECG · Artificial neural network · Arrhythmia classification · Pattern recognition · Beat segmentation · Electrocardiogram
1 Introduction Electrocardiogram (ECG) is a recording instance of electrical happenings of heart. Early detection provides information about heart abnormalities and extends human lifespan. Every beat of heart of an ECG sample is partitioned into three waves, namely P, QRS, and T that represents atrial depolarization, ventricular depolarization, and
P. Patra · V. Ved · S. Chakraborty · S. Mishra (B) Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India e-mail: [email protected] V. Chaudhary AI & DS Department, GNIOT, Greater Noida, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_22
245
246
P. Patra et al.
ventricular repolarization, respectively. Analysis of large numbers of waves and depolarization and repolarization normal vectors provides important diagnostic information. On the other hand, the electrocardiogram does not directly evaluate the contractile force of the heart, but it can comprehensively show the decline and flow of the contractile force of the heart. Weems et al. [1] discussed a large variety of research in this topic, and this study mainly focuses on medical application, namely Arrhythmia detection. Moreover, the overview focuses on a pattern detection methodology, i.e., artificial neural networks (ANNs). Arrhythmias may be segregated into binary classes. One of the two groups include ventricular fibrillation (VF) and tachycardia (VT). Both are deadly which needs high octane defibrillation (for VF) or low-energy defibrillation (for VT). It is quite important for automated external defibrillators and implantable defibrillators to accurately distinguish VT and VF from NSR and other non-life-threatening arrhythmias with confident certainty. The second group includes less dangerous scenarios which need treatment to avoid future disease. Few arrhythmias are quite rare and require Holter recordings for successful diagnosis [2, 3]. Many filtering and denoising methods are employed to remove white noise. LMSbased adaptive filtering requires less computation as compared to present widely used ECG denoising methods based on wavelets, EMD, and RLS-based adaptive filtering. However, large step sizes are required by LMS-based adaptive filtering to improve its filter performance. To create medical expert systems, rule-based machine learning algorithms are extensively used in medical applications, one of which is The Knearest neighbor (KNN) rule. Multiple applications use the KNN rule as a sample classification method. Another used technique is support vector machines (SVMs), which are also widely used to classify feature-extracted ECG data in contrast to other machine learning techniques, namely decision tree classifiers, genetic algorithms, and deep learning. This work mainly adopts the concept of artificial neural networks [4–8]. One of the main reasons why we’re using artificial neural networks over other algorithms is because of its ability to make reliable decisions on its own and to deal with non-linear and complex relationships which is quite important to analyze in real life. Artificial neural networks are information processing paradigms inspired by the neuronal system of living organisms, such as the brain that processes the information [9]. Figure 1 shows the biological nerve cell structure where information is passed over to the next nerve cell to reach the destination, i.e., another body part, etc. Whereas figure (b) shows an artificial neuron, where the input signals are carried by x, and the weight coefficient is carried by W. The result hence produced is expressed as Y which is directed to enter into another cell. Main contributions in this paper are as follows: • Objectives—Our objective is to detect cardiac abnormalities using ECG signals and artificial neural network. • Method—This paper makes use of artificial neural network algorithm and beat segmentation method. • Result—On brief analysis, we get 91.52% accuracy using ANN.
ECG-Based Cardiac Abnormalities Analysis Using Adaptive Artificial …
247
Fig. 1 Sample biological neuron structure
2 Literature Review Weems et al. [1] demonstrated the use of artificial neural network as a classifier to identify heart risks abnormalities. It uses ECG signal attribute selection metrics such as spectral entropy and poincare plot. Balasundaram et al. [2] they obtained and analyzed a database of 24 human ventricular arrhythmia tracings from the MITBIH arrhythmia database and wavelet-based features that illustrated the discrimination between the VT, VF, and VT-VF groups. Sansone et al. [3] in their paper provide a detailed discussion on two classifiers, namely artificial neural networks and support vector machines. They review ECG methods from a pattern recognition perspective. Poungponsri and Yu [4] propose a research on adaptive filtering for ECG signal noise minimization using discrete wavelet transform and provide a new approach toward adaptive learning ability of artificial neural network. Harkat et al. [6] present arrhythmia beat classification with the use of continuous wavelet transform for feature extraction and RBF optimized by cuckoo search algorithm via Levy flight. They have optimized the RBF classifier by searching for the best parameter values, hence providing high overall accuracy and sensitivity. Saini et al. [7] classify ten heart diseases by retrieving attributes from raw ECG data and sixth level wavelet transformed ECG signals. Their outcome shows improved accuracy. Inna [9] uses two methods, namely Daubechies wavelet and artificial neural network for the detection of cardiac abnormalities. It classifies the ECG into normal and abnormal for ECG pattern recognition. It carries out several experiments in the training process and shows accuracy using 4 formulas in the identification process. Badr et al. [10] use many diagnosis concepts mainly multilayer perceptron neural network. The retrieved feature vectors are input to a classifier, based on a multilayer perceptron neural network. They utilize ROC analysis and contrast matrix to evaluate the overall correct classification result produced by the ECG-based classifier. Nikan et al. [11] adopt ELM classification to categorize beat segments into five classes, by applying an adaptive segmentation method, on basis of median of R-R
248
P. Patra et al.
intervals. Their result outcome depicts effective accuracy of the novel model in beat classification compared to other approaches.
3 Proposed Model Figure 2 shows the whole diagnostic process of disease in medical practice in a flowchart format. In the first phase of this model, we collect data and create a database for the patient. Next, we perform data preprocessing which includes the removal of unnecessary data or data which is not very useful or can be taken into consideration. Once the data has been processed and modified from the previous phase, we record the heart rate using electrocardiogram (ECG). We take the ECG into consideration and check whether it is severe or not, if not found severe, we simply eliminate it. From here the second phase, that is, diagnostic phase, once we have determined the ECG signal to be severe, we determine the patients as severe and create a separate database for their diagnosis which is done with the help of ANN which is done by neurons similar to the brain of the human body. Once the diagnosis has been done, the final evaluation of the patient is done by the doctors, and the same process continues for the new patient.
Fig. 2 Methodology flowchart for processing of diagnostic disease in ECG practice
ECG-Based Cardiac Abnormalities Analysis Using Adaptive Artificial …
249
Table 1 Accuracy analysis of different supervised learning algorithm Supervised learner
Accuracy (%)
Precision (%)
Recall (%)
F-score (%)
Logistic regression
88.52
89.48
90.2
89.89
Naive Bayes
86.34
86.97
87.5
87.22
Support vector machine
59.79
71.25
73.8
72.67
Artificial neural network
94.52
93.23
94.2
93.78
K-nearest neighbor
88.45
90.77
91.82
91.26
Linear classifier
77.53
82.55
83.89
83.07
Radial basis function
82.76
85.76
87.6
86.55
4 Results and Analysis The MIT-BIH arrhythmia database was applied for the arrhythmia indications, classification methods, and supervised learning algorithms described here. A full crossvalidation was done on the MIT-BIH arrhythmia data to validate the performance of the above formal algorithm, true negative (TN), false negative (FN), true positive (TP), and false positive (FP) [11–13]. MIT-BIH database for arrhythmia analysis is the most widely used and recommended by ANSI/AAMI for medical device validation [14–18]. In our study, it has been used for heartbeat segmentation. Table 1 depicts the performance analysis of various supervised learning algorithm with their accuracy, precision, recall, and F-score values. It is observed that ANN-based classification records the best performance with 94.52% accuracy, 93.23% precision, 94.2% recall, and 93.78% F-score metrics, respectively. Also a latency delay analysis was carried out with different supervised models under consideration. Both training as well as testing delays were evaluated. While SVM showed the maximum latency, ANN model generated the least response time of 11.2 s and 21.5 s training delay and testing delay, respectively. The results are depicted in Fig. 3. Beat segmentation has clarity in performance and shows promising results. The preprocessing step in ECG uses digital filters to filter out various kinds of noise from raw signals.
250
P. Patra et al.
Fig. 3 Latency delay analysis with different supervised classifiers
5 Conclusion This study helps in evaluating beat-by-beat arrhythmia assessment and the technical implementation to train a fine-grained arrhythmia detector on ECG samples. It is computationally effective. It also comes with a benefit of exhibiting low sensitivity to noise. We have performed evaluations on multiple datasets, and the outcome has a notable superiority. This study is fruitful in reducing data annotation by a significant amount, while at the same time, it would improve the preciseness of beat-by-beat arrhythmia detection. The method proposed in our comparative analysis surpasses other reference methods in beat classification accuracy.
References 1. Weems A, Harding M, Choi A (2016) Classification of the ECG signal using artificial neural network. In: Juang J (ed) Proceedings of the 3rd international conference on intelligent technologies and engineering systems (ICITES2014). Lecture notes in electrical engineering, vol 345. Springer, Cham. https://doi.org/10.1007/978-3-319-17314-6_70 2. Balasundaram K, Masse S, Nair K, Farid T, Nanthakumar K, Umapathy K (2011) Waveletbased features for characterizing ventricular arrhythmias in optimizing treatment options. In: Annual international conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society, pp 969–972. https://doi.org/10.1109/ IEMBS.2011.6090219 3. Sansone M, Fusco R, Pepino A, Sansone C (2013) Electrocardiogram pattern recognition and analysis based on artificial neural networks and support vector machines: a review. J Healthc Eng 4(4):465–504. https://doi.org/10.1260/2040-2295.4.4.465 4. Poungponsri S, Yu X-H (2013) An adaptive filtering approach for electrocardiogram (ECG) signal noise reduction using neural networks. Neurocomputing 117:206–213
ECG-Based Cardiac Abnormalities Analysis Using Adaptive Artificial …
251
5. Rahman MZU, Shaik RA, Reddy DVRK (2012) Efficient and simplified adaptive noise cancelers for ECG sensor based remote health monitoring. IEEE Sens J 12(3):566–573 6. Harkat A, Benzid R, Saidi L (2015) Features extraction and classification of ECG beats using CWT combined to RBF neural network optimized by cuckoo search via levy flight. In: 2015 4th international conference on electrical engineering (ICEE) 7. Saini R, Bindal N, Bansal P (2015) Classification of heart diseases from ECG signals using wavelet transform and kNN classifier. In: International conference on computing, communication & automation, pp 1208–1215 8. Mašeti´c Z, Subasi A (2016) Congestive heart failure detection using random forest classifier. Comput Methods Programs Biomed 130:54–64 9. Inna S (2013) Detection of cardiac abnormalities based on ECG pattern recognition using wavelet and artificial neural network. Far East J Math Sci 76:111–122 10. Badr M, Al-Otaibi S, Alturki N, Abir T (2022) Detection of heart arrhythmia on electrocardiogram using artificial neural networks. Computat Intell Neurosci 2022:10, Article ID 1094830. https://doi.org/10.1155/2022/1094830 11. Nikan S, Gwadry-Sridhar F, Bauer M (2017) Pattern recognition application in ECG arrhythmia classification. In: Proceedings of the 10th international joint conference on biomedical engineering systems and technologies, vol 5: HEALTHINF (BIOSTEC 2017), pp 48–56. ISBN 978-989-758-213-4. https://doi.org/10.5220/0006116300480056 12. Mohapatra SK, Mishra S, Tripathy HK, Alkhayyat A (2022) A sustainable data-driven energy consumption assessment model for building infrastructures in resource constraint environment. Sustain Energy Technol Assess 53:102697 13. Mishra S, Jena L, Tripathy HK, Gaber T (2022) Prioritized and predictive intelligence of things enabled waste management model in smart and sustainable environment. PLoS ONE 17(8):e0272383 14. Tripathy HK, Mishra S, Suman S, Nayyar A, Sahoo KS (2022) Smart COVID-shield: an IoT driven reliable and automated prototype model for COVID-19 symptoms tracking. Computing 1–22 15. Suman S, Mishra S, Sahoo KS, Nayyar A (2022) Vision navigator: a smart and intelligent obstacle recognition model for visually impaired users. Mob Inf Syst 16. Sivani T, Mishra S (2022) Wearable devices: evolution and usage in remote patient monitoring system. In: Connected e-health. Springer, Cham, pp 311–332 17. Sahoo PK, Mishra S, Panigrahi R, Bhoi AK, Barsocchi P (2022) An improvised deeplearning-based mask R-CNN model for laryngeal cancer detection using CT images. Sensors 22(22):8834 18. Mohanty A, Mishra S (2022) A comprehensive study of explainable artificial intelligence in healthcare. In: Augmented intelligence in healthcare: a pragmatic and integrated analysis. Springer, Singapore, pp 475–502
A Novel Dataframe Creation and 1D CNN Model for Subject-Independent Emotion Classification from Raw EEG Pooja Manral and K. R. Seeja
Abstract Electroencephalograms (EEG) are different than other signals like speech, posture, and facial expressions and are widely used to recognize emotions. Every person has a different emotional condition due to which it is difficult to work on subject-independent approach. This paper presents a method of subject-independent emotion classification from raw EEG signals. The 1D convolutional neural network can extract the underline features automatically when input is given. Therefore, the raw EEG signals are passed through the 1D convolutional layers. This convolutional layer automatically extracts and these features are then employed in the dense layer which is used to classify signals based on emotions high/low (valence, arousal, dominance and liking/disliking). The classification of EEG signals is done without doing any handcrafted feature extraction. The 1D convolutional architecture is trained and tested on benchmark dataset DEAP. Keywords Electroencephalogram (EEG) · Emotion recognition · Convolutional neural network (CNN)
1 Introduction Emotions are reactions that human beings counter when they are in a situation. Emotions play an essential role in physical health, making decisions in real life, and communication. The type of emotion differs from one person to another. A psychological person not balanced will be less responsive to daily problems. Emotion recognition or classification is a subfield of Affective Computing, in which computer processes signals coming from the human brain and recognizes emotions. P. Manral (B) · K. R. Seeja Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Kashmere Gate, Delhi 110006, India e-mail: [email protected] K. R. Seeja e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_23
253
254
P. Manral and K. R. Seeja
Several researchers have worked on non-physiological signals which are speech, posture, and facial expression, but these are quite subjective and depend on various parameters which makes it difficult to identify the emotions of a person. Physiological signals like Electroencephalogram (EEG), Electrocardiogram (ECG), and magnetoencephalography (MEG) give better results to identify emotions. The EEG signals are measured by placing electrodes all over the scalp at different locations. The electric signals are measured by the difference in voltages between two electrodes which means that EEG is a recording of electric signals over a particular time. The EEG signals are more reliable as this is a non-invasive process. The EEG signals are subject-dependent where emotions are recognized effectively, but subjectindependent emotion recognition is a bit challenging. In subject independent, when we train the model on a group of users but test on a different user (not given training). Forex: When a person is diagnosed with depression and the past data is not available when he was healthy, then this approach can be used. Emotion recognition based on EEG has several applications. For analysis of neurological disorders like stress, depression, and anger. Detection of the driver’s state of mind (sleepy or angry) to improve the safety of the driver. • A 1D CNN is proposed to classify emotions. The input to CNN is purely raw no feature extraction is performed. • The experiments performed are purely subject independent.
2 Related Work Several researchers have applied machine learning algorithms and deep neural networks to recognize emotions using EEG data. For making the model purely subject-independent, researchers are working on cross-subject classification and use the concept of leave one subject out. Anshuman et al. [8] proposed a technique for emotion recognition using Capsule Neural Network for cross-subject. They used a Sparse spatiotemporal frame for a combined representation of spatio and temporal data coming from the brain. The CapsNet model is trained for the DEAP dataset. Bayesian optimization is analyzed for tuning hyperparameters. Li et al. [10] investigated the EEG features in emotion recognition for cross-subject classification. They extracted 18 nonlinear and linear features from the EEG signal to reduce dimension. They implemented the technique of leaving one subject out and worked on two datasets DEAP [9] and SEED [17] using a Support Vector Machine. Yang et al. [20] developed an emotion recognition for the cross-subject. They extracted multiple features and formed high-dimensional features and claimed that this emotion recognition combines the significance of test reverse selection and the Support Vector Machine (ST-SBSSVM). The method is implemented on DEAP [9] and SEED [17] datasets. Fdez et al. [6] suggested an emotion recognition for cross-subject/user by feature normalization method called stratified normalization. Leave one subject out strategy and neural network for classification are employed. They extracted features by using the multitaper, welch, and differential entropy methods. The electrodes are
A Novel Dataframe Creation and 1D CNN Model …
255
mapped to a matrix in order to collect spatio and temporal data coming from the brain. Cho and Hwang [5] suggested a 3D CNN with a spatiotemporal representation of signals. The EEG signals were reconstructed as 1D stacks making it a 2D data frame. The 2D EEG frames are then concatenated to form 3D EEG frames using electrode position to the time axis. They make two 3D CNN models. Some researchers have extracted different frequency bands before giving it to the model. Lin et al. [11] proposed a deep CNN model. The EEG samples are converted into gray images by the six frequency bands, and then, they performed feature extraction on the remaining peripheral physiological signals. They implemented four AlexNet models that are pre-trained for classifying emotions based on valence and arousal. Yang et al. [21] took baseline signals to improve accuracy. They separated EEG signals into four bands of frequency that are beta, theta, alpha, and gamma. They have constructed a 3D input where they transform a 1D feature vector into the 2D plane by an electrode distribution map. They perform all the combinations of all bands and compared them with MLP and DT. Pandey and Seeja [15] implemented a Deep NN for subject-independent emotion recognition on the DEAP [9] dataset. They extracted features with the help of discrete wavelet transform (db8) to detect different frequency band coefficients of the signal. Bao et al. [3] suggested a neural network of two-level domain adaption (TDANN). They separated EEG signals into different bands called theta, delta, beta, gamma, and alpha. The topology is calculated to keep features of differential entropies. Pandey and Seeja [14] suggested a user-independent strategy for recognizing emotions for the DEAP dataset. They extracted different frequency bands using Discrete Wavelet Transform and took four emotion-specific electrodes that were Fp1, Fp2, F3, and F4, and two bands theta and alpha. A multi-layer perceptron is implemented for the classification of emotions. Gupta et al. [7] decomposed EEG signals using wavelet transform which is a flexible analytic. After decomposition, for extracting more features, they employed Information Potential. They used ML algorithms for classification that were Support Vector Machine and Random Forest. They implemented this on SEED [17] and DEAP [9] databases. Cheng et al. [4] suggested an emotion recognition algorithm build on CNN. The EEG signals are made such that to be fed into CNN. Then, they tuned the hyperparameters to increase the accuracy. Mei and Xu [12] proposed a CNN framework. They constructed 4 × 32 × 32 Pearson Correlation coefficients’ matrices for every sample and considered them as the picture. These pictures are fed to a convolutional neural network which extracts the correlation. Pandey and Seeja [16] suggested a user-independent strategy for recognizing emotions for the DEAP [9] dataset. They calculated the first differences between IMFs and power spectral density and find out the peak value of it. The IMFs are derived by EEG signals using EMD and VMD. For classification, they used SVM and DNN. Oh et al. [13] proposed a 1D CNN model to classify signals as abnormal and normal using the TUH EEG database. They designed a DNN of 23 layers including the input having a convolutional layer, dropout, max pooling, batch normalization, and dense layer. They do not perform any handcrafted feature extraction. Alhagry et al. [1] proposed an LSTM to classify emotions as high/low arousal, valence, dominance, and liking. They segmented the videos into 12 segments with
256
P. Manral and K. R. Seeja
a length of 5 s. These segments are fed to the LSTM layer, dropout layer, and dense layer. Tang et al. [18] implemented a Bi-modal LSTM for emotion recognition to take temporal data into account. The LSTM encoders are trained by raw EEG features at each time step. Then, a dropout layer is applied and a Support Vector Machine is used for the classification of emotional states. Xu et al. [19] proposed a 1D CNNLSTM model to recognize epileptic seizures through EEG signals. Before giving EEG signals to the model, it is firstly preprocessed and normalized. The model is implemented on the public UCI epileptic seizure recognition dataset.
3 Materials and Methods The process of emotion classification from EEG signals consists of various steps as shown in Fig. 1. In the first step, EEG data is collected. The EEG signals are then preprocessed and arranged in such a way that they can be fed into the CNN. No handcrafted feature extraction is performed before giving it to the model. The CNN extracts features automatically and these features are given to the classifier to classify emotions (high/low).
3.1 DEAP Database The DEAP is a dataset that is used for the evaluation of human emotions. This dataset consists of EEG recordings of 32 users, every user watched 40 one-minute musical videos, and corresponding EEG signals were recorded. The participants rated the video on a scale of 1–9 for the four emotional states (valence, dominance, arousal, and liking) according to the dimensional model of emotions shown in Fig. 2. The data is recorded by wearing an electrode cap which has 40 electrodes, out of which 32 will collect EEG signals and the rest will collect other physiological peripheral signals. The data is down-sampled at the rate of 128 Hz, preprocessed, and segmented. The file contains 32.dat files. Each user file consists of data and corresponding label. The data is of shape 40 × 40 × 8064 in which the first dimension is video/trial, the second dimension is channel or electrodes, and the last dimension is voltages.
Fig. 1 Block diagram of proposed methodology
A Novel Dataframe Creation and 1D CNN Model …
257
Fig. 2 Valence arousal emotions’ model [1]
The label is of shape 40 × 4 where the first dimension shows the number of a trial performed and the second dimension is the label (valence/arousal/dominance/ liking). According to the dimensional model, if the rating of the valence scale is less than 5 and arousal is less than 5, then the emotion would be Sad and Depressed and will fall on third quadrant. If the valence rating is greater than 5 and arousal is greater than 5, the emotion would be Excited and Happy and falls on the first quadrant and so on.
3.2 Dataframe Preparation The downloaded DEAP data has the shape of 40 × 40 × 8064 (trails/electrodes/ voltages). The first three seconds data for every user is baseline signals which are removed because they may cause unnecessary temporal drifts and thus make a frame of size 40 × 40 × 7680. Out of 40 electrodes, 14 electrodes which are emotionspecific [2] are taken into account and are shown in Table 1. A dataframe is created for each trial per subject. Accordingly, 1280 (40 trials × 32 subjects) dataframes are prepared from the dataset. Each column of the dataframe corresponds to the voltages related to one electrode, and hence, there are 14 columns. The rows of the dataframe represent various time stamps, and hence, each row corresponds to the voltage at a particular time stamp from all the 14 electrodes. The structure of a dataframe is shown in Fig. 3. Thus, there are 7680 rows in the dataframe.
258 Table 1 Emotion-specific electrodes
P. Manral and K. R. Seeja
Channel content
Channel no.
AF3
1
F3
2
F7
3
FC5
4
T7
7
P7
11
O1
13
AF4
17
F4
19
F8
20
FC6
21
T8
25
P8
29
O2
31
Fig. 3 Sample dataframe
3.3 Proposed 1D CNN Model Convolutional neural network is selected for the classification of emotions. A CNN is mostly used in image recognition and signal processing. A CNN has two features: sparse connectivity and weight sharing which decreases the complexity of the model. The DEAP data is nonlinear; therefore, the model can be trained for those nonlinear characteristics by convoluting them. The proposed 1D CNN model is shown in Fig. 4. The proposed 1D CNN model has convolutional, max pooling, average pooling, batch normalization, drop out, and dense layers. The input to CNN is raw EEG of 1280 dataframes of the size of 7680 × 14 and the output neurons correspond to the
A Novel Dataframe Creation and 1D CNN Model …
259
Fig. 4 Proposed CNN architecture
Fig. 5 Convoluting signals with a kernel size of 3 × 14
emotions (high/low). The 1D CNN model learns to extract features from sequences of data. The convolution filter convolves with input over a single dimension. After giving input, the convolutional filter will slide over all the voltages of the input taking their dot product. The nonlinear combination of input voltages weighted by the convolutional filter extracts features from the input data frame as shown in Fig. 5. The labels are first normalized between 0 and 1. Then, they are converted into binary values by using a threshold mid-way between the between the extremes which is 0.5.
4 Implementation The proposed method is executed in Python with the TensorFlow platform. The data from DEAP dataset is prepared by using the methodology in Sect. 3.2 before giving it to the CNN. The proposed 1D CNN consists of 13 layers which are 1D convolution, max pooling, average pooling, dropout, and batch normalization.
260
P. Manral and K. R. Seeja
The first convolutional layer consists of seven filters with a kernel size of 3 × 14, and the dimension of the data becomes seven one-dimensional arrays of size 7680 × 1. After convolutional layer, the max pooling or average pooling is applied to extract the most crucial features and reduce the spatial size of convolved feature [13]. Dropout layers are used to avoid overfitting. Batch normalization is applied to normalize the activation functions used before. The next layer is the flattening layer which converts the features to a single linear vector of size. In the end, fully connected layers are connected. The last dense layer consists of two neurons that are used to classify the emotion into HIGH and LOW. At the output layer, Softmax activation function, Adam optimizer with learning rate 1e−5, and binary cross entropy loss function are used. The model is trained for 100 epochs with a batch size of 128. The model details and parameters are illustrated in Table 2. Table 2 Model specifications
No. Layer
Parameters
Channel no.
1
Input
–
7680 × 14
2
Conv1D + tanh
Units = 7, kernel = 3
7680 × 7 7680 × 7
3
Batch normalization –
4
MaxPool1D
Pool size = 2, stride 3840 × 7 =2
5
Conv1D + tanh
Units = 5, kernel = 3
6
MaxPool1D
Pool size = 2, stride 1920 × 5 =2
7
Conv1D + tanh
Units = 5, kernel = 3
1920 × 5
8
Conv1D + tanh
Units = 5, kernel = 3
1920 × 5
9
AvgPool1D
Pool size = 2, stride 960 × 5 =2
10
Dropout
Rate = 0.5
3840 × 5
960 × 5
11
Flatten
–
4800
12
Dense + tanh
64
64
13
Dropout
Rate = 0.5
64
14
Dense + softmax
2
2
A Novel Dataframe Creation and 1D CNN Model … Table 3 Training and testing data shape for experiment 1
261
Set
Data shape
Label shape
Train
1200 × 7680 × 14
1200 × 2
Test
80 × 7680 × 14
80 2
5 Results and Discussion This study is purely subject-independent, and the training data is different from the testing. The EEG records of DEAP dataset are divided into two phases training and evaluation. Two experiments are performed.
5.1 Experiment 1 In this experiment out of 32 subjects, 30 subjects are used in training, while remaining two subjects are used in testing. Twenty % data is used for validation from training data. The corresponding data shape is shown in Table 3.
5.2 Experiment 2 The leave one subject out (LOSO) concept is used which means that out of 32 subjects in DEAP dataset, one of the subject is used in testing, while the 31 remaining users are used in training. Out of the training data, 10% data was used for validation. The corresponding data shape is shown in Table 4. The proposed model performance for experiment 1 and experiment 2 is evaluated using accuracy, precision, recall, and F 1 -score metrics as presented in Tables 5 and 6. The metrics reported are acquired from test epochs. These measures are defined for all the four emotional states. An explanation of these metrics with their respected equations are given below where the abbreviations used in the equations are false positive (FP), true positive (TP), false negative (FN), and false positive (FP). Accuracy It is the ratio of correctly classified data instances over the total number of data instances. The formula is shown in Eq. (1). Table 4 Training and testing data shape for experiment 2
Set
Data shape
Label shape
Train
1240 × 7680 × 14
1240 × 2
Test
40 × 7680 × 14
40 × 2
262
P. Manral and K. R. Seeja
Table 5 Performance measures for experiment 1 Measures
Valence
Arousal
Dominance
Liking
Accuracy
58.75
66.25
61.25
65
Precision
61.53
71.15
74
64.93
Recall
71.11
75.51
67.85
98.03
F1-score
65.97
73.26
71.02
78.12
Table 6 Performance measures for experiment 2 Valence (%)
Measures
Arousal (%)
Dominance (%)
Liking (%)
Best case accuracy
70
77
77
75
Average case accuracy with standard deviation
52.4 ± 6.69
58.31 ± 11.36
57.39 ± 8.26
61.18 ± 9.79
Average case precision
53.06
59.12
58.53
64.31
Average case recall
64.06
80.46
70.2
84.59
Average case F 1 -score
57.21
66.21
62.45
72.39
Accuracy(% ) =
TN + TP × 100. TN + FP + TP + FN
(1)
Precision It is the ratio of correctly classified as positive out of all positives. The formula is shown in Eq. (2). Precision(% ) =
TP × 100. FP + TP
(2)
Recall It is the ratio of how much were correctly identified as positive to how much were actually positive. The formula is shown in Eq. (3). Recall(% ) =
TP × 100. FN + TP
(3)
F 1 -Score It combines precision and recall into a single metric by taking their harmonic mean. The formula is shown in Eq. (4). F1 Score(% ) =
2(Precision × Recall) . Precision + Recall
(4)
The accuracy, precision, recall, and F 1 -score values in Table 6 are averaged among all the 32 users. The charts of evaluation metrics for experiment 2 are also shown in (Figs. 6, 7, 8, and 9) for all subjects with respect to the emotional states. The charts provide insight of the proposed model and how efficiently it would perform. The
A Novel Dataframe Creation and 1D CNN Model …
263
Fig. 6 Performance evaluation (valence)
proposed CNN model is compatible to extract features and performed better. The model has less convolutional and dense layers making it a less complex and efficient model. No feature extraction of EEG signals is done before giving it to the model as compared to existing work. The input given is purely raw EEG. The accuracy scores render higher performance of the model as compared to other models presented in Table 7. Most of the researchers who employed leave one subject out concept have taken only valence and arousal classes to claim accuracy. The classification performed and reported in this study is done on all four emotional classes.
Fig. 7 Performance evaluation (arousal)
Fig. 8 Performance evaluation (dominance)
264
P. Manral and K. R. Seeja
Fig. 9 Performance evaluation (liking)
Table 7 Performance comparison (in terms of experiment 2) Study
Method
Valence (%)
Arousal (%)
Dominance (%)
Liking (%)
Jana et al. [8]
STFG + CapsNet
48.219
58.52
60.96
60.95
Proposed
1D CNN
52.4
58.31
57.39
61.18
6 Conclusion In this work, a 1D CNN is suggested for subject-independent emotion classification. The novelty of the proposed model is the arrangement of EEG signal from the emotion-specific electrode into a tensor suitable for inputting the CNN model. The benchmark DEAP dataset is used for evaluating the model. The model classifies the raw EEG signals into high and low valence/arousal/liking/dominance. The model could extract temporal features for classifying the EEG signals automatically. It is found that the proposed CNN model is better than the existing models since the model takes raw EEG as input and performs well during classification. The model reports an average accuracy of 57.22 and best-case accuracy of 74.75% averaged among all the cases for leave one subject out cross-validation.
7 Limitations and Future Scope The proposed work was able to classify emotions, but CNN only captures the temporal information of EEG signals. In the future, transformer models can be built to increase the accuracy of subjectindependent emotion recognition systems.
A Novel Dataframe Creation and 1D CNN Model …
265
References 1. Alhagry S, Fahmy AA, El-Khoribi RA (2017) Emotion recognition based on EEG using LSTM recurrent neural network. Int J Adv Comput Sci Appl IJACSA 8(10). www.ijacsa.thesai.org 2. Al-Qazzaz NK, Sabir MK, Ali S, Ahmad SA, Grammer K (2019) Effective EEG channels for emotion identification over the brain regions using differential evolution algorithm. In: Proceedings of the annual international conference of the IEEE engineering in medicine and biology society, EMBS, pp 4703–4706. https://doi.org/10.1109/EMBC.2019.8856854 3. Bao G, Zhuang N, Tong L, Yan B, Shu J, Wang L, Zeng Y, Shen Z (2021) Two-level domain adaptation neural network for EEG-based emotion recognition. Front Hum Neurosci 14:620. https://doi.org/10.3389/FNHUM.2020.605246/BIBTEX 4. Cheng C, Wei X, Jian Z (2017) Emotion recognition algorithm based on convolution neural network. In: Proceedings of the 2017 12th international conference on intelligent systems and knowledge engineering, ISKE 2017, Jan 2018, pp 1–5. https://doi.org/10.1109/ISKE.2017.825 8786 5. Cho J, Hwang H (2020) Spatio-temporal representation of an electroencephalogram for emotion recognition using a three-dimensional convolutional neural network. Sensors 20(12):3491. https://doi.org/10.3390/S20123491 6. Fdez J, Guttenberg N, Witkowski O, Pasquali A (2021) Cross-subject EEG-based emotion recognition through neural networks with stratified normalization. Front Neurosci 15:11. https:/ /doi.org/10.3389/FNINS.2021.626277/BIBTEX 7. Gupta V, Chopda MD, Pachori RB (2019) Cross-subject emotion recognition using flexible analytic wavelet transform from EEG signals. IEEE Sens J 19(6):2266–2274. https://doi.org/ 10.1109/JSEN.2018.2883497 8. Jana GC, Sabath A, Agrawal A (2022) Capsule neural networks on spatio-temporal EEG frames for cross-subject emotion recognition. Biomed Signal Process Control 72:103361. https://doi. org/10.1016/J.BSPC.2021.103361 9. Koelstra S, Mühl C, Soleymani M, Lee JS, Yazdani A, Ebrahimi T, Pun T, Nijholt A, Patras I (2012) DEAP: a database for emotion analysis; using physiological signals. IEEE Trans Affect Comput 3(1):18–31. https://doi.org/10.1109/T-AFFC.2011.15 10. Li X, Song D, Zhang P, Zhang Y, Hou Y, Hu B (2018) Exploring EEG features in crosssubject emotion recognition. Front Neurosci 12(Mar):162. https://doi.org/10.3389/FNINS. 2018.00162/BIBTEX 11. Lin W, Li C, Sun S (2017) Deep convolutional neural network for emotion recognition using EEG and peripheral physiological signal. Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and Lecture notes in bioinformatics), 10667 LNCS, pp 385–394. https://doi.org/10.1007/978-3-319-71589-6_33/TABLES/3 12. Mei H, Xu X (2018) EEG-based emotion classification using convolutional neural network. In: 2017 International conference on security, pattern analysis, and cybernetics, SPAC 2017, Jan 2018, pp 130–135. https://doi.org/10.1109/SPAC.2017.8304263 13. Oh SL, Vicnesh J, Ciaccio EJ, Yuvaraj R, Acharya UR (2019) Deep convolutional neural network model for automated diagnosis of schizophrenia using EEG signals. Appl Sci 9(14):2870. https://doi.org/10.3390/APP9142870 14. Pandey P, Seeja KR (2019a) Emotional state recognition with EEG signals using subject independent approach. Lecture notes on data engineering and communications technologies, vol 16, pp 117–124. https://doi.org/10.1007/978-981-10-7641-1_10/COVER 15. Pandey P, Seeja KR (2019b) Subject-independent emotion detection from EEG signals using deep neural network. Lecture notes in networks and systems, vol 56, pp 41–46. https://doi.org/ 10.1007/978-981-13-2354-6_5/COVER 16. Pandey P, Seeja KR (2022) Subject independent emotion recognition from EEG using VMD and deep learning. J King Saud Univ Comput Inf Sci 34(5):1730–1738. https://doi.org/10. 1016/J.JKSUCI.2019.11.003 17. SEED Dataset (n.d.). Retrieved 4 Oct 2022, from https://bcmi.sjtu.edu.cn/home/seed/
266
P. Manral and K. R. Seeja
18. Tang H, Liu W, Zheng WL, Lu BL (2017) Multimodal emotion recognition using deep neural networks. Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and Lecture notes in bioinformatics), 10637 LNCS, pp 811–819. https://doi.org/ 10.1007/978-3-319-70093-9_86/COVER 19. Xu G, Ren T, Chen Y, Che W (2020) A one-dimensional CNN-LSTM model for epileptic seizure recognition using EEG signal analysis. Front Neurosci 14:1253. https://doi.org/10. 3389/FNINS.2020.578126/BIBTEX 20. Yang F, Zhao X, Jiang W, Gao P, Liu G (2019) Multi-method fusion of cross-subject emotion recognition based on high-dimensional EEG features. Front Comput Neurosci 13:53. https:// doi.org/10.3389/FNCOM.2019.00053/BIBTEX 21. Yang Y, Wu Q, Fu Y, Chen X (2018) Continuous convolutional neural network with 3D input for EEG-based emotion recognition. In: Cheng L, Leung A, Ozawa S (eds) Neural information processing. Springer International Publishing, pp 433–443
Generic Recommendation System for Business Process Modeling J L Shreya, Anu Saini, Sunita Kumari, and Astha Jain
Abstract Process modeling helps in the comprehension and reorganization of corporate activity. Manual process modeling takes longer time and is liable for errors. We propose to build and provide a supporting system for people who manage the creation and application of business process models for their organizations. Any person intending to build and get recommendations for business process models can utilize the proposed recommender system, depending upon the kind of organization the person is working for. The core functions of the recommender system are to recommend fragments of a business process model as well as in-built process models with the benefit of creating, storing, and utilizing them for multiple purposes. The recommendations are based on a number of factors such as the number of people who prefer a particular model along with the models’ ratings. Here, we use the basic principle of collaborative filtering. This recommendation system provides a userfriendly interface, with a lucid structure of query interface to help users surf easily. The recommendation system also avoids the risk of misusing users’ personal details since it does not use cookies. Users can freely utilize this recommendation system even if they are novice. This extended work takes into account all of the specifications of a business process model for a range of industries, such as education, software, food, medical, automotive, banking, and so on, thus developing a generic system.
J. L. Shreya Department of Computer Science and Engineering, Dr. SPM International Institute of Information Technology, Naya Raipur, India e-mail: [email protected] A. Saini (B) · S. Kumari Department of Computer Science and Engineering, G. B. Pant DSEU Okhla-1 Campus, DSEU, New Delhi, India e-mail: [email protected] S. Kumari e-mail: [email protected] A. Jain Tata Consultancy Services Limited, Noida, Uttar Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_24
267
268
J. L. Shreya et al.
Keywords Generic recommendation system · Business process modeling · Collaborative filtering · Process model repository · Social networks
1 Introduction A recommendation system or recommender system is a data-driven filtering system that seeks to determine a person’s preference for a particular item. In commercial applications, such systems are commonly deployed. As we can see, recommendation systems are used in a wide range of applications and are most commonly recognized as playlist generators for music and video services like Netflix, YouTube, and Spotify, product recommenders for Amazon and Flipkart, and content recommenders for social media platforms like Facebook, Instagram, and Twitter. It is important to see the other areas as well where recommender systems can find a unique purpose. Recommending a full-fledged business process model to the management of any enterprise or organization is of paramount importance. Recommending a built-in business process model or assisting the entrepreneurs in creating their own model for effective work is the basis of the recommender system. It is not surprising to note that the survey done by Jannach and Jugovac [1] showed that recommender systems are one of the major reasons for the success of businesses with machine learning and artificial intelligence in practice. An approach and implementation of building an effective and efficient recommendation system had been undertaken in our previously published research work. The two major tasks completed in the earlier work involved resolving the Cold Start Problem and providing Recommendations through Collaborative Filtering approach. However, the proposed system in this work enhances the recommendation system through incorporation of advanced features. It is well understood that a business process model is the analytical illustration of an organization’s business processes. Modeling the processes is a crucial component for effective business process management. To express business process models, a variety of business process modeling methodologies are employed. Business Process Modeling Notation (BPMN), UML Diagrams, Flowcharts, Gantt Charts, Colored Petri-Nets, and Simulation Models are just a few examples. The most basic necessity of every business or organization that governs workflow and efficiency is a business process model. The business management system, which consists of administrative employees, managers, and other staff, studies and monitors a business process model. As a result, we can create business process models for a variety of industries, including educational institutions (e.g., schools, colleges), software, automotive, food, health care, and banking. A sample business process model for admission of students in college is shown in Fig. 1. The activities or business processes are just graphic representations of organizational operations that are incredibly simple to understand for non-business users. Many tools and approaches exist in the market today that assist process modeling (for example, using BPMN), but none of them have suggestion systems for BP modelers. As a result, management will benefit from developing a recommendation system for this purpose. Such recommendation
Generic Recommendation System for Business Process Modeling
269
systems can assist in selecting the most suitable match for the required process, from the set of recommended ones. Any organization or enterprise can grow only when they are able to implement their ideas, strategies, or plans properly and effectively. The enhanced version of the recommendation system developed in this work is based on this idea of helping a business organization to plan their process so that they can implement the right process model as per the requirements. The built system, while recommending the process models, works in five phases of BPM life cycle, viz. Model, Implement, Execute, Monitor, and Optimize. The management can select its preferences from the recommended list of activities for their process modeling, after providing few relevant details such as type of enterprise or organization, department, and other requirements. Based on the preferences chosen by the user, the recommendation system builds the required business process model. For any model, the most appropriate activity will appear first in the recommendations followed by the next best in relation to the previously selected activity. The advantages of employing recommender systems in business process management (BPM) are outlined in the following points: (1) time and effort: Complex Business Process (BP) modeling can be a labor-intensive process, especially for inexperienced users who are unfamiliar with the syntax and recurrent modeling issues. Modeling takes less time and effort when a recommender system is used; (2) model that is prone to errors: When complicated business problems are combined with graphical representations, errors may be introduced that the user’s eyes can overlook. In most circumstances, having a recommender system verifies and validates the models being built, thus removing model bugs as a result. The rest of this paper is structured as follows: Section 2 covers pertinent linked works and discussions, Sect. 3 contains the proposed approach, user flow and algorithms, Sect. 4 provides implementation and results, Sect. 5 gives a comparative analysis of our work with other existing approaches. Section 6 of the paper concludes the work followed by Sect. 7 describing our future goals.
2 Related Work The compelling need of a full-fledged recommendation system for BPM led to many new ideas and proposals with different approaches. Numerous research works exist that describe the various methodologies to implement such a system. Some of the works in this field are described in this section. Strimbei et al. [2] propose the BPMN approach for University Information Systems (UIS). Inside an UIS, there are its own set of operations that must be carefully evaluated and created, for example, Admission, Study Programs and Curriculum, and Student Exchange are the four primary processes defined here. The BPMN method displays flow components, whereas Unified Modeling Language (UML) diagrams emphasize the people engaged in each action. This method also allows for the definition of process dependencies. The Student Roadmap process, for example, is influenced by the Admissions Process, Study Programs, and
270
J. L. Shreya et al.
Fig. 1 Sample business process model for college admission process
Curriculum. This demonstrates that BPMN may be used to capture the unique aspects of educational business operations. Dwivedi et al. [3] present the main concern of having a good recommender system that should help the e-learners learn from the learning resources through digital platforms without delay. Studying that different learners have different ways of learning along with different targets to achieve, Dwivedi et al. made an effort to understand the recommendation of learning sequence which gives the most benefits to the learners because just providing with the resources may not fulfill the actual demands of the e-learners.
Generic Recommendation System for Business Process Modeling
271
In a similar manner, the concept of a hybrid recommender system [4], namely MoodleREC, is used to create courses using the Moodle e-learning platform, especially designed for teachers to ease the e-learning processes. Although recommendation-based applications have already been widely used for the past few years in the entertainment field, education, and the e-commerce sector, there is undoubtedly more to do with it. Somehow, the domain of business process management has not been able to get the exposure to recommendation-based services. There can be various reasons behind it, such as the inability to solve complex problems of real-time scenarios and less efficiency [5]. In another scenario, Jiang et al. [6] developed a newer version of the slope one technique built on the fusion of trusted reliable information and user similarity, which can be used not only in e-commerce recommendation systems, but also in social network and location-based service recommendation systems. And similarly, there are many more works that have worked toward developing or utilizing a recommendation system for one or the other purposes of the users in a variety of fields. It is still in the survey and research directions to find a recommender system that can perform to provide multi-stakeholder recommendations [7]. Therefore, our proposed system is a practical implementation of a generic recommendation system which facilitates its multiple stakeholders to get recommendations of business process models, thus making it a different system from others. The stakeholders in our system belong to areas of education, medical, banking, software, automotive, and food industry.
3 Proposed Approach This section describes the specifications of the recommender system elaborating the functionalities, intended customers, platform, goals, and our proposed system in more detail, followed by users’ flow and proposed algorithms. a. Requirement specifications As the goal is to provide a supporting system to the people who manage the creation and application of business process models for their organizations, it is necessary to understand the below mentioned things. i. End users Any person intending to build and get suggestions of business process models can use this application, depending upon the type of organization the user is working for. ii. Functionality (a) Novice users can create their own Business Process Model that is visually represented in the form of a flowchart as per the functioning of the system. (b) Experienced users can select any of the departments under the organization for which they are interested in developing a business process model.
272
J. L. Shreya et al.
(c) If in case the user cannot find his/her preferable activity in the list of current recommendations, then the user has the authority to include that activity by himself/herself in the model. And then onward, this activity can be found in the list of recommendations for other users with similar interests. (d) All the functionalities mentioned above are open to both registered as well as unregistered users. However, the registered users have the privilege of getting recommendations of complete business process models along with the individual activities. (e) A user who holds a valid account can save and store his/her models in the My Models repository after creating customized models. (f) The recommender system’s major functions are to recommend fragments of a business process model as well as in-built process models with the benefit of developing, storing, and using them for various purposes. The recommendations are made by keeping a count of the number of users who choose a certain model, and the model with the highest count becomes the highly recommended one. Hence, the basic principle of collaborative filtering is applied here. iii. Platform This work has been launched as a web-based application. iv. Technical processes • Frontend development: JavaScript, Hypertext Markup Language (HTML), Cascaded Style Sheets (CSS). • Backend development: Python programming language, Flask (a Pythonbased web framework), SQLite and sqlite3 (for database management), and Visual Studio Code. v. Goals This recommendation system provides a friendly user interface, with a lucid structure of query interface to help users surf easily. Apart from this, the recommendation system avoids the risk of losing users’ credentials as it does not use cookies to store data. End-users are free to utilize this recommendation system even if they are inexperienced. a. Our recommendation system This paper defines a revised recommendation system for business process modeling that enhances the asset of an enterprise or organization by not only recommending the most appropriate activities of the required process model according to their suitability but by also providing complete and in-built business process models to its users [8]. The modeling notation used for our system is flowchart, suiting the environment and programmability of the system.
Generic Recommendation System for Business Process Modeling
273
3.1 Methodology (a) The user can choose his/her preferences in various fields like type of business enterprise, area of expertise, work environment, and other requirements. (b) Based on the preferences listed by the user, the system selects the models from the database by comparing previous selections of other users with similar requirements and suggesting them to the current user; the most appropriate option will appear first in the recommendations. (c) However, if the user is not satisfied with the provided recommendations, he/she is also allowed to create customized models according to their own requirements. (d) The system is capable of providing recommendations for individual activities in the creation of models to assist the user. The models created will be saved and stored and recommended to other users. 3.1.1
Phases
The functional structure and execution of the system’s development can be broken down into the following three phases. We have continued to enhance the recommendation system’s abilities by incorporating major changes in the third phase of development [8]. (a) Phase 1: Resolving the cold-start problem through users’ choices involved a self-learning-based approach for the recommender system. (b) Phase 2: System implementation through collaborative filtering (CF) approach. This technique used for providing recommendations was an integral part of the recommender system. (c) Phase 3: Enhancements through creating and interlinking three kinds of social networks: • A network of a large process model repository. • A network of the users’ history. • A network of the insertion history of the users. On successful advancements by generating the social features in the third phase, the recommender system can now recommend complete and in-built business process models from the process model repository along with the initial services which was carried in first and second phases.
274
J. L. Shreya et al.
3.1.2
Major Contribution to Advancements
The addition of new features, functions, and dataset made the system more advanced than before. The advancements are discussed as follows in this section: (a) Expansion of the dataset • The basic version of the recommendation system provided the facility of recommending business process models in the education field only, i.e., Library Management, Student Admission Process (Online and Offline Modes), and Recruitment. • However, as the newly built recommendation system is generic in nature, various process models belonging to different fields and areas of expertise were explored and relevant information was gathered. • After studying the models, the data obtained have been recorded in excel sheets (in the form of datasets). • The areas taken into consideration apart from the education sector are banking sector, software industry, automotive industry, food industry, and hospitals. (b) User profile creation (a network from the users’ history) • Earlier, the system was simply dynamic in nature. However, it neither asked nor stored the user credentials except the count value that represented the number of users who preferred a particular activity [8]. • Our enhanced version of the previously built recommender system, now has a social network for the users (by generating the user profiles). • This has enabled the new recommendation system to work more efficiently than before. (c) Building a process model repository • A repository storing the complete process models for recommendation purposes has been built for the system. • Using this enhancement, a user not only has the access to the preferred activities for his/her business process model but can also select the most suitable in-built models from the recommendations provided by the system. (d) Lastly, linking of the user history with process model repository and implementing collaborative filtering is done. c. Flowchart Since it is important to understand the overall working flow of the entire recommendation system, we present the user’s flow in the form of a flowchart. The flowchart given in Fig. 2 represents the user flow for the recommender system. A user will be provided an extended version of the recommender system to get the most suitable business process models. The flow is briefly described as follows:
Generic Recommendation System for Business Process Modeling
275
Fig. 2 Flowchart for user flow
(a) A new user can choose to continue with or without registration on the application’s site. The user can explore many sample business process models for various purposes on the web application’s homepage. If the user continues as an unregistered user, then: • The user will be provided recommendations for individual activities assisting in model building. • The user will be able to store the created models in the database or model repository. • No in-built model recommendations will be provided. (b) Below the sample models, options showing the different types of organizations for which the models can be designed are displayed for users to choose from. (c) The user chooses an organization type from a variety of options.
276
J. L. Shreya et al.
(d) A page appears that lists all of the departments that fall under the organization that the user has chosen. (e) If the user wants to choose a different organization, he or she can return to the previous page by clicking on the Home button. (f) By hitting the Home button, the user can return to the previous page and select a different organization. (g) The user chooses a department type from a list of alternatives. (h) A page with two separate but adjacent sections for developing the process model is displayed. (i) If the user wants to change departments, he or she can return to the previous page by clicking the Choose Department button. (j) If the user is satisfied with the currently selected department, he or she can now design his or her own process model. (k) The work area on the left section of the page is where the user can see the building of the required model. Three subsections make up the right section of the page. • The first subsection is a list of all of the recommended activities for this department’s business process model. If the user finds an activity that meets the process model, he or she can click on it, and it will be added as a node to the flowchart being built on the left portion of the page. • The second portion includes a tiny query interface to search for the desired activity. The first subsection displays the results in order of their importance. • If the user is still unable to locate the activity, the last subsection makes it easier for the user to enter it into the system. This activity can be found and added to the process model by searching for it again. (l) As a result, a user can successfully design the desired business process model utilizing our system’s recommendations. However, a registered user will be privileged with more functions and options. After successful login, the user is shown three options to choose from. The user might directly go for model creation, ask for recommendations, or see through the already saved business process models. The recommendations made by the system work by applying a collaborative filtering approach. d. Algorithms Algorithm 1 describes how business process models are saved in our recommender system. The database is initially connected to the system. For each and every business process model, the model score is initially set to 0. The number of people who choose that model is shown by the model score. The id of a registered user is represented by the value in u. If the model is created and saved by a non-registered user, the user id is set to zero; otherwise, the database value for the user id is used. The id of the type of organization selected by the user is represented by the value in x. Similarly, the id of the type of department selected by the user within the organization with id x is shown by the value in y. The user is now ready to start building the model. The flowchart is updated as the user selects a certain activity. When a user clicks the save
Generic Recommendation System for Business Process Modeling
277
button after being satisfied with the model, the saveModel method is called, and the model is saved to the database along with a user id, organization id, and department id. The modelScore is also increased by one. The cold start problem for models is overcome in this fashion since each model can now have a specific score. Algorithm 1 Saving the business process models 1: if userLogIn is true then 2: Set userId = u 3: else 4: userId = 0 5: end if 6: selectOrgId = x 7: selectDeptId = y 8: Start function startModel (x, y) 9: Start function displayActivities (x, y) 10: if select activity = true then 11: activityscore ← activityscore + 1 12: flowchart ← activity 13: else 14: select activity = false 15: end if 16: if model save is true then 17: Start function modelSave (u, x, y) 18: modelscore ← modelscore + 1 19: End function modelSave 20: end if 21: End function displayActivities 22: End function startModel Algorithm 2 Recommending the business process models 1: userId = u 2: selectOrgId = x 3: selectDeptId = y 4: Start function recommendModel (u, x, y) 5: Start function showModel (x, y) (a) Display models from department y and organisation x (b) if selectModel = true then (c) modelscore ← modelscore + 1 (d) else (e) repeat step 5(a) 6: end function showModel 7: end function recommendModel = 0
278
J. L. Shreya et al.
Algorithm 2 is used for recommending the models created by different users. Models are recommended only to the registered users. If a user wishes to access the model recommendations, then he/she has to login first. Firstly, the system is connected with the database. When user chooses the options to see the recommendations, recommendModel function is started. Along with this, showModel(x, y) function is also triggered. This function displays all the models from organization with id x and department with id y, which are fetched and displayed based on the rankings of the model from database. The number of people who choose the model with id = m determines the ranking of any model (id = m). The function modelScore updates the score of the model (id = m) that the user has selected. As a result, collaborative filtering has been successfully accomplished, as the user is now being recommended the models that other users with comparable interests like. The time complexities of the proposed algorithms shall not be greater than O(n), where ‘n’ is the size of the input provided to our recommendation system. Here, input refers to the keywords, either searched or added to the existing repository of business processes/activities.
4 Business Buddy The built application recommends complete in-built business process models to the users. Initially, the user is at the homepage of the system ‘Business Process Recommendation System’ nicknamed as ‘Business Buddy’ where sample Business Process Models are shown. As the user scrolls down further on this page, options for selecting the type of business organization are provided. A user can select the organization type from here to see further options for departments within that organization type. Departments are displayed after the selection of an organization. The next page of Business Buddy displays the listed departments of the selected organization type along with a button to return to the homepage for selecting another organization if the user wishes to switch. The user selects a department from this page. As the user selects a department, the recommended next activity is shown at the right side of the screen and the flowchart of process model will be displayed in the left section of the page. Required activity can be selected from the recommendations or can be searched. A flowchart can be seen in making when the user selects the activities from the displayed recommendations (where the highly recommended one appears at the top). The screen also displays the results of the search for any activity. User can pick the activity and add it to flowchart if it is not present earlier. A registered user must login to continue further if he needs to get the access of all services of Business Buddy. If an unregistered user wishes to get recommendation for Business Process Model, then he must sign up at the site. Upon signing up, a user gets a dashboard where three functionalities are available to the user: • Create models: A new user can create his or her own customized models and store them in their repository for future references.
Generic Recommendation System for Business Process Modeling
279
• Recommend models: This is the most important component of the recommender system wherein in-built business process models are recommended to the users according to their requirements. • My models: It is a kind of repository where models built by the users are stored. Users can anytime get an access to this repository for different purposes. The users have above three options to proceed further. The user can choose any one of them as well as come back to the previous page anytime if he wishes to change his choice. Some departments under the ‘Education’ organization are admission of students, fee collection, library management, and recruitment. As the user selects a particular department, a list of recommended models is displayed, in order of their rankings (based on number of users). The model with the maximum users gets the highest rank and is highly recommended by our system. On choosing the recommended model, the complete and in-built model is shown to the user for viewing. The user may like the model by clicking the Perfect button as shown in Fig. 3. This will increment the quantitative value of the model as one more user is counted here who showed an interest in this model. The recommender system also maintains the user insertion history by maintaining a record of the models earlier created by the users. The users can anytime view these models if they find a need. From time to time, it also updates and informs the respective user about the selections made by other similar users for his model. The registered user is led to the login page when the user logs out from the dashboard. Similarly, the generic recommendation system has business process models for other areas of interest as well. The objective of maintaining different business process models for every sector of the society has been achieved, thus making the recommendation system a generic service-based platform to enable its users to build and obtain business process models for effective use in any organization.
Fig. 3 Viewing the recommended business process model
280
J. L. Shreya et al.
Table 1 Comparative analysis Work
Modeling notation/tools
Core idea and recommendation type
Application area/stakeholders
BPMN approach for UIS [2]
BPMN2, UML
Enterprise modeling
University information systems
LPRS using VLGA [3]
MATLAB
Modified variable length genetic algorithm
Education sector
MoodleRec [4]
Keyword-based queries
Hybrid rec. system
Education sector
Rec. system for BPM [5]
Business process graphs
Offline mining; online recommendations
Business process analysts
E-commerce Modified slope rec. system [6] one algorithm
Collaborative filtering
E-commerce
Rec. system for BPM in education [8]
Flowchart
Collaborative filtering
Education sector
Generic rec. system viz. business buddy
Flowchart
Online recommendations through collaborative filtering
BPM recommendations for all functional areas, such as education, medical, software, food, and automotive sectors
5 Comparative Analysis Table 1 compares the existing solutions with our recommendation system, on various grounds such as the modeling notation used, the type of recommendation, and scalability. Comparative analysis shows that our solution outperforms the existing recommendation systems in terms of scalability in the application domain. It not only covers the education sector, but a lot more than that in the current industry.
6 Conclusion The recommendation approach presented here plays a vital role in exemplifying the fact how a user’s choice affects the other person’s idea of selecting a suitable and reliable business process model. With the increase in the number of users accounting for the same organization or department, the recommender system analyzes the trend and makes a temporary house to store the data so that as soon as a new user arrives, he/ she may get the best suggestion. The accomplishing part of the extended work is the
Generic Recommendation System for Business Process Modeling
281
successful creation of a user-friendly dashboard wherein three major actions can be performed by the user, viz. to create a new and personalized or customized business process model, to get the best recommendations by the recommender system, and thirdly, to store, see, and utilize the user’s models, i.e., the user’s model repository. Apart from this, getting recommendations of complete and in-built models along with the suggestions of individual activities of any process is rather a bigger achievement. The recommender system has been created to store and use large sets of data, where the recommendations made does not confine itself to one industry but to several industries and organizations with differing departments. We hope that the current implementation of the system benefits the society to organize their business work flow in an efficient manner.
7 Future Work We now plan to take up some more challenges by studying the feedback given by the users, i.e., the customers of the system who are using the services of the recommender system. Knowing whether the services of the system are assisting the users, will be a great opportunity to make new amendments. Accordingly, evaluations can be done, and users can be informed to go through a transparent procedure in terms of recommendations. Our future work will focus on testing the system on a good scale to analyze the shortcomings. Furthermore, although we believe to utilize the strengths of a flowchart to build an effective business process model, we still require studying mechanisms for a flowchart to contain all possible alternatives during action. We aim at reaching the targets through more exploration and collaboration.
References 1. Jannach D, Jugovac M (2019) Measuring the business value of recommender systems. ACM Trans Manag Inf Syst (TMIS) 10(4):1–23 2. Strîmbei C, Dospinescu O, Strainu RM, Nistor A (2016) The BPMN approach of the university information systems. Ecoforum J 5(2):181–193 3. Dwivedi P, Kant V, Bharadwaj KK (2018) Learning path recommendation based on modified variable length genetic algorithm. Educ Inf Technol 23(2):819–836 4. De Medio C, Limongelli C, Sciarrone F, Temperini M (2020) MoodleREC: a recommendation system for creating courses using the Moodle e-learning platform. Comput Hum Behav 104:106168 5. Deng S, Wang D, Li Y, Cao B, Yin J, Wu Z, Zhou M (2016) A recommendation system to facilitate business process modeling. IEEE Trans Cybern 47(6):1380–1394 6. Jiang L, Cheng Y, Yang L, Li J, Yan H, Wang X (2019) A trust-based collaborative filtering algorithm for E-commerce recommendation system. J Ambient Intell Hum Comput 10(8):3023– 3034 7. Abdollahpouri H, Adomavicius G, Burke R, Guy I, Jannach D, Kamishima T, Krasnodebski J, Pizzato L (2020) Multistakeholder recommendation: survey and research directions. User Model User-Adapt Int 30(1):127–158
282
J. L. Shreya et al.
8. Saini A, Jain A, Shreya JL (2022) Recommendation system for business process modelling in educational organizations. In: International conference on innovative computing and communications. Springer, Singapore, pp 585–594
Parkinson Risks Determination Using SVM Coupled Stacking Supratik Dutta, Sibasish Choudhury, Adrita Chakraborty, Sushruta Mishra, and Vikas Chaudhary
Abstract Biomarkers obtained from a person’s voice may provide interpretations into brain-related risks like Parkinson’s disease due to their cognitive aspects. This audit portrays as of late created specialized methodologies for the type of diseases and examines its utility and constraints as an examination stage and for change in translation for disease genomics and disease accuracy medication. Considering the variety available and understanding of plentiful change of data, it requires the examination of enormous scope. Use of classifiers like the k-nearest neighbour (KNN), decision tree, logistic regression, and support vector machine (SVM) algorithms are required that will be utilizing some training variables and examine how the input values are related to the class. Then, after implementing each classifier successfully, the test accuracies for the support vector machine (SVM), k-nearest neighbour (KNN), decision tree, and logistic regression models were 97.4, 96, 86.6, and 83.5%. Stacking is further used using the above said methodology, it was possible to achieve an overall accuracy of 97.5%. Error rate generated was also least of 0.327 with stacking. The goal of this paper was to improve the illness detection process with more accuracy and early detection, ultimately saving many lives. Keywords Cognitive · Neurodegenerative · Genomics · k-nearest neighbour · Decision tree · Stacking
1 Introduction Parkinson’s disease is a chronic risk that affects both the brain’s capacity for function and the bodily functions that it regulates. An estimated 10 million people worldwide suffer from this condition, which makes the body rigid and makes the hands and S. Dutta · S. Choudhury · A. Chakraborty · S. Mishra (B) Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India e-mail: [email protected] V. Chaudhary AI and DS Department, GNIOT, Greater Noida, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_25
283
284
S. Dutta et al.
body tremble. The symptoms gradually peaks. Initially, it may be a minor trembling of hands and it eventually increases with tremors, as a result it can become quite stiff. There is still no effective treatment or cure for this advanced stage, which results in a loss of around 60% of the dopamine in the basal ganglia, which regulates movement of the body with a modest quantity of dopamine [1]. The effectiveness of treatment depends on starting it at the earliest possible stage of the illness. A person with Parkinson’s disease may also have the following signs and symptoms: (1) depression (2) problems with sleeping and remembering (3) stress (4) problems with balance and a loss of smell. What’s interesting in this situation is that 90% of all Parkinson’s disease diagnoses some form of vocal impairment, which includes a decline in the ability to produce vocal sounds normally or dysphonia as it is known medically [2]. A staggering 1–2% of people in the 60+ age group globally are affected by this disease [3]. More than one-third of all diseases can be avoided, and about 33% of them can be treated with early detection. Mass screening must be made even more effective, cost-effective, and secure until real primary prevention is discovered. In context to this, more work needs to be done to persuade women patients to give their consent to such mass screening. When creating an ideal prediction and detection model, ensemble techniques combine several machine learning algorithms or models. When compared to the base learners alone, the model developed performs better. The selection of key features, data fusion, and other uses of ensemble learning is also possible [4]. • The three main categories of ensemble techniques are bagging, boosting, and stacking. Here, stacking is used for combining numerous regression or classification models. • Exploring a space of various models of the same problem is the goal of stacking. • The fundamental principle underlying this is that when a learning programme is addressed with various types of models, only a part of the problem domain may be learned. • As a result, we may create a variety of learning methods and utilize them to design intermediate predictions of a unique prototype. • Then, a novel framework which acquires skill from intermediate estimations with the same aim can be added. The final model is piled on top of the preceding models. As a result, performance as a whole increases in comparison with individual intermediate models [5]. Figure 1 shows a simplified illustration of the underlying stacking design. It demonstrates how the model’s architecture is composed of two or more base models, including an ensemble model that integrates all estimates from the root models and a learner’s model. The simple prototypes are referred to as level-0 model, whereas the stacked model is known as a level-1 model. The stacked ensembling approach takes into account the initial train samples, primary models, primary-level prediction, intermediate model, and final prediction. • Original data: This data is also referred to as test data or training data and is separated into n no. of folds.
Parkinson Risks Determination Using SVM Coupled Stacking
285
Fig. 1 Stacking demonstration
• Base models: Another name for these models is level-0 models. These models provide assembled predictions (level-0) using training data. • Level-0 prediction: When a basic model is triggered on a piece of training data, it generates a distinct set of predictions. • The stacking model’s architecture is made up of a meta-model, which makes it possible to combine the underlying models’ predictions in the most efficient manner. The meta-model is also known as the level-1 model. • Level-1 prediction: Samples not used in model training are fed into a stacked model, forecasts are made, and those forecasts, along with the desired outcome, make it easier to use the input and output pairs of trained data pairs for model fitting. The stacked model, which is trained on numerous estimates made by different models, decides how to optimally combine all estimations of fundamental prototypes. The novelty of this approach is that it can protect a variety of effective models’ abilities to address classification and regression issues. Additionally, it helps to develop a superior model with predictions that outperform all individual models.
2 Background Study
Authors details
Year
Method
Nilashi et al. [1]
2022
The study gives concept-based model to interpret role of biomarker in medical analysis
Michael et al. 2018 [2]
Medical brain risks with epidemiology of main neurodegenerative symptoms
Zeng et al. [3]
2017
Distinguishing Parkinson’s affected patients from healthy people using grey matter in cerebellum
Mei et al. [4]
2021
Predictive analytics for assessment of Parkinson’s risks (continued)
286
S. Dutta et al.
(continued) Authors details
Year
Method
Joshi et al. [5]
2011
Parkinson symptoms categorization with stacked meta learner
Xu et al. [6]
2020
The main aim is to test the ability of motor function of the patient with Parkinson’s disease and show high accuracy in early stage
Mei et al. [4]
2021
Present a detailed view of data visualizations using prediction methods used in disease diagnosis
Sujan Reddy et al. [7]
2022
Stacked deep neural networks with machine learning frameworks for short-term energy harvesting estimation
Wang et al. [8]
2020
Fast and at prior assessment of Parkinson’s risk with deep learning and machine learning
Yang et al. [9]
2021
Parkinson’s disease classification with stacking ensemble learning algorithm
3 Proposed Model of Parkinson’s Disease Detection Using SVM-Based Stacking In Fig. 2, the proposed model is shown here starting from the extraction of training data to finding accuracy using individual classifiers, each giving separate accuracies, to obtaining final accuracy using stacking algorithms. The PD dataset obtained from kaggle is used to train the stacked model using PD detection method. This dataset consists of 31 people’s biological voice measures, 23 of whom have Parkinson’s disease (PD). Each column in the table corresponds to a certain vocal measure and each row in the table represents one of the 195 voice recordings from these people (i.e. “name” column). The main objective of the data is to identify between healthy people and people with PD, as shown by the “status” column, which is set to 0 for healthy and 1 for PD [4]. A matrix’s column entries (attributes) containing ASCII-encoded subject name and recording number, voice fundamental frequency on average (MDVP: Fo (Hz)), MDVP: Fhi (Hz) is the maximum and MDVP: Flo (Hz) is the minimum vocal fundamental frequencies (Hz), MDVP: Jitter (%), MDVP: Jitter are measures of fundamental frequency fluctuation (Abs), Jitter: DDP, MDVP: RAP, and MDVP: PPQ. MDVP: Shimmer, MDVP: Shimmer (dB), Shimmer: APQ3, Shimmer: APQ5, MDVP: APQ, and Shimmer: DDA are amplitude variation measurements. The noise tonal component ratio of the voice can be measured using NHR and HNR. 1 indicates Parkinson’s disease, and a 0 indicates health. RPDE and D2 signal fractal scaling exponent, or DFA, are two nonlinear dynamical complexity measurements. Spread1, Spread2, and PPE are three nonlinear measurements for fundamental frequency variation [7]. Four different types of algorithms to train the data are SVM, KNN, logistic regression, and decision tree. The model was then trained using the meta learner
Parkinson Risks Determination Using SVM Coupled Stacking
287
Fig. 2 Proposed stacking-based model of Parkinson’s disease detection
as logistic regression utilizing the data that we had previously fed into the model training for the stacked model. • Support vector machine (SVM): Support vector machines, sometimes referred to as support vector networks, which collect data for regression and categorization, are supervised learning models with corresponding learning algorithms. • K-nearest neighbour (KNN): It is used for regression and classification. The data in the two situations consists of the k closest prepared models in the data set. The item is then simply demoted to the class of that single nearest neighbour if k = 1. This value is the average of the traits of the k closest neighbours. • Logistic regression: This logistic model can be used to show several instances that are similar to determining what an image includes. Each object that can be identified in the image will have a probability assigned to it between 0 and 1, with a sum of 1. • Decision tree: Decision trees are included in the category of supervised learning and can be applied in classification and regression tasks, with classification problems being the most common. Stacking combines the various learning capabilities of different ML models to allow them to be used to solve a problem. The stacking framework combines multiple classifiers and utilizes the significant results obtained as F1 score, accuracy, precision, and recall which are taken as input for the stacked classifier, which is logistic regression, as shown in level 0. This increases the sharpness of findings by integrating frequently less strong prototypes and is quite simple. The SVM classifier, KNN classifier, and decision tree classifier are used as the base learners to feed the predictions from each model into the integrated learner in order to determine the mean forecast of the stacking framework. On basis of that, an integration for the
288 Table 1 Accuracies of individual classifiers and overall accuracy after stacking
S. Dutta et al.
Classifier
Accuracy (%)
SVM
97.4
KNN
96
Decision tree
86.6
Logistic regression
83.5
Stacking
97.5
level-1 stacking model that outperformed rival combinations and had a 97.5% accuracy rate is proposed. A stacked model using SVM, KNN classifier, and decision tree classifier as the base learners and logistic regression as the meta learner is generated [8].
4 Result and Analysis The recorded test accuracies for SVM, KNN, decision tree and logistic regression models were 97.4%, 96%, 86.6% and 83.5% respectively. Because of this, the layered model advocate adopts it in place of any of the separate models due to its performance on the test data [9]. The stacking model performed better than all other models, achieving a test score of astounding 97.5%. The stacked model overdid other models and had a greater potential for generalization. Because of the stacked model bases, its outcomes on the knowledge gained from every individual models and it performs better overall. The accuracy analysis is given in Table 1. In Fig. 3, box plots, i.e. type of map that shows the quartile distribution of a particular set of data (or variable) is shown. The data set displays the minimum, maximum, median, first quartile, and third quartile of respective classifiers after stacking algorithm and overall accuracy of it is shown here. The plots were plotted for the range of values reported for accuracy of the machine learning models used [10, 11]. SVM achieved the best result among classification models, i.e. 97.4% whereas stacking algorithm gave overall maximum accuracy of 97.5%. Figure 4 shows the RMSE value analysis of the proposed stacking model to determine the error rate incurred in implementing it. A relatively less RMSE error of 0.327 is observed in comparison with other classifiers which resulted in higher values.
5 Societal Benefits of the Proposed Model (1) Although high diagnostic accuracy for Parkinson’s disease has been demonstrated in clinical practice, and as demonstrated in this study, machine learning approaches have also demonstrated high accuracy. Models including SVM are
Parkinson Risks Determination Using SVM Coupled Stacking
289
Fig. 3 Stacking box plot analysis of the proposed model
Supervised methods
Stacking
0.327
Logistic regression
1.032
Decision Tree
0.654
KNN
0.879
SVM
1.762 0
0.5
1
1.5
2
Error rate (RMSE)
Fig. 4 Error rate analysis of the proposed stacking model
particularly useful for increasing Parkinson’s disease diagnosis using underutilized data modalities (such as voice) in clinical decision-making and identifying relevant features from this data [12]. (2) Machine learning algorithms can identify patterns associated with disease and medical difficulties by concentrating on massive amounts of patient data, including medical records. (3) Future advancements in AI may make it easier for people in developing nations to access healthcare services and hasten the diagnosis and treatment of cancer [13].
290
S. Dutta et al.
6 Conclusion Parkinson’s disease, a progressive neurodegenerative condition of the brain, can affect the nerve cells in the central nervous system due to elevated dopamine levels [14–16]. In this work, continuous assessment of the progression of the disease by measuring the drop in dopamine levels in the brain is done. A patient’s risk status can be determined with the help of the illness’ progression, so that preventative actions can be performed early. An algorithm for prepping the data values and narrowing the data range in order to prevent noise and missing data is developed. The execution process was then described using an algorithm from the proposed paradigm. The suggested model was built using a variety of ML techniques, and the output was combined using a meta-learning process.
7 Future Scope Using collected and consistent data, machine learning enables the construction of models that quickly explore information and communicate outcomes. The development of evidence linking environmental factors, such as ambient air pollution and climatic conditions, to the development and severity of cardiovascular diseases (CVDs) has increased interest in medical care. AI allows medical care specialist organizations to make better decisions regarding a patient’s treatment options and diagnosis, which enhances the quality of medical services administrations overall. AI tactics are anticipated to produce unavoidable linkages with distant hubs. In reality, AI prepares for the Internet of Things (IoT), a network that maintains communications between various devices without the need for human interaction. A few industries like healthcare, brilliant networks, vehicular interchanges, etc., apply AI tactics.
References 1. Nilashi M, Abumalloh RA, Minaei-Bidgoli B, Samad S, Yousoof Ismail M, Alhargan A, Abdu Zogaan W (2022) Predicting Parkinson’s disease progression: evaluation of ensemble methods in machine learning. J Healthcare Eng 2022:2793361. https://doi.org/10.1155/2022/2793361 2. Erkkinen MG, Kim MO, Geschwind MD (2018) Clinical neurology and epidemiology of the major neurodegenerative diseases. Cold Spring Harb Perspect Biol 10(4):a033118. https://doi. org/10.1101/cshperspect.a033118 3. Zeng LL, Xie L, Shen H, Luo Z, Fang P, Hou Y, Tang B, Wu T, Hu D (2017) Differentiating patients with Parkinson’s disease from normal controls using gray matter in the cerebellum. Cerebellum (London, England) 16(1):151–157. https://doi.org/10.1007/s12311-016-0781-1 4. Mei J, Desrosiers C, Frasnelli J (2021) Machine learning for the diagnosis of Parkinson’s disease: a review of literature. Front Aging Neurosci 13:633752. https://doi.org/10.3389/fnagi. 2021.633752. PMID: 34025389; PMCID: PMC8134676 5. Joshi DD, Joshi HH, Panchal BY, Goel P, Ganatra A (2022) A Parkinson disease classification using stacking ensemble machine learning methodology. In: 2022 2nd international conference
Parkinson Risks Determination Using SVM Coupled Stacking
6.
7. 8.
9.
10.
11.
12.
13. 14.
15.
16.
291
on advance computing and innovative technologies in engineering (ICACITE), Greater Noida, India, pp 1335–1341. https://doi.org/10.1109/ICACITE53722.2022.9823509 Xu S, Pan Z (2020) A novel ensemble of random forest for assisting diagnosis of Parkinson’s disease on small handwritten dynamics dataset. Int J Med Inform 144:104283. https://doi.org/ 10.1016/j.ijmedinf.2020.104283 Reddy S, Akashdeep S, Harshvardhan R, Kamath S (2022) Stacking deep learning and machine learning models for short-term energy consumption forecasting. Adv Eng Inform 52:101542 Wang W, Lee J, Harrou F, Sun Y (2020) Early detection of Parkinson’s disease using deep learning and machine learning. IEEE Access 8:147635–147646. https://doi.org/10.1109/ACC ESS.2020.3016062 Yang Y, Wei L, Hu Y, Wu Y, Hu L, Nie S (2021) Classification of Parkinson’s disease based on multi-modal features and stacking ensemble learning. J Neurosci Methods 350:109019. https:/ /doi.org/10.1016/j.jneumeth.2020.109019 Mohapatra SK, Mishra S, Tripathy HK, Alkhayyat A (2022) A sustainable data-driven energy consumption assessment model for building infrastructures in resource constraint environment. Sustain Energ Technol Assess 53:102697 Mohanty A, Mishra S (2022) A comprehensive study of explainable artificial intelligence in healthcare. In: Augmented intelligence in healthcare: a pragmatic and integrated analysis. Springer, Singapore, pp 475–502 Sahoo PK, Mishra S, Panigrahi R, Bhoi AK, Barsocchi P (2022) An Improvised deeplearning-based mask R-CNN model for laryngeal cancer detection using CT images. Sensors 22(22):8834 Sivani T, Mishra S (2022) Wearable devices: evolution and usage in remote patient monitoring system. In: Connected e-health. Springer, Cham, pp 311–332 Mishra S, Jena L, Tripathy HK, Gaber T (2022) Prioritized and predictive intelligence of things enabled waste management model in smart and sustainable environment. PLoS ONE 17(8):e0272383 Tripathy HK, Mishra S, Suman S, Nayyar A, Sahoo KS (2022) Smart COVID-shield: an IoT driven reliable and automated prototype model for COVID-19 symptoms tracking. Computing 1–22 Suman S, Mishra S, Sahoo KS, Nayyar A (2022) Vision navigator: a smart and intelligent obstacle recognition model for visually impaired users. Mob Inform Syst 2022
Customer Feedback Analysis for Smartphone Reviews Using Machine Learning Techniques from Manufacturer’s Perspective Anuj Agrawal, Siddharth Dubey, Prasanjeet Singh, Sahil Verma, and Prabhat Kumar
Abstract Recent advances in the technology have led to an exponential growth in the e-commerce business. Users today prefer online shopping to legacy shopping methods in order to save their effort and time. In addition, product reviews given by the customers provide transparency about the quality of the products which helps fellow customers to buy or discard a product. Moreover, product reviews are also of huge importance to merchants, to assess the drawbacks and requirements of their customer base. This paper proposes a customer feedback analysis system for smartphone reviews to produce meaningful keywords behind the dissatisfaction of the customers. The proposed approach employs a sentiment classifier to extract negative reviews corresponding a mobile device. Further, topic model is used to extract keywords representing the cause of dissatisfaction among the customers. The proposed system will be helpful to the smartphone manufacturers to identify the drawbacks of their products and need of the customers to create better products. Keywords Customer reviews · Online shopping · Sentiment analysis · Topic modeling
A. Agrawal · S. Dubey · P. Singh · S. Verma (B) · P. Kumar Computer Science and Engineering Department, National Institute of Technology Patna, Patna, India e-mail: [email protected] A. Agrawal e-mail: [email protected] S. Dubey e-mail: [email protected] P. Singh e-mail: [email protected] P. Kumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_26
293
294
A. Agrawal et al.
1 Introduction The rise of e-commerce in recent years has led to an increasing trend among customers to purchase products online. Moreover, the availability of huge range of brands and ease of getting products delivered at doorsteps have boosted the population of online customers. In addition, sellers have also benefitted from the rise of e-commerce by providing them a global platform to reach customers across towns, states and even borders. Further, online shopping also facilitates customers to give their reviews for products, which is arguably the most impactful way to eliminate shopper’s concerns regarding a product. Product reviews are one of the driving forces behind the success of e-commerce platform. Notably, 90% of the customers read reviews before purchasing a product, while 88% of them are prompted to purchase, after reading positive reviews [1]. Sellers often overlook the importance of product reviews and focus on the design of their site or optimizing their page. However, positive product reviews play a crucial role in establishing credibility of a brand or shop, whereas negative reviews help in understanding the drawbacks of their products as well as the requirements of their customers. Owing to this, the current work proposes a feedback analysis system for mobile manufacturers, by extracting meaningful keywords and phrases from customer reviews for a mobile device. These keywords would help mobile brand owners to assess the requirements of their user base and improve on the drawbacks of their devices. The proposed system employs a sentiment classification model to identify product reviews as positive and negative. Further, the negatively identified reviews are used by the topic model to extract meaningful keywords corresponding to the reviews. The major contributions of the proposed work can be summarized as follows: • Product feedback analysis system is proposed to extract keywords from negative reviews corresponding to different mobile devices. • A sentiment classification model is proposed to classify the customer reviews as positive and negative. • A topic model is designed to extract keywords from the classified negative reviews of the mobile devices. The rest of the paper is organized as follows: Sect. 2 briefs about the related works. Section 3 explains the proposed methodology for the extraction of keywords from the negative reviews. Section 4 discusses the results obtained and Sect. 5 concludes the paper highlighting the future work.
Customer Feedback Analysis for Smartphone Reviews Using Machine …
295
2 Related Works The proposed system employs a sentiment classification model and topic model with the aim of producing keywords corresponding to negative reviews of a mobile device. Further, several researchers have proposed different approaches to optimize online shopping domain [2]. Hence, this section presents the related work based on two aspects, namely sentiment classification and topic modeling.
2.1 Sentiment Classification Several researchers have used different machine learning and deep learning techniques [3] for sentiment classification of product reviews in different application areas. For instance, [4, 5] used customer review data from Amazon to classify reviews in positive and negative categories. Smetanin and Komarov [6] employed convolutional neural networks to successfully identify customer reviews in positive and negative sentiments with an F-measure score of 75.45%. Support vector machine was employed by Jabbar et al. [7] to perform sentiment analysis at two levels: review level and sentence level. The proposed approach obtained an F-1 score of 93.54%. In another approach, [8] proposed a hybrid model based on long short-term memory (LSTM) and convolutional neural networks (CNN) for sentiment classification of movie reviews. Similarly, [9] utilized random evolutionary whale optimization algorithm and deep belief networks for online product sentiment analysis to achieve a classification accuracy of 96.86%. Dadhich and Thankachan [10] applied five different supervised learning classifiers such as Naïve Bayes (NB) [11], logistic regression (LR) [12], SentiWordNet, random forest (RF) [13] and K-nearest neighbor (KNN) [14] to classify amazon and Flipkart customer reviews into three categories: positive, negative and neutral classes. Mathew and Bindu [15] employed several transformers such as BERT, RoBERTa, ALBERT and DistillBert for sentiment analysis. Further, the results obtained were compared with LSTM memory network, and it was noted that transformer models outperformed the LSTM network. Similarly, [16] proposed two models based on LSTM and recurrent neural networks (RNN) for sentiment classification of tweets.
2.2 Topic Modeling Topic modeling techniques such as Latent Dirichlet Allocation (LDA) and singular value decomposition (SVD) have been widely used by researchers to extract meaningful keywords from documents and discover hidden themes (topics) that run
296
A. Agrawal et al.
through documents. For instance, [17] used airline reviews to identify significant issues being faced by the customers. The proposed approach employed topic modelling on negative reviews and revealed keywords such as ‘seat’, ‘service’ and ‘meal’ behind the dissatisfaction of the customers. Similarly, [18] employed LDA to extract 43 topics of interest of customers on Airbnb reviews across New York City to analyze customer experience and satisfaction. Negara et al. [19] employed LDA to extract keywords from Twitter data, which was further used to identify different underlying topics such as economy, military, sports, and technology. Li et al. [20] proposed a joint semantic-topic model to analyze the business impact of online reviews. The proposed approach extracted topics and associated sentiments from the reviews. Tushev et al. [21] used customer reviews of mobile apps from the domain of investing and food delivery to extract keywords which are then used to generate meaningful domain-specific topics.
3 Methodology This section discusses about the proposed approach for extracting keywords corresponding to negative reviews for mobile devices. Figure 1 represents the overview of the proposed approach, where the customer reviews are taken as raw input after going through data preprocessing phase, to train a sentiment classification model. Further, the trained model is used on the review data corresponding to a particular mobile device (referred to as “test data”) to categorize the reviews into positive and negative classes. Next, the positive reviews are filtered out and negative reviews are fed into the topic model for extracting meaningful keywords and topics. The following subsections discusses about the dataset used, followed by data pre-processing techniques, sentiment classification model and topic model used in the proposed approach.
3.1 Dataset Description The proposed approach uses smartphone reviews dataset [22] to generate the final dataset used to train and evaluate the sentiment classification and topic model. Figures 2 and 3 represent the instances of the smartphone details and smartphone reviews dataset, respectively. Figure 4 depicts the final dataset obtained by joining the two datasets based on “asin” value, which uniquely represents smartphone models.
3.2 Text Preprocessing Figure 5 depicts all the pre-processing steps applied in the proposed approach. Initially, all the punctuation marks are removed and the reviews are converted to
Customer Feedback Analysis for Smartphone Reviews Using Machine …
297
Fig. 1 Proposed approach
Fig. 2 Smartphone details
lowercase words; for example, “Good,” and “GrEat” are converted to “good” and “great”. Next, the reviews are tokenized followed by the removal of stop words that frequently appear in the text but do not significantly add to the meaning. Finally, the tokenized words are changed to their base form by applying stemming techniques.
298
A. Agrawal et al.
Fig. 3 Smartphone reviews
Fig. 4 Final dataset
Fig. 5 Pre-processing steps
3.3 Sentiment Classifier The performance of the machine learning models highly depends on the features extracted from the dataset. Hence, the proposed approach employs two different feature extraction techniques, namely count vectorizer and tf-idf to transform the textual data of reviews into features to be fed to the machine learning models. On the one hand, count vectorizer uses the bag of word approach that ignores the text structures and only extracts information from the word counts. On the other hand, tf-idf generates a document-term matrix describing the frequencies of all terms occurring in the collection of text documents. The underlying idea of tf-idf is to value those terms that are not so common in the corpus, but still have some reasonable level of frequency.
Customer Feedback Analysis for Smartphone Reviews Using Machine …
299
Further, Naïve Bayes and linear support vector classifier (Linear SVC) are employed on the features generated by count vectorizer and tf-idf vectorizer, respectively, to obtain the best configuration of sentiment classification model.
3.4 Topic Model The classifier model trained in the earlier step is used to identify reviews with negative sentiments. Further, these negative reviews are turned into document-term matrix with the use of tf-idf vectorizer. Since, the matrix is very sparse and includes lots of low-frequency words, truncated SVD is used to reduce dimension. Consequently, non-negative matrix factorization (NMF) is performed to get the final topics that represent the negative features of the smartphone.
4 Results The proposed work implements two different configurations of sentiment classifiers, namely CountVectorizer along with Naïve Bayes classifier (CV+NB) and tf-idf along with linear SVC classifier (TI+SVC). Figure 6 represents the test accuracy obtained by the two configurations on the test dataset. The results obtained demonstrate that tf-idf + Linear SVC technique outperforms CountVectorizer + Naïve Bayes with an accuracy of 81%. Hence, TI+SVC is used for sentiment classification of customer reviews. Further, all the negative reviews corresponding to a mobile device are identified by the proposed classifier and fed to the topic model. Figures 7 and 8 represent the interface for selecting a mobile brand and its corresponding device. Further, Fig. 9 depicts the WordCloud formed corresponding to the negative reviews extracted for the selected iPhone device. Finally, meaningful keywords are extracted showing the demerits of the products. Table 1 depicts the top three keywords extracted corresponding to iPhone device. Fig. 6 Comparison of sentiment classifiers
300
Fig. 7 Brand picker
Fig. 8 Mobile device selection Fig. 9 Word cloud
A. Agrawal et al.
Customer Feedback Analysis for Smartphone Reviews Using Machine … Table 1 Extracted keywords
S. No
iPhone
1
Battery
2
Charge
3
Screen
301
5 Conclusions In this work, a product feedback analysis system is proposed for smartphone reviews, to extract meaningful keywords from negative reviews. The keywords would help the mobile owners to improve the drawbacks of their devices and understand the requirements of the customers. The proposed system initially employs a sentiment classifier based on tf-idf and linear SVC to identify the reviews with negative sentiments. Further, topic modeling is applied on the negative reviews to extract keywords. However, the proposed work has been applied for smartphone reviews. The proposed work has the potential to be extended to other domains as well. Furthermore, this work targets smartphone brand owners. In future, a similar approach can be developed to identify keywords that would help customers in purchasing the products.
References 1. Saleh K (2019) The importance of online customer reviews [Infographic]. https://www.invesp cro.com/blog/the-importance-of-online-customer-reviews-infographic/. Last accessed 2023/ 01/05 2. Verma S, Sinha A, Kumar P, Maitin A (2020) Optimizing online shopping using genetic algorithm. In: 2020 3rd international conference on information and computer technologies (ICICT). IEEE, pp 271–275 3. Singh N, Singh MP, Kumar P (2021) Event classification from the Twitter stream using hybrid model. In: Proceedings of the international conference on paradigms of computing, communication and data sciences: PCCDS 2020. Springer, pp 751–760 4. Suresh P, Gurumoorthy K (2022) Mining of customer review feedback using sentiment analysis for smart phone product. In: International conference on computing, communication, electrical and biomedical systems. Springer, pp 247–259 5. Pandey P, Soni N (2019) Sentiment analysis on customer feedback data: Amazon product reviews. In: 2019 international conference on machine learning, big data, cloud and parallel computing (COMITCon). IEEE, pp 320–322 6. Smetanin S, Komarov M (2019) Sentiment analysis of product reviews in Russian using convolutional neural networks. In: 2019 IEEE 21st conference on business informatics (CBI). IEEE, pp 482–486 7. Jabbar J, Urooj I, JunSheng W, Azeem N (2019) Real-time sentiment analysis on E-commerce application. In: 2019 IEEE 16th international conference on networking, sensing and control (ICNSC). IEEE, pp 391–396 8. Rehman AU, Malik AK, Raza B, Ali W (2019) A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis. Multimedia Tools Appl 78(18):26597–26613
302
A. Agrawal et al.
9. Mehbodniya A, Rao MV, David LG, Joe Nige KG, Vennam P (2022) Online product sentiment analysis using random evolutionary whale optimization algorithm and deep belief network. Pattern Recogn Lett 159:1–8 10. Dadhich A, Thankachan B (2022) Sentiment analysis of Amazon product reviews using hybrid rule-based approach. Smart Innov Syst Technol 235:173–193 11. Rish I (2001) An empirical study of the Naive Bayes classifier. In: IJCAI 2001 workshop on empirical methods in artificial intelligence, pp 41–46 12. LaValley MP (2008) Logistic regression. Circulation 117:2395–2399 13. Breiman L (2001) Random forests. Mach Learn 45:5–32 14. Jiang S, Pang G, Wu M, Kuang L (2012) An improved K-nearest-neighbor algorithm for text categorization. Exp Syst Appl 39:1503–1509 15. Mathew L, Bindu VR (2022) Efficient classification techniques in sentiment analysis using transformers. In: International conference on innovative computing and communications. Springer, pp 849–862 16. Pradhan R, Agarwal G, Singh D (2022) Comparative analysis for sentiment in tweets using LSTM and RNN. In: International conference on innovative computing and communications. Springer, pp 713–725 17. Kwon HJ, Ban HJ, Jun JK, Kim HS (2021) Topic modeling and sentiment analysis of online review for airlines. Information 12:78 18. Sutherland I, Kiatkawsin K (2020) Determinants of guest experience in Airbnb: a topic modeling approach using LDA. Sustainability 12:3402 19. Negara ES, Triadi D, Andryani R (2019) Topic modelling Twitter data with latent Dirichlet allocation method. In: 2019 international conference on electrical engineering and computer science (ICECOS). IEEE, pp 386–390 20. Li X, Wu C, Mai F (2019) The effect of online reviews on product sales: a joint sentiment-topic analysis. Inform Manage 56:172–184 21. Tushev M, Ebrahimi F, Mahmoud A (2022) Domain-specific analysis of mobile app reviews using keyword-assisted topic models. In: Proceedings of the 44th international conference on software engineering, pp 762–773 22. Kaggle.com. Amazon Cell Phones Reviews. https://www.kaggle.com/datasets/grikomsn/ama zon-cell-phones-reviews?select=20191226-reviews.csv. Last accessed 2023/01/03
Fitness Prediction in High-Endurance Athletes and Sports Players Using Supervised Learning Shashwath Suvarna, C. Sindhu , Sreekant Nair, and Aditya Naidu Kolluru
Abstract Sports injuries are unpredictable and are unavoidable if it befalls someone. Sports injuries occur during exercise or while participating in a sport. Nearly 2 million people every year, many of whom are otherwise healthy, suffer sports-related injuries, and receive treatment in emergency departments. Some sports-related injuries, such as sprained ankles, may be relatively minor, while others, such as head or neck injuries, can be quite serious. It is even common among sports players who are known to have a fit body and good physical fortitude, the reason being the excessive strain on the body due to overworking a particular muscle group or overstretching a particular muscle group (Khaitin et al. in Ann Transl Med, 2021) [1]. These are sports injuries that occur due to one’s negligence. While there is also another type of injury that occurs due to the initiative of others, that occurs due to foul play of the players. The latter kind of injuries are unpredictable, the first kind of injuries can be predicted which occurs due to repetitive action of the muscles and it can be avoided, which may help the player to last on the ground for more time and showcase his arsenal of plays. Due to the advancement in technology, it can be observed that 50% in the number of injuries has decreased between 2012 and 2021. More than 1.1 million sports-related injuries were treated in emergency departments in 2021— down from more than 2.1 million in 2012. These numbers can be further decreased with the thorough help of data science. While it may seem data science has little to nothing in sports but on the contrary, it has a wide range of applications for it to be a domain in itself. Many different methods can be used to acquire data, i.e., either by asking players for feedback or video recordings of the players’ practices or by frequent medical check-up data etc. The main motive is to identify a player’s fitness based on very important and general day-to-day attributes that affect one’s lifestyle, where these attributes are very different from pure medical attributes. These data can be permuted and can be used for the identification of a sports injury that may happen in near future. S. Suvarna · C. Sindhu (B) · S. Nair · A. N. Kolluru Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai 603203, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_27
303
304
S. Suvarna et al.
Keywords Fitness prediction · Data science · Machine learning · Injury · Analysis · Acute training load · Chronic training load · Training stress score · TSB · Body mass index
1 Introduction Generally, an Injury is considered to be a piece of bad news for the entire management as well as the team because the consequence of playing during an injury can be considered to worsen the player’s performance and also affects the team’s performance. In the modern era, although the health industry has advanced, the combination of technology in the health industry has benefited the sports industry, and the three of them have worked hand in hand, but the issue of sudden injuries and certain impacts that these injuries bring to the teams have continued to affect the team and especially the results of the team. An unfit player is often considered to be a dead asset for the team, and it is necessary to assess the players frequently.
1.1 Injury and Its Consequences Statistics often tell us that sudden injuries are one of the major reasons a player is not available in a particular season. Also, when this involves big names, the team is badly wound as they have lost that momentum of having a solid team. So, it is necessary to understand fitness and analyze certain aspects [1] and declare the player fit/unfit for the next game and give them proper rest. The reason there is a necessity to predict a player’s injury is that most teams lose revenue and match results because they are unable to manage a player’s workload, resulting in overstressing certain conditions that can cause or lead to a particular injury. In football, it has been often seen that many of the players are injured very frequently because of the exertion of their body during training hours, this also concludes that different sports have different methods of injury and risk management, basically predicting an injury-prone player or analyzing trends related to injured players can help the team in being prepared better for the next time and of course, save them huge amounts of money.
1.2 Injury Prediction Using Data Science and ML Models Data science and machine learning have two very different roles in resolving this issue. As it is known that data science/analytics generally deals with the representation of data and generating important patterns from it. Data science probably can help in the visualization trends of the injured players by using statistics, and different types
Fitness Prediction in High-Endurance Athletes and Sports Players Using …
305
of graphs for example a scatter plot can be used for showing the correlation between two attributes, a bar plot to represent the importance of each attribute, and heatmap can also show the correlation between an injury and the various factors causing it. Thus, it can be concluded that data science can be used to basically give us the most important factors affecting a player’s condition or the vulnerability of a player related to a sport. After using data science for most probably identifying the most important attributes then the machine learning models can be implemented to predict whether a certain player is prone to an injury based on the following factors or not [2], for example from our exploratory data analysis it has been observed that several hours for training, proper ground conditions, players relation with the playing conditions, recent health conditions, joint mobility, past injury risk and, several others can be important for determining an injury for a player. After obtaining these attributes, they can be used to predict whether that player is going to be injured in the near future or not. The main objectives of using these domains are to find important patterns and the most important features and generalizing a concept of determining whether a person should play the next match or not. The most important ML [3] models which be used to predict an injury-prone player or predict any result based on these parameters in classification and other supervised learning algorithms. Basically, printing a yes/no output can also work in this case. Naive Bayes, linear regression are common methods. Random forest and decision trees are also some of the most frequent methods used these days. The concept of using supervised algorithms is to work on labeled data. There are two simple answers whether it is correct for the management to add this player in the squad or not.
1.3 Societal Benefits of the Fitness Prediction Many people wonder what life is like after earning a degree in a technical field. For some, it means working at a desk all day. Others work retail or go to school to become teachers. However, there are many jobs that require little training or education skills. For example, consider a fitness instructor. The job description is typically short and sweet—the person doing it must be in good physical condition. Essentially, someone who excels at athletics or sports must look for work as a fitness instructor because it’s an easy way to earn money. Working as a fitness instructor requires an easy schedule and little training—which makes it ideal for those who don’t want to work hard for their career. Plus, any person who successfully completes the job training program will possess the necessary skills to excel at this job. That being said, there are many benefits of having high-endurance athletes and sports players as employees. First, most fitness instructors take pride in their work because they know how important their job is to society. People who train hard understand the difficulty of their jobs and provide better service as a result. This benefits both the employee and the organization he works for in large ways.
306
S. Suvarna et al.
Secondly, high-endurance athletes [4] and sports players understand the challenges of their jobs much better than those without athletic experience. Most employers understand that new employees need the training to learn their job well— but they also understand high-endurance athletes and sports players are already trained and ready to work. This allows employers to have fewer meetings and get work done faster with less downtime. Plus, employees will have more time to do what they do best—which is working hard at athletics or sports. This will lead to higher quality workouts that the employer can profit from in return. Fitness prediction will allow employers to hire easily trained high-endurance athletes or sports players for jobs requiring easy schedules, little training, or athletic experience. This is great for both employer and employee because it allows people to easily indulge in their passion while also providing useful service to society with no training required!
2 Related Work Sports is a large global business. The global market heightened from $355 billion in 2021 to $497 billion in 2022. This steep increase in market value has made players an important asset to sponsors and ambassadors. That is where injury prediction models have gained popularity recently. Several papers have been referred and models have been analyzed related to expect scores and wins above replacement models. Research work that emphasizes on deep learning and data imputation has been referred. There have been various sophisticated methods for dealing with injury prediction analysis, but a nominal method has not been introduced to date. The current model that is being used is similar to the historical analysis model which uses supervised learning (Table 1).
2.1 Particle Swarm Optimization Algorithm Particle swarm optimization (PSO) is a powerful meta-heuristic optimization algorithm inspired by swarm behavior observed in nature such as fish and bird schooling. PSO is a simulation of a simplified social system. The original intent of the PSO algorithm was to graphically simulate the graceful but unpredictable choreography of a bird flock. Under the particle swarm optimization algorithm, it was implemented to solve the nonlinear control problem. It is a population-based search algorithm that analyzes the flock under the given conditions. Significant research has been found where the algorithm was optimized to suit the player’s variables. This helped the algorithm to give a much broader and more efficient result. Under this algorithm, as PSO is a stochastic optimization technique, it mainly depends on the variables and the quantity of the data set.
Fitness Prediction in High-Endurance Athletes and Sports Players Using …
307
Table 1 Comparative analysis of the related works Related work
Concept used
Usage in sports science and analytics
Particle swarm optimization algorithm [5]
Powerful meta-heuristic optimization algorithm inspired by swarm behavior, observed in nature such as fish and bird schooling
The particle swarm algorithm-based High video analysis and recognition of physical coaching can effectively analyze information of video images of high-level athletes in normal training and competition, and analyze the athlete’s movement trajectory by real time, which can change the training mode in which coaches guide athletes’ technical movements only by manual observation and experience and improve the standardization
BP neural network Under the BP model prediction neural network method [6] model, a three-layered feedforward or monitor learning method. It is the most widely used neural network algorithm Historical data analysis method [7]
Complexity
The experimental results show that Medium GA-BP neural network algorithm has a faster convergence speed than BP neural network and can achieve the expected error accuracy in a shorter time which overcomes the problems of the BP neural network
Naïve Bayes Application of classification algorithm Low algorithm based on Naive Bayes on data analysis of performs well for fitness test categorical input and variables compared to numerical variables
2.2 BP Neural Network Model Prediction Method The Back Propagation Neural Network model is a three-layered feedforward or monitor learning method. It is the most widely used neural network algorithm. Research where the BPNN method was implemented for loss projection/injury prediction has been referred. It mainly works on gradient descent taking a function as a nominal point for the output. With a comprehensive model of the BPNN model and gray neural network mapping model, taking the characteristics of the players into consideration the team was able to get an analysis of the players’ injury prediction model. BP neural network method can deal with nonlinear and uncertain problems well and is widely used in the construction of classification, clustering, prediction, and other models. However, BP neural network method has some limitations in fitting nonlinear functions, such as slow convergence speed and easy local optimal convergence rather than global optimal convergence.
308
S. Suvarna et al.
2.3 Historical Data Analysis Model Method Naive Bayes algorithm performs well for categorical input variables compared to numerical variables. This algorithm is mostly used for analyzing and forecasting data based on historical data. Many papers that were adverted to have implemented algorithms from the Naive Bayes family. The most important factor in injury prediction models is the historical data of the players. It is a supervised machine learning concept that uses conditional probability for model prediction.
3 Methodology 3.1 The Process The process begins by selecting the attributes, in this case, the attributes are the most important tools for the type of results that are expected. Since our goal is to help the team by predicting whether a player is fit or not based on his injury status and training stress, it does not include a very medical-oriented approach, the best possible attributes have been put into effect for predicting a very different kind of result, the result itself aims at giving a brief insight on the players current status, therefore attribute selection is more important. These kinds of attributes have not been considered before for any kind of analysis before, that’s what makes this approach different and unique. After finalizing the attributes it’s time for the second step which is data integration, the different attributes selected above are combined, normalized and integrated, which consists of all the attributes. Model selection, model training, model, and evaluation are the steps that follow. These are one of the most common steps which are mandatory in any machine learning model. Figure 1 depicts the working of the model when linear regression is used, Fig. 2 is the working a decision tree classifier or classification, and regression tree (CART) is used. The only difference is that CART gives more importance to attribute selection and using certain methods, the model is trained based on the best attributes and not all like linear regression. The same can be implemented for logistic regression as well.
Fitness Prediction in High-Endurance Athletes and Sports Players Using …
309
Fig. 1 Working of linear regression
Fig. 2 Working of decision tree
4 Implementation 4.1 Technical Prerequisites The main implementation of this concept is solely dependent on the type of algorithm that is being used and the type of environment it is being worked on. These days it is observed that for data analysis and visualization generally, people use the R language and effectively analyze data using ggplot, which is a very common module for data interpretation, although R is famous, Python is the most well-known language to be used for any data science project. Even for the implementation of this idea Python is being considered. The libraries that are going to be used are matplotlib, scikit learn, pandas, etc. There can be several ML models that can used for this kind of a project, the most famous ones being the classification ones like decision tree, regression analysis, random forest implementation, etc. According to me the idea of using classification for predicting an injury is alright, although that doesn’t stop us from venturing out in the unknown, similar results can be expected even after using other kinds of models, if unlabeled data has been found then unsupervised learning comes into practice. The environment that can be used can be any Python
310
S. Suvarna et al.
IDE used like PyCharm, vscode, Jupyter Notebook. For us the most convenient and easy environment to work on is a Jupyter Notebook where the work is executed in code blocks known as cells. A person should have a basic understanding and knowledge in ML models and also be fluent in Python and its libraries.
4.2 Important Keywords There are few terms which are critical in understanding the implemented model. It is generally known that an athlete trains frequently to keep his skills intact and to prevent himself from becoming rusty, reports show that athletes generally spend 7–8 h in training which involve strength training, skill training and gym related activities. Fitness is the most important thing that this entire research is based upon, fitness of an athlete is critical for the athlete to perform and deliver up to expectations. Training related injuries are very common and very frequent also. Training Stress Score (TSS)—Training stress score is a description of how much physical stress a workout places on the body. Tracking TSS allows us to balance adaptive stress with proper recovery, helping to maximize the outcome of our training. Chronic Training Load (CTL)—The technical definition of CTL tells us that CTL is a reflection of consistency, duration and intensity of the last weeks to several months or more of your training [8]. CTL quantifies our training. CTL is a weighted average of your daily TSS for the last 6 weeks, with a greater emphasis placed on more recent workouts. Acute Training Load (ATL)—This is a very important factor as it measures fatigue. This relatively short-term metric is based off a combination of frequency, duration and intensity of the workouts that are performed. ATL is the average daily TSS for the last week, exponentially weighted to emphasize stress from the most recent workouts. Training Stress Balance (TSB)—This is the balance between CTL and ATL and is commonly referred to as form. TSB is an attempt to balance how fit the player is over the long term with how fatigued that the player was from recent workouts.
4.3 Processing and Training This injury prediction model does not involve the usage of complex medical analysis, for example, analyzing an injury in depth using an MRI or a CT scan or involving complex medical knowledge, this model mainly involves predicting an injury on humane/psychological and physical factors such as ATL, CTL, stress, sleep duration which a general layman can understand. People may not realize but the terms
Fitness Prediction in High-Endurance Athletes and Sports Players Using …
311
Fig. 3 Workflow of fitness/injury prediction
mentioned above are one of the most important factors which actually affect the performance and fitness of a player and determine whether that athlete has the capability to play with his entire strength and contribute toward the benefit of the team. Athletes who aren’t fit are considered to be dead assets as they can’t contribute or play until they are completely fit. ATL and CTL are the terms related to training load [9], players often tend to injure themselves during training. Most organized sportsrelated injuries which count to almost 62% occur during practice. CTL, ATL, TSB, and TSS are the terms required to understand before they are carried forward with the implementation of the model. The factors mentioned above express the fatigue and how prepared the athlete is, if his training related activities are going pretty well but his personal life is stressful that also can have an impact, in short this is a more generalized model where a person is judged on lighter aspects [10] like sleep, schedule, stress, training factors, etc. Together these factors are a force to reckon with and can definitely provide a suitable conclusion with the accuracy aiming at the higher end technically. The Fig. 3 describes the basic workflow of the model that is proposed. It starts with gathering the data then evaluating the model and finally predicting an output.
4.4 Code and Model Explanation It is well known that the first step in any data science related work is to import the libraries required. The libraries required here are numpy, sklearn, pandas, logisticRegression, and matplotlib. Each numpy array represents an athlete’s data in the following fields they are sleep (hrs), stress factor, TSS/d, ATL/CTL [11], and BMI. These are the 5 factors considered for the analysis. The model evaluation is an important core step in the model implementation. The logistic regression concept was used. Here, this type is preferred because of the preference given to supervised learning concepts and classification in specific. Any type of model that has just been used to give a fair understanding of how the model
312
S. Suvarna et al.
Fig. 4 Representation of the model creation and evaluation
works can be put into practice. The output of each input is either 1 or 0 indicating it is a classification model where it classifies a fit athlete as 0 and an injured athlete as 1 as explained in Fig. 4. From Fig. 5, it can be observed suitable conclusions can be made from the model made and implemented, in this case a direct comparison has been made between an unfit/player who isn’t ready and a fit/player ready for the match. It can be observed that a player who is prone to more injuries has less sleep, more stress, less TSS, and lesser BMI. In general, also all these factors are correct. Another classification model that can be used is decision tree classifier, decision tree classifiers are one of the most well-known classifiers often giving significant outputs based on the feature importance [1]. The trees are often structured around the importance of the attributes. In this case after importing libraries, training and evaluating models the feature importance graph has been observed in Fig. 6. Here, a, b, c, d, and e are the attributes which are mentioned in Fig. 6. According to our model, it has been observed that the feature importance is given to attribute b and attribute e they are stress and the BMI of the player, the output of the feature Fig. 5 Comparison between an injured and not injured athlete
Fitness Prediction in High-Endurance Athletes and Sports Players Using …
313
Fig. 6 Horizontal bar graph for feature importance
importance is dependent on the input data. It can be clearly concluded that for this particular group of athlete’s stress [12] and BMI is the most important factor. As it is known, the feature importance and attribute selection measures are very significant in decision tree classification, the structure of the decision tree is based on the order in which the attributes are preferred. The various methods that can be seen are Gini index, information gain, gain ratio, etc. The Gini index has been taken to calculate the importance of the attributes, and it turns out that the most significant attribute according to the method is attribute b and e, and the entire decision is made out of that. The Fig. 7 is a representation of the decision tree obtained from the model. The Decision tree is structured upon the Gini Index, the gini index is calculated and the data is split upon that attribute which was found the most important with gini index. Fig. 7 Final decision tree obtained from our model
314
S. Suvarna et al.
5 Conclusion Sports injuries are unavoidable, and to a certain extent, it is an inevitable cost that a sports player has to pay to be in such a competitive and athletic environment. Physical injuries that occur during plays and training have a long-term effect on the players form and plays in the forthcoming matches. Although an injury that is about to happen during the match is not predictable, it can be predicted if a player who has accumulated and strained a set of muscle should be put in a particular match before recovering it. The main agenda of this project is to make a suitable suggestion to management of a team if a player is fit to play and how long would it take before he/ she can be on the field. As mentioned in the implementation, multiple factors have been used that a player is judged on like, training stress score (TSS),which tell about much stress a particular workout puts on a given player, chronic training load (CTL), which gives information on the consistency, duration and intensity of the past records, acute training load (ATL), factor that measures fatigue which the heart of the predicting model that has been built, training stress balance (TSB), which tells about the long-term fitness and recently acquired fatigue due to overstained workout exercises. The data is collected, then a ML model is run on it to produce the accuracy and visualize it to give the management a better understanding. The visualization helps us understand the dynamics in which a particular is placed under. After getting the output which is either 0 or 1 for an athlete, these numbers are forwarded to the management in a better presented and understandable fashion, which can be used by them to put in or put out a player in the line-up based on the fitness. This in return can extend the life of the person as an active player in the long run. Before concluding it just needs to be considered that everything that has an advantage also has an advantage and so does this idea of fitness prediction use data science. The disadvantages can be of this sort. Sometimes what happens is there are not enough facilities and methods to test our model and idea on, as noticed many of these ideas remain ideas and don’t reach the industry due to lack of outreach, technical resources, and data. Another disadvantage can be that it can be applicable to a specific group of people, they are athletes/sports players and not the general public. This kind of research has not been done before and the real time accuracy and the true power of how it helps has not been tested before, generally the method of fitness prediction deals with pure medical data and the results with attributes which can be questioned by many professional sports scientists. The main disadvantage of this method is that it cannot be used to predict future performances because the data used in the prediction is based on past performance (fitness prediction). This means that if an athlete has not competed for a long time, then their current level of fitness will not be accurate. The other disadvantage is that this method does not take into account other factors such as age and gender which can have a significant impact on an individual’s performance (fitness prediction). The future scope of the fitness prediction [13] is not limited and artificial intelligence will play a big role in it. Already, AI has been used to predict heart rates and
Fitness Prediction in High-Endurance Athletes and Sports Players Using …
315
other physiological parameters in athletes [14], and this technology is only going to get better. In the future, AI and data science may be able to predict how an athlete will perform based on past performance, as well as factors such as weather conditions and altitude. This would allow coaches and trainers to make more informed decisions about training and conditioning for their athletes. Those were some of the few limitations and other features considering this model, apart from them this is a field worth exploring and working our brains on as it can help the sports industry in a significant way. Thus, it can be concluded that the contribution of data science, data analytics, machine learning, and data visualization is immense and can provide the best results if used properly and adequately.
References 1. Sindhu C, Vadivu G (2019) Sentiment analysis and opinion summarization of product feedback. Int J Recent Technol Eng 8:59–64 2. Van Eetvelde H, Mendonça L, Ley C, Seil R, Tischer T (2021) Machine learning methods in sport injury prediction and prevention: a systematic review. J Exp Orthop 3. Panch T, Szolovits P, Atun R (2018) Artificial intelligence, machine learning and health systems. J Glob Health 8:020303 4. Nejkovi´c V, Radenkovi´c M, Petrovi´c N (2021) Ultramarathon result and injury prediction using PyTorch. In: 2021 15th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS), Nis, Serbia, pp 249–252. https://doi.org/10. 1109/TELSIKS52058.2021.9606348 5. Lei H, Lei T, Yuenian T (2020) Sports image detection based on particle swarm optimization algorithm. Microprocess Microsyst 80:103345. https://doi.org/10.1016/j.micpro.2020.103345 6. Wang J (2021) Analysis of sports performance prediction model based on GA-BP neural network algorithm. Comput Intell Neurosci 2021:1–12. https://doi.org/10.1155/2021/4091821 7. Alfredo YF, Isa SM (2019) Football match prediction with tree based model classification. Int J Intell Syst Appl (IJISA) 11(7):20–28. https://doi.org/10.5815/ijisa.2019.07.03 8. Naglah A et al (2018) Athlete-customized injury prediction using training load statistical records and machine learning. In: 2018 IEEE international symposium on signal processing and information technology (ISSPIT), pp 459–464.https://doi.org/10.1109/ISSPIT.2018.864 2739 9. Perri E, Simonelli C, Rossi A, Trecroci A, Alberti G, Iaia M (2021) Relationship between wellness index and internal training load in soccer: application of a machine learning model. Int J Sports Physiol Perform 10. Sindhu C, Vadivu G (2021) Fine grained sentiment polarity classification using augmented knowledge sequence-attention mechanism. J Microprocess Microsyst 81. https://doi.org/10. 1016/j.micpro.2020.103365 11. Ethiraj B, Murugavel K (2020) Impact of resistance training plyometric training and maximal power training on strength endurance and anaerobic power of team handball players. Solid State Technol 63(3):4259–4271 12. Schmidt MD, Lipson H (2008) Coevolution of fitness predictors. IEEE Trans Evol Comput 12(6):736–749. https://doi.org/10.1109/TEVC.2008.919006 13. Prasanna TA, Vidhya KA, Baskar D, Rani KU, Joseph S (2020) Effect of yogic practices and physical exercises training on flexibility of urban boys students. High Technol Lett 26(6):40–44 14. Gang P, Zeng W, Gordienko Y, Rokovyi O, Alienin O, Stirenko S (2019) Prediction of physical load level by machine learning analysis of heart activity after exercises. In: 2019 IEEE symposium series on computational intelligence (SSCI), pp 557–562.https://doi.org/10.1109/ SSCI44817.2019.9002970
316
S. Suvarna et al.
15. Khaitin V, Bezuglov E, Lazarev A, Matveev S, Ivanova O, Maffulli N, Achkasov E (2021) Markers of muscle damage and strength performance in professional football (soccer) players during the competitive period. Ann Transl Med 16. Zafra A, Rubio V, Ortega E (2015) Sports injuries predicting and preventing sport injuries: the role of stress 17. Bai Z, Bai X (2021) Sports big data: management, analysis, applications, and challenges. Complexity 2021:1–11. https://doi.org/10.1155/2021/6676297
Grade It: A Quantitative Essay Grading System Roopchand Reddy Vanga, M. S. Bharath, C. Sindhu , G. Vadivu, and Hsiu Chun Hsu
Abstract Automated writing evaluation system that employs automated essay scoring technologies generates a rating to the writings which helps students with selfassessment. It provides an efficient and easy way for grading essays, which usually takes abundance of time to be evaluated by human graders; this can be greatly useful for educational institutions like schools and colleges. It provides ratings for essays based on the grammatical errors and topic relevancy by using specific tools implemented in the auto grading system. The generated result will help students grading essays which helps students in self-assessment, this will be useful for the student to understand the mistakes that he makes which will be pinpointed by the system in matter of time, not only that the system also displays the strengths of the user, but also the system will improve the accuracy compared to already existing automated grading system. The system uses scores from multiple features (handcraft features, coherence score, prompt-relevant score, and semantic score). These key features illustrated above will provide enough information on the essay for the student. Keywords Automated writing evaluation · Automated grading system · Prompt-relevance score · Semantic score
R. R. Vanga · M. S. Bharath · C. Sindhu (B) Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur 603203, India e-mail: [email protected] G. Vadivu Department of Data Science and Business Systems, SRM Institute of Science and Technology, Kattankulathur 603203, India H. C. Hsu Department of Information Management, National Chung Cheng University, Minxiong, Taiwan © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_28
317
318
R. R. Vanga et al.
1 Introduction As the amount of data continues to grow at an exponential rate, it is increasingly imperative to find ways to extract valuable insights from it in a timely and efficient manner. Furthermore, in educational settings, manually grading assignments for large classes can be a daunting task that consumes a significant amount of time for both educators and students. The shift to online learning brought about by the COVID-19 pandemic has further highlighted the need for an efficient grading system, particularly for essays where various elements such as grammar, spelling, and semantics must be closely evaluated. Implementing ML-based grading systems can not only save time but also provide more accurate and objective evaluations, allowing educators to focus on other important tasks and allowing students to receive their grades in a timely manner. Despite the current limitations of AI-based writing evaluation (AWE) systems, they are still widely used by researchers and academics due to their potential to save time and provide more accurate and objective evaluations. However, the average accuracy of open-source AWEs used in the field is around 75%, and many of these systems are limited in their capabilities [1]. AWEs are commonly used in educational settings to grade competitive exams like the TOEFL, GRE, and other exams from ETS [2]. However, the low reliability of these systems and their lack of comprehensive grading can make them less useful for educators and students. The model aims to address these limitations by increasing the accuracy of an existing AWE model and making it more reliable for frequent use by educational institutions. We aim to detect writing errors by comprehending on various elements such as coherence, argumentation, and overall quality of the essay. By increasing the accuracy and accessibility of AWEs, we hope to make them a more valuable tool for educators and students, allowing them to save time and receive more accurate and detailed grading on their writing. The project’s objective is to enhance the grading process for students by developing a system that can provide more accurate and efficient results. This system utilizes natural language processing techniques and multiple tools such for structure checking, spelling and grammar checking, punctuation mark checking, and stopword removal to analyze and grade essays. The system is trained by using pre-existing data and grading the models based on it. Additionally, the system also checks for relevance between the written essay and the assigned topic before grading. The accuracy of the model will be evaluated by comparing its results with real-world manual evaluations done by professors and instructors. Our project’s report begins with an abstract that provides a general overview of the topic, followed by an introduction that gives a more in-depth explanation of the topic and the reasons for choosing it. We then provide an overview of the steps and components used in creating the system. We discuss the current existing systems and their shortcomings, which we aim to address in the following sections. The methodology and experimentation, along with their results, are discussed in depth in the fourth section of the report.
Grade It: A Quantitative Essay Grading System
319
2 Related Work By adding customized elements, the essay grading system created by a team of researchers [3] sought to provide an accurate evaluation of written essays. Different components, such as punctuation, missing letter detection, spell check, and others were added in these functions. The study’s findings were quite encouraging because they showed how accurately the model could evaluate articles. The system still had issues determining the coherence and semantic characteristics of the essays, despite the successful implementation of these custom features. Hence, handcrafted features were referred from their research work (Table 1). Another system has been found to exhibit remarkable coherence grades. The size of the dataset this system offered, however, was determined to be insufficient, which constrained its ability to match the sophistication and efficacy of the essay grading systems frequently used in academic institutions. Grading systems frequently use vectorization, a trait that has proved crucial in the processing of human language and the interpretation of context and meaning. Vectorization has established itself as a critical component of contemporary technology with a shown track record of success across numerous disciplines, including text categorization, information retrieval, and natural language processing. A hybrid model [5] uses BERT* model, it is mainly used in semantic analysis, which has the ability to process large amount of text and language, which is very helpful while learning of pretrained models, but this model also has its disadvantages as the model requires large amount of data set to train and weight to update. These features will provide a better grading system for the user but then they also have Table 1 Related recent work of essay grader Reference ID
Main feature
Algorithm used
Unaddressed features
[3]
Use of handcrafted features like checking punctuation, missing letters, spelling mistakes, etc.
Rule-based classification
In terms of Cocherence and semantics, it’s not the best model
[4]
Uses text vectorization and semantic analysis
Bi-LSTM-CRF model
Doesn’t detect adversarial essay and essays with permuted sentences
[5]
Scores different aspects of an essay and merges them
CNN+LSTM
Data used in this model is scarce, hence hard to find
[6]
Focusses on critical words and analyzes the logic semantic relationship
LSTM model
Variety of dataset available, hence hard to go with particular
[7]
Uses the BERT for embedding, along with handcrafted features to predict the score
Supervised learning algorithm
BERT model is slow to train due to weight to update, also its an expensive model
320
R. R. Vanga et al.
their disadvantages; the purpose of this review is to identify the drawbacks of other working models and try to update the current model with the features and try to provide a more efficient way of grading.
3 Methodology In this section, the proposed methodology will be introduced to argue how both handcrafted features and deep-encoding features with semantics are used for training in our essay grader to essentially score essays relatively [8]. Briefly, as shown in Fig. 1, during the first stage, post minimal cleaning of the data, handcrafted features score, semantic score, and prompt relevance score are calculated, and in the second stage all scores are concatenated to give a final score. The final score, though just a number, is an amalgam of a broad set of features that are considered in manual grading process. Additionally, tools used in finding few handcrafted features, such as punctuation marks and spelling errors are used to print respective types of errors.
3.1 Data Cleaning The dataset used in the project is automated student assessment price (ASAP). Essays in this dataset have tagged labels, such as “@acb” which must be removed to make the data conducive for finding handcrafted features. The scores or the “y” label are different for each set of the essays causing inconsistencies; hence, all scores must be normalized and can be projected back to actual scores before printing the output.
3.2 Handcrafted Features In Table 2 are few handcrafted features used in [5]. Some of the eight essay types in ASAP data: such essays of set-2 are not only scored based on the writing applications but also the language conventions which encapsulates features such as spelling mistakes, punctuation errors, grammatical errors, paragraphing, word count, etc. Generally, low-scored essays either contain very few words and sentences or be very lengthy. This implies writer’s subpar skills in writing concisely, features like vocab size, i.e., the number of unique words in the essay helps us understand the writer’s vocabulary skill. For grammar, punctuation, and spell check, it only makes sense to use some existing grammar error correction systems or tools as they check for spelling errors, uncased letters, context, punctuation, word repetition, etc. In the model, language Python [9] is used, as it is one of the most accurate systems [10], and ginger it tool is used to collect handcrafted features.
Grade It: A Quantitative Essay Grading System
321
Fig. 1 Workflow of grade it methodology Table 2 Types of features
No. Features 1
Character’s average and range of word lengths
2
Characters average and range of sentence lengths in words
3
Essay length in words and characters
4
Prepositional and comma usage
5
Word count for original essays
6
Average amount of clauses per sentence
7
Mean clause length
8
Maximum number of clauses of a sentence in an essay
322
R. R. Vanga et al.
3.3 Sentence Embedding/Vectorization It’s not exaggeration to say that Google’s Bidirectional Encoder Representations and transform (BERT) gave state-of-the-art embedding results and proved itself to be better than word2vec in NLP tasks such as text classification [3]. Given the achievements of BERT, sentence embedding is performed on the stop words removed texts using BERTbase 2 . BERTbase is a pretrained BERT model trained that has 768 dimensions and 12 encoder layers. “Pooled output” of the BERT result will be used to represent text into a multidimensional embedding. Pooled output: Is representation/embedding of CLS token passed through some more layers Bert pooler, linear/dense, and activation function. It is recommended to use this pooled output as it contains contextualized information of whole sequence.
3.4 Semantic Score Deep semantic features are essential to analyze the semantic soundness of the essay and also for prompt relevance in prompt dependent tasks. Sequential model is used with LSTM to map our data into a low-dimensional embedding and pass it to the dense layer for scoring the essays [11–13]. For every essay ex = {s1 , s2 , …, sm }, where sk indicates the kth (1 ≤ k ≤ m, sk ∈ Rd ) embedded sentence in the essay and d = 768 means the length of sentence embedding. The encoding process of LSTM is described as follows [5, 14]: i = σ Wi .st + U I .h t−1 + bi c
(1)
f t = σ W f .st + U f .h t−1 + b f
(2)
Ct∼ = σ Wc .st + Uc .h t−1 + bc
(3)
Ct = it ◦ ct∼ + f t◦ ct−1
(4)
0t = σ W0 .St + U0 .h t−1 + b0
(5)
h t = 0t◦ tanh(ct ),
(6)
ht —hidden state of sentence st . (W x , U x ) where x = (I, f , c, o) are the weight matrices for the input, forget, candidate, and output gates, respectively. b stands for the bias vectors of the specific gates. Sigmoid function = σ and z means elementwise multiplication. Hence, for every essay, we will get the hidden state set H = {h1 ,
Grade It: A Quantitative Essay Grading System Table 3 Grading values
323
Prompt
Essay
Avg. length
Score range
1
1783
350
2–12
2
1800
350
1–6
3
1726
150
0–3
4
1772
150
0–3
5
1805
150
0–4
6
1800
150
0–4
7
1569
150
0–30
8
723
650
0–60
h2 , …, hm }. H m is passed to dense layer to convert to scalar value. The values from dense layer output are then projected back to their respective ranges according to ASAP dataset as it has different sets of essays with different scoring range mentioned in Table 3.
3.5 Prompt Relevancy Score This score shows how relevant the essay is to the question/prompt. Just like semantic score sequential and LSTM model is used [15]. Prompt = {s1 , …, sk } and essay = {s1 , …, sk } where s refers to sentence are combined into one set of sentences. Now the exact same procedure followed for semantic score will be followed here, i.e., after LSTM hidden layer data is fed to dense layer which gives a scalar output scaled to 0–1 range and score 0 is received by essays that are irrelevant. The scores are projected back to their actual score ranges according to the dataset.
3.6 Training and Evaluation Metrics It is known that essay sets ASAP dataset have different range of scoring; so, to be consistent all scores in each set in the training dataset are scaled down to 0–1 scale and will be scaled back to original values before testing phase. The LSTM neural network used in our model is monolayered with 1024 as the size of hidden layer. It is known that the performance of multilayered and bidirectional LSTM model is subpar from [5]. A dropout proportion of 0.5 is set in order to avoid overfitting of the model. The epochs are set to 50 and are compiled five times once for each fold of cross validation [16].
324
R. R. Vanga et al.
4 Results and Discussion For evaluation we use kappa score as the metric. The final kappa score obtained after five folds is depicted in Fig 2. The confusion matrix to evaluate the performance metrics is in provided with Fig 4. In quantitative grading, agreement between scores is more important than similarity between scores. No two graders are expected to provide similar scores but what’s expected is agreement, i.e., for example in case of three grader’s scores, such as 7.5/10, 8/10, and 7.9/10 are expected as they all are considered as positive scores but scores, such as 8/10, 4/10, and 9/10 are not in agreement, so it may lead to ambiguity. To put it in perspective, the AWE is expected to provide scores that are in agreement with actual human grader’s scores to prove accuracy (Fig. 2). For comparisons, we used results sourced from two-stage learning (TSL) [5] which used CNN, CNN+LSTM and LSTM algorithms, and Bayesian linear ridge regression (BLRR) and support vector regression (SVR) which uses domain adaptation method [17] and various other features such as parts of speech and general handcrafted features. These models are chosen for comparison as they used a vast set of features for grading, proved to be reliable with high kappa scores. Table 4 and Fig. 3 shows us how our model compares with other reliable models. Developing an AWE that works consistently for all prompts of the essays is vital. BERT proved itself to be the finest when it comes to contextual text embedding and the pretrained BERTbase model is vast enough. Handcrafted features were definitely Fig. 2 Final kappa score after five folds
Table 4 Comparison of different results from different papers
Model
Score
EASE (SVR)
0.699
EASE (BLRR)
0.705
CNN
0.726
LSTM
0.756
CNN+LSTM
0.761
TSM
0.821
Grade It: A Quantitative Essay Grading System
325
Fig. 3 Comparison of different results from different papers
necessary for both scoring and providing mistakes in the output, provided the right set of features are chosen in the model with right tools to extract those features. The grade it methodology is definitely a step up when compared to other models, giving us an unparalleled kappa score of 0.928 and is consistent across all types of essays in the dataset. This proves that, using handcrafted features combined with deep semantic features and encoding with BERT gives us unprecedented results. Finally, same method is tested for each of the eight sets of essays separately, and the results were consistent and positive. As shown in Fig. 5, kappa scores were in the range of 0.7–0.8 paving ways for future improvements. ASAP dataset has eight different essay sets as mentioned before each of which not only have a different range of scores but also different set of scores, for example set 7 has four different scores apart from the total score each of which grading ideas, organization, style, and conventions of the writing. Given these wide range of scores, we can alter the model to run the pipeline for every type score of the essay and grade test essays with multiple scores each rating different characteristic of the essay. This also makes the process of giving basic feedback easier, for example if an essay has low conventions score grader can print something like “Limited use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation for the grade level”. If implemented well this can also help students engage more in writing [1] (Fig. 4).
326
R. R. Vanga et al.
Fig. 4 Confusion matrix
Fig. 5 Set wise kappa score split [5]
5 Conclusion We recommend the two-stage technique (TSM) as it ensures the effectiveness and robustness of AES systems by utilizing both handcrafted and deep-encoded features. The prompt relevant score PR and the semantic score Se in the initial step, two different sorts of scores called Pe are calculated. Both of these scores are based on LSTMT. Second, the handcrafted features score is determined. Third, these three scores are combined. The resulting data is input to a boosting tree model [18], which undergoes additional training and outputs a final score. The results of the studies demonstrate how effective TSM is on the ASAP dataset; our model outperforms numerous trustworthy baselines and performs better than average with a kappa score of about 0.92. In conclusion, both manually created and vector-encoded characteristics are used to provide our system its strength and capabilities.
Grade It: A Quantitative Essay Grading System
327
References 1. Zhang Z, Zhang Y (2018) Automated writing evaluation system: tapping its potential for learner engagement. IEEE Eng Manage Rev 46(3):29–33. 1 Third Quarter, Sept. 2018. https://doi.org/ 10.1109/EMR.2018.2866150 2. Burstein J, Kukich K, Wolff S, Lu C, Chodorows M, Braden-harderss L, Harrissss M (2002) Automated scoring using a hybrid feature identification technique, vol 1. https://doi.org/10. 3115/980451.980879 3. Alqahtani A, Alsaif A (2019) Automatic evaluation for Arabic essays: a rule-based system. In: 2019 IEEE international symposium on signal processing and information technology (ISSPIT), pp 1–7.https://doi.org/10.1109/ISSPIT47144.2019.9001802 4. Yang Y, Xia L, Zhao Q (2019) An automated grader for Chinese essay combining shallow and deep semantic attributes. IEEE Access 7:176306–176316. https://doi.org/10.1109/ACCESS. 2019.2957582 5. Liu J, Xu Y (2019) Automated essay scoring based on two-stage learning 6. Wang Z, Liu J, Dong R (2018) Intelligent auto-grading system. In: 2018 5th IEEE international conference on cloud computing and intelligence systems (CCIS), pp 430–435. https://doi.org/ 10.1109/CCIS.2018.8691244 7. Prabhu S, Akhila K, Sanriya S (2022) A hybrid approach towards automated essay evaluation based on Bert and feature engineering. In: 2022 IEEE 7th international conference for convergence in technology (I2CT), pp 1–4. https://doi.org/10.1109/I2CT54291.2022.9824999 8. Chen H, Xu J, He B (2014) Automated essay scoring by capturing relative writing quality. Comput J 57(9):1318–1330. https://doi.org/10.1093/comjnl/bxt117 9. Näther (2020) An in-depth comparison of 14 spelling correction tools on a common benchmark. LREC. https://aclanthology.org/2020.lrec-1.228 10. Taghipour, Ng (2016) A neural approach to automated essay scoring. EMNLP. https://aclant hology.org/D16-1193 11. Alikaniotis D, Yannakoudakis H, Rei M (2016) Automatic text scoring using neural networks, pp 715–725. https://doi.org/10.18653/v1/P16-1068 12. Farag Y, Yannakoudakis H, Briscoe T (2018) Neural automated essay scoring and coherence modeling for adversarially crafted input, pp 263–271. https://doi.org/10.18653/v1/N18-1024 13. Taghipour K, Ng H (2016) A neural approach to automated essay scoring. https://doi.org/10. 18653/v1/D16-1193 14. Sindhu C, Vadivu G (2021) Fine grained sentiment polarity classification using augmented knowledge sequence-attention mechanism. J Microprocess Microsyst 81 15. Jin C, He B, Hui K, Sun L (2018) TDNN: a two-stage deep neural network for promptindependent automated essay scoring. https://doi.org/10.18653/v1/P18-1100 16. Mayfield, Black (2020) Should you fine-tune BERT for automated essay scoring? BEA. https:/ /aclanthology.org/2020.bea-1.15 17. Wang et al (2022) On the use of Bert for automated essay scoring: joint learning of multi-scale essay representation. NAACL. https://aclanthology.org/2022.naacl-main.249 18. Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. CoRR, vol. abs/ 1603.02754. [Online]. Available http://arxiv.org/abs/1603.02754 19. Yannakoudakis H, Briscoe T, Medlock B (2011) A new dataset and method for automatically grading ESOL texts, pp 180–189
Spam Detection Using Naïve Bayes and Trigger-Based Filter Deepali Virmani, Sonakshi Vij, Abhishek Dwivedi, Ayush Chaurasia, and Vidhi Karnwal
Abstract Spam messages are irrelevant messages sent by unwanted organizations or individuals. Short message service (SMS) spam is a major problem in mobile phones. This paper works on tackling this problem by filtering spam and ham messages. The author researched different techniques to do spam filtering including support vector machine, K-Nearest Neighbors, random forest, naïve Bayes, rough sets, and deep learning. In this paper, the author tests the Naïve Bayes algorithm to detect SMS spam along with a set of trigger words that helps in determining the category of the message. A dataset has been used to train the system, and a set of keywords has been prepared to assist the task. The accuracy of the model has been feasible. Along with tweaks to the naïve Bayes algorithm, the paper achieved some favorable outcomes. The authors are currently developing a hybrid machine learning algorithm for spam detection in an effort to improve the approach. Keywords Spam detection · Trigger words · Spam filters · Naïve Bayes
D. Virmani (B) · S. Vij · A. Dwivedi · A. Chaurasia · V. Karnwal Vivekananda Institute of Professional Studies, Technical Campus, Pitampura, New Delhi 110034, India e-mail: [email protected] S. Vij e-mail: [email protected] A. Dwivedi e-mail: [email protected] A. Chaurasia e-mail: [email protected] V. Karnwal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_29
329
330
D. Virmani et al.
1 Introduction Undesirable and unwanted SMS that is sent to several recipients at once is referred to as spam. SMS spam has a significant financial impact on service providers and end users [2, 3]. The significance of this issue’s growth has inspired the creation of several strategies to combat it. Because users read every SMS they get, SMS spam directly affects users, impacting them more than email spam. Filtering is a crucial and well-liked method among those created to stop spam [1–4, 7]. It can be characterized as the automatic separation of SMS into spam and non-spam. This is because while customers open one in every four emails they receive, 82% of SMS messages are viewed within 5 min. Spammers are drawn to SMS because of its significance to mobile phone users. In reality, with the introduction of fresh security risks like SMS phishing, the number of SMS spam has significantly increased recently [4, 11, 18]. Even though modern mobile phones are becoming more and more equipped with a wide range of media messenger applications, SMS is still the preferred choice as a communication medium. However, today’s SMS rate drop also causes an increase in SMS spam, which some people utilize as a substitute for advertising and fraud. As a result, it becomes a significant problem because it can annoy and injure people, and computerized SMS spam filtering is one way to address it (Fig. 1). Compared to emails, SMS uses different characters. In contrast to SMS, mail has some organized information like the subject, mail header, greeting, sender’s address, etc. These make the work of SMS classification much more challenging. This issue highlights the need for creating an effective SMS filtering technique. Accuracy is one of the most difficult aspects of SMS spam filtering. The two main categories for identifying SMS spam are content-based and collaborative methods. The initial is based on user feedback and collective user experience. The other one concentrates on reading through message text. Due to the difficulty in gaining access to utilization and user experience data, this research uses the second, more widely used approach. Various algorithms explored to classify spam messages are mentioned in the literature review [4, 7].
Fig. 1 Basic concept of spam detection
Spam Detection Using Naïve Bayes and Trigger-Based Filter
331
2 Literature Review Before 1990, a few solutions for stopping spam started to appear in response to spammers who started automating the procedure for sending spam SMS. The earliest spam prevention program employed a straightforward method based on analysis by scanning QR code emails for suspicious senders or expressions like “free of charge” and “click here to buy.” Blacklisting and whitelisting techniques were put into use at the Internet Service Provider (ISP) levels in the late 1990s. However, there were some maintenance issues with these techniques. The following table describes various techniques that are being used in recent times (Table 1). The author works on naïve Bayes because of the straightforward nature of this technique, and the high accuracy yielding traits. The key benefit of naive Bayes is the efficiency with which supervised learning can be used to train naive Bayes classifiers. In many real-world applications, naive Bayesian classifiers are utilized for parameter estimates. The naive Bayesian model is simple to construct and does not require complex repetitive parameter estimates, making it very helpful in the field of medicine for identifying heart patients [8]. The Naive Bayesian classifier performs admirably despite its simplicity and is popular because it frequently outperforms more advanced classification techniques [14]. In the proposed model, the author first calculates all the constants used in the Bayes theorem, i.e., P(Spam) and P(Ham), The formula takes alpha to be 1 for Laplace smoothing [9, 15]. P(S\x1 , x2 , . . . , xn ) a P(S)∏n = 1 P(X i \S)
(1)
P(H \x1 , x2 , . . . , xn ) a P(S)∏n = 1 P(X i \H )
(2)
where P(S) denotes the probability of a message being spam, P(H) denotes the probability of a message being a ham, and x 1 , x 2 , …, x n denotes words of incoming messages. Now that all parameters are calculated, the spam filter is initialized. The spam filter can be defined as a function that: • Takes a new message as input (x 1 , x 2 , …, x n ) • Calculates P(S|x 1 , x 2 , …, x n ) and P(H|x 1 , x 2 , …, x n ) • Compares the values of P(S|x 1 , x 2 , …, x n ) and P(H|x 1 , x 2 , …, x n ). After comparing the values, if the system finds P(H|x 1 , x 2 , …, x n ) to be greater than P(Sp|x 1 , x 2 , …, x n ), then the message is classified as ham, if not then it checks if P(H|x 1 , x 2 , …, x n ) is less than P(S|x 1 , x 2 , …, x n ), then the message is classified as spam otherwise the algorithm may request human help.
332
D. Virmani et al.
Table 1 Techniques for spam detection S. No
Technique
Description
1
Deep learning
Deep learning is a branch of machine learning that employs methods inspired by neural networks. Despite having a futuristic sound, it is a crucial component of the development of artificial intelligence (AI). Machine learning uses data reprocessing guided by algorithms, whereas deep learning groups data to produce startlingly accurate predictions to mimic the human brain
2
Naïve Bayes
Naïve Bayes is a straightforward learning algorithm that makes use of the fundamental presumption that characteristics are conditionally independent given the class and the Bayes rule. Although this independence condition is regularly violated in practice, Naïve Bayes frequently yields competitive classification accuracy [8]. This, together with its computational effectiveness and numerous other appealing characteristics, contributes to naive Bayes’ wide use in practice
3
K-nearest neighbour
The K-Nearest Neighbor Algorithm, or KNN as it is more often known, aids in determining the closest group or category to which the new one belongs. Because it uses a supervised learning model, it will determine the group or category of the new one based on the labels we have already provided
4
Machine learning
In machine learning, the data points on which base margins are calculated and maximized are known as support vectors. One of the hyper-parameters to tweak addressed further down is the number of support vectors or the degree of their influence
5
Neural network
Neural network uses a technique that resembles how the human brain works. Neural networks are a set of algorithms that aim to identify underlying relationships in a piece of data [16]. In this sense, neural networks are networks of neurons that can have either an organic or synthetic origin
6
Support vector machine
An approach for supervised machine learning called SVM can be applied to both regression and classification problems. The SVM algorithm looks for a hyperplane in the data space that creates the maximum minimum distance (referred to as margin) between both the objects (samples are collected) that belong to various classes using a training set of objects (samples) divided into classes. The hyperplane that separates the hyperplane is the name given to the hyperplane
3 Proposed Work Using a collection of fundamental training data, the application detects spam that was transmitted to the user’s device successfully. The false positive gained through the model can be used to retrain the system and detect similar messages. This results in more accurate spam detection over time. Two datasets have been used for the proposed model, one for training and one for testing. The dataset and set of keywords have been discussed in the training part of the model along with the algorithm. Figure 2 describes the framework of this model.
Spam Detection Using Naïve Bayes and Trigger-Based Filter
333
Fig. 2 Framework of the model
The model performs data preprocessing on the raw training dataset which involves removing stop words tokenization, lemmatization, stemming, and feature extraction. After the preprocessing, it creates an array of the words that are significant in the message and evaluates the values of the constants required in the calculation. The value of constants for the given training data is found to be • • • • •
p_spam = 0.135 (probability of message being spam) p_ham = 0.865 (probability of message being ham) n_ham = 30,983 (number of words in messages that are ham) n_spam = 9234 (number of words in messages that are spam) n_vocabulary = 5577 (total number of unique words).
4 Implementation The implementation of the model involves training of machine learning model using naïve Bayes and then testing incoming new messages based on the trigger words. The accuracy of the model is then calculated to find out the feasibility of the model.
4.1 Training A dataset of 5572 messages classified into ham and spam was taken from UCI machine learning repository to train the model. We use a series of trigger words {T 1 , T 2 , T 3 , …, T 150 } along with the traditional naïve Bayes to differentiate spam
334
D. Virmani et al.
Fig. 3 Trigger words
messages from ham. Following is the list of selected trigger words. The words have been given a rating according to their severity of being spam words by intuitive logic (Fig. 3). The spam filter checks every word from the processed message against the trigger words. If the word occurs in the list of given keywords the proposed model checks the rating of the keyword and performs the following operations. • For keywords T 1 , T 2 , T 3 , …, T 50 , i.e., the rating range 101–150 the spam filter sets the probability of the message being spam to 10 because the words in that range are highly likely to be spam • For keywords T 51 , T 52 , T 53 , …, T 100 , i.e., the rating range 51–100 the spam filter sets the probability of the message being spam to 5 because the word in that range is likely to be spam • For keywords T 101 , T 102 , T 103 , …, T 150 , i.e., the rating range 0–50 the spam filter sets the probability of the message being spam to 2 because the word is less likely to spam.
Spam Detection Using Naïve Bayes and Trigger-Based Filter
335
After iterating through the entire list of words and setting the probability of spam to 10, 5 or 2 (if applicable), the model applies the traditional naïve Bayes algorithm to the message considering the new probabilities.
4.2 Testing A dataset of 100 SMS has been put together from different mobile phones to test the model. The model takes in the first message—“Free counseling contact 8,753,535,399 to get rich”. The filter first performs data cleaning and removes unnecessary words like “to” and “get” to extract only the words that are significant [“free”, “counseling”, “contact”, “rich”]. Then it checks the remaining words one by one against the trigger words, “free” for example, occurs in the list as well as the message. The proposed model sets the probability of the message being spam to a fixed number (1 in this case as free has a rating of 145 and is highly likely to be spam). The cumulative probability is obtained by repeating the process for every word in the message. The final probability obtained after the trigger word-based search is used in the formula of a Bayesian algorithm. For the second message, “Celebrate this festive season with us, don’t miss this opportunity”. The filter again removes the stop words and cleans the message to extract important features of the message [“celebrate”, “festive”, “season”, “miss”, “opportunity”]. Now the model checks the words against the list of trigger words, but in this case, no word appears in the list. Hence, the initial probability of the message being spam is considered to be zero, and the normal naïve Bayes is applied to the message based on the training.
5 Result and Discussion The same process is repeated for the next 98 messages. After the completion of the process, the output from the system is compared with the actual labels assigned to the SMS. Figure 3 shows that there are 27 true negative cases and 64 true positive cases that correspond to 91 messages being classified correctly. True positive signals are those that are correctly identified as spam, and true negative messages are those that are correctly identified as ham. The graph shows false positives as 3 and false negatives as 6, where false positives are the cases in which the machine recognizes the spam messages as ham and in false negatives, the machine recognizes the ham messages as spam. From the results depicted in Fig. 4, we can calculate the accuracy, precision, and recall values for the model. One parameter for assessing classification models is accuracy. The percentage of predictions that the model correctly predicted is known as accuracy. The precision of the true class (the same class actually belongs to the same class in prediction) and the total number of items belonging to that class in prediction is the precision for that class. Or, to put it another way, precision is the
336
D. Virmani et al.
accuracy of this class’s classification. Recall for a class is the proportion of true classes (items belonging to the same class in real and prediction) to all of the items belonging to that class in actual or to put it another way, recall is completeness in this case. Accuracy of the model =
TP + TN TP + FN + TN + FP
(3)
Precision for spam messages =
TP FP + TP
(4)
Precision for ham messages =
TN FN + TN
(5)
Recall for spam messages =
TP FN + TP
(6)
Recall for ham messages =
TN FP + TN
(7)
Precision for spam messages for the given dataset is calculated to be 0.95 and for ham messages 0.81. Recall for spam messages is 0.91 and for ham messages 0.90. Comparison of the model is done with the traditional Naïve Bayes on the basis of three performance parameters—accuracy, precision, and recall—which are used to analyze the results. Table 2 and Fig. 5 provide the values of these performance parameters for the instances stated above. Using the exciting methods of the naïve Bayesian model, a test is performed on a synthetic dataset, and the accuracy for the same is found out to be 88.50% [19, 20]. In comparison, the model performed well with a final accuracy of 91%. With the addition of trigger words, the precision to find spam messages is increased, and the recall value is almost similar for both instances. The novelty of the proposed works Fig. 4 Precision matrix
Spam Detection Using Naïve Bayes and Trigger-Based Filter
337
Table 2 Comparison using performance parameter Algorithm
Accuracy (%)
Precision
Recall
Trigger-based filter with Naïve Bayes
91.00
0.95
0.915
Simple Naïve Bayes
88.50
0.92
0.913
Fig. 5 Graph of performance parameters
helps in identifying messages which are highly likely to be spam and pose a bigger threat to the user. It also increases the number of ham messages being classified as spam. When evaluated on a synthetic dataset, the model generally outperformed the conventional naive Bayes. Though the margins were minimal. With improvement in feature selection and trigger words, the margins might be improved.
6 Conclusion A naïve Bayes algorithm-based spam filtering program for mobile devices was suggested, and it properly identified new incoming messages from users. Messages are categorized by comparing each word against a set of trigger words selected by intuitive logic and then applying naive Bayes on them based on the rating of the word. The result of this research shows that the model produces an accuracy of 91%. Further, the accuracy of the proposed model can be improved by increasing the
338
D. Virmani et al.
number of trigger words and assigning the rating of words precisely. More investigation is being done into the best ways to use hybridized machine learning algorithms to detect spam in mobile devices. This is required to get more reliable and effective results. The limitation of the model is the increased time complexity with the added trigger word check. Also, the trigger words need to be changed with time along with the ratings assigned to them to tackle a new variety of spam messages, and there is no feasible way to automate the process of assigning trigger words and their rating. This model could be used to do personalized spam detection and create trigger words from different languages to tackle the problem of spam detection in other languages.
References 1. Xia T, Chen X (2020) A discrete hidden Markov model for SMS spam detection. Appl Sci 10(14):5011 2. Liu X, Lu H, Nayak A (2021) A spam transformer model for SMS spam detection. IEEE Access 9:80253–80263 3. Sheikhi S, Kheirabadi MT, Bazzazi A (2020) An effective model for SMS spam detection using content-based features and averaged neural network. Int J Eng 33(2):221–228 4. Ghourabi A, Mahmood MA, Alzubi QM (2020) A hybrid CNN-LSTM model for SMS spam detection in Arabic and English messages. Future Internet 12(9):156 5. Bosaeed S, Katib I, Mehmood R (2020, April) A fog-augmented machine learning-based SMS spam detection and classification system. In: 2020 fifth international conference on fog and mobile edge computing (FMEC). IEEE, pp 325–330 6. Nyamathulla S, Umesh P, Venkat BRNS (2022) SMS spam detection with deep learning model. J Positive School Psychol 7006–7013 7. Kural OE, Demirci S (2020, October) Comparison of term weighting techniques in spam SMS detection. In: 2020 28th signal processing and communications applications conference (SIU). IEEE, pp 1–4 8. Maram SCR (2021) SMS spam and ham detection using Naïve Bayes algorithm 9. Asaju CB, Nkorabon EJ, Orah RO Short message service (SMS) spam detection and classification using Naïve Bayes. In: Conference organizing committee, p 62 10. Gupta SD, Saha S, Das SK (2021, February) SMS spam detection using machine learning. J Phys Conf Ser 1797(1):012017. IOP Publishing 11. Verma RK, Gupta S, Saini Y, Libang A, Jain MM Content-based SMS spam detection 12. Hanif K, Ghous H Detection of SMS spam and filtering by using data mining methods: literature review 13. Popovac M, Karanovic M, Sladojevic S, Arsenovic M, Anderla A (2018, November) Convolutional neural network-based SMS spam detection. In: 2018 26th telecommunications forum (TELFOR). IEEE, pp 1–4 14. Yang FJ (2018, December) An implementation of Naive Bayes classifier. In: 2018 international conference on computational science and computational intelligence (CSCI). IEEE, pp 301– 306 15. Peng W, Huang L, Jia J, Ingram E (2018, August) Enhancing the Naive Bayes spam filter through intelligent text modification detection. In: 2018 17th IEEE international conference on trust, security, and privacy in computing and communications/the 12th IEEE international conference on big data science and engineering (trustcom/bigdatase). IEEE, pp 849–854 16. Wei F, Nguyen T (2020, October) A lightweight deep neural model for SMS spam detection. In: 2020 international symposium on networks, computers and communications (ISNCC). IEEE, pp 1–6
Spam Detection Using Naïve Bayes and Trigger-Based Filter
339
17. Osa E, Elaigwu VO (2021) Modeling of a deep learning based SMS spam detection application. Money 3(4):163–173 18. Marsault B, Gigot F, Jagorel G (2020) SMS spam detection. Text analysis and retrieval 2020 course project reports 19. Gayathri A, Aswini J, Revathi A (2021) Classification of spam detection using the Naive Bayes algorithm over k-nearest neighbors algorithm based on accuracy. Nveo-natural Volatiles Essential Oils J (NVEO) 8516–8530 20. Marathe AP, Agrawal AJ (2020) Improving the accuracy of spam message filtering using hybrid CNN classification. Int J Emerg Technol Eng Res (IJETER)
Security Enhancer Novel Framework for Network Applications Vishal Kumar
Abstract In order to extract, analyze, and process the huge amount of datasets, most of the users use hired cloud services rather than their own HW/SW setup. Thus, a large dataset is needed to transfer over a cloud network, and a large amount of information flow attracts malicious intruders who attack the communication network to take private information for their benefit. High-level security for the user’s information flowing over the communication networks becomes the most crucial, and lots of data security algorithms are present that could proffer appropriate approaches to secure tasks to a group of VMs in many applications, yet the job security for large datasets has not been achieved. So, for big data processing, a secure job scheduling with minimal ET in the MapReduce framework is essential. Clients can use the hired resources for the time they need to help them with cost-cutting and ease of use. Data security is required while scheduling the sub-tasks generated from the MapReduce function to the VMs. A secure technique is the demand of time so that users’ private data can flow freely machine to machine over the cloud’s VMs without any risk from malicious hackers and intruders. An SHA-256 cantered Elliptical Curve Cryptography (SHECC) is employed to proffer data security (DS) through the proposed model. With this framework, it can flexibly create integrative security protection against diverse threats and attacks. It reduces the data encryption and decryption time and also enhances the security level. Keywords Cloud computing · Elliptical curve cryptography · MapReduce · Virtual machines · Data Security
V. Kumar (B) Department of CSE, Chandigarh University (Mohali), Chandigarh 140413, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_30
341
342
V. Kumar
1 Introduction The prevailing security techniques like RSA, DH, and Elagamel failed for MapReduce applications for encryption and decryption of big data within time constraints and for a desirable security level. A novel security enhancement framework has been proposed in order to make improvements. The major contributions of this research study include the following: • To make data security process time efficient for encryption and decryption of data, allow secure communication over public networks. • To make communication link most reliable by enhancing network security level. • To reduce storage space requirements with the use of shorten key length for high-level data security. Data is growing rapidly on a peta-scale level every day. The processing and security of this massive amount of data flowing over the network are extremely difficult. Without an effective and efficient security mechanism, the MapReduce programming model and other network systems cannot be 100 percent successful [1-3]. Changing the data into an unreadable form (encryption) before placing it on the network node is a mandatory process. These encryption procedures should be too hard to crack [4, 5]. After secure transmission of encrypted data, it is required to be decrypted at the receiver machine to receive the original information [6, 7]. Figure 1 shows the encryption and decryption procedures. Asymmetric key algorithms use two keys separately for encryption and decryption of data. The cryptographic procedures can be divided into two categories: single-key and double-key algorithms. A single key (the public key) is used in symmetric key algorithms for both encryption and decryption of data [8-10]. Asymmetric algorithms are regarded more secure than symmetric algorithms since they require two distinct keys. The transportation of the public key from sender to receiver is a dangerous activity because any intruder on the network path between sender and receiver might take the public key and use it for their own reasons [11]. There are several symmetric algorithms, such as blowfish, MARS, triple DES, and others, that perform well in network data security applications but not well in MapReduce applications [12]. The length of the key determines the strength of the encryption/decryption technique; the security level is directly proportional to the length of the key [13]. The prevailing algorithms like RSA, Elagamal Diffie Hellman, and elliptic curve cryptography are being used for data security in MapReduce applications. The elliptical curve cryptography method of data security provided a higher level of protection while using a shorter key length [24, 25]. The hash-coded algorithms SHA-256 and SHA-512 take a message and its length as input and produce a fixed-length output. They are used for high-security password and credit card information protection [14, 15]. The proposed data security method combines the hash code technique (SHA-256) and elliptical curve cryptography (ECC) to provide two-layer security for MapReduce output data moving across the network. The following are the various sections of this research study:
Security Enhancer Novel Framework for Network Applications
343
Fig. 1 Encryption decryption process
The existing work review is explained in Sect. 2, the suggested data security framework is elaborated in Sect. 3, the effectiveness of the proposed work is shown in Sect. 4, a comparison with existing methods is made in Sect. 4, and the conclusion is presented in Sect. 5.
2 Literature Survey A number of the methods for continuing in isolation in the cloud have been stated by many authors, and few of them are elucidated below. Rao et al. [16] introduced the MR-EIDS technique called MapReduce-based ensemble intrusion detection system. The motive of this system is to find malicious users and intruders in real-time datasets. It analyzes the huge datasets to find interesting patterns to identify malicious, unauthenticated users that try to use the datasets illegally. The huge datasets are divided into smaller, manageable sets, and encryption is applied to each data node separately using a 128-bit key for security purposes. All keys are stored in memory buffers. The proposed model identifies the data attacks with an accuracy of 98.4%. Sudhakar et al. [1] offered a combination of fuzzy c-means clustering and Knearest neighbor (KNN) clustering technique for preserving privacy over clouds in big data. It operates in three phases: clustering, the map and reduce phase, and classification. It used fuzzy C-means for clustering of data and the KNN method for classification in the last step. It improved the classification of data and enhanced privacy through an effective convolution process. It achieved clustering accuracy of 73%. With effective and efficient use of clustering and classification techniques, a high level of data security has been achieved.
344
V. Kumar
Zhao et al. [4] proposed a three-factor authentication procedure using elliptic curve cryptography. It involves a series of steps: initialization, registration, user login, authentication, and the user password update stage. During initialization, an elliptic curve over a finite field was selected. A secure path of communication has been chosen during the registration phase. Users log in to the system for their authentication using their IDs and passwords. In the final stage, users can update their passwords every time they enter and use the data. It provides a secure system with an accuracy of 98%.It is an attack-proof system that keeps intruders away from the datasets. Ullah et al. [17] offered a new TSS that needs less information and provides more security. This threshold signature scheme was based on two elliptic curve points, but most other such schemes used only one. ECC is a recent invention for providing a high level of security with a reduced key length. The key length of 256 bits and more was found to be too hard to crack. The elliptic curve-based discrete logarithm problem is a great challenge to breach the security level of the system. Yeh et al. [5] offered CP-ABE. It is an encryption technique based on selected attributes of the dataset but not applied to the entire block. It uses 128-bit keys and symmetric ciphers for encryption and decryption. It is time-efficient and enhances security levels as well because it applies to a specific part of the block that is needed to encrypt based on some predefined criteria. Its implementation complexity is higher and completely depends on predefined criteria. Kumari et al. [18] suggested two-factor authentication using one-time passwords and hash coding like SHA-256, SHA-384, and SHA-512 techniques to code the passwords that help to prevent them from being attacked by network intruders. Onetime passwords and secure hash-coded algorithms never generate the same output even for the same input, so it becomes difficult to guess or hack the codes. Twoway or multi-way authentication can reduce the risk of being hacked by malicious network users. Liu et al. [19] suggested an assembly implementation of SHA-512. It is a strong cryptographic function that has a high resistance against collision and pre-image attacks. With the help of four-sigma operations and careful use of memory accesses, it optimizes the compression function. This optimization approach not only accelerates the SHA-256 but can also be applied to microcontrollers where there are a lot of constraints concerning memory space and device speed. It can also be applied to applications involving IOT devices. Sadkhan et al. [12] proposed an integrating technique involving SHA-256, AES, and DH in MANETS. It used the hybridization of traditional security techniques with hash-coded techniques like SHA-256 and an enhanced two-layer security level. It is simulated in an NS2 environment, and it enhances the data security on wired and wireless machines. Zbakh et al. [10] introduced a tag-based MapReduce framework for secure communication over the cloud. The major motive of this study is to provide highlevel security with minimal overhead. When it passes through a man in the middle attack, it was not vulnerable and proved reliability. With a high level of data replication, it manages the integrity with virtual TPM. It offered flexible key generation
Security Enhancer Novel Framework for Network Applications
345
and user-defined security policies according to the type of migration. This design will need to be validated with further experimental setup. Dang et al. [11] offered a secure mechanism for the security of the private data of users. The big data was divided into smaller sets using the MapReduce framework for easier processing. Data can be on a storage device, in processing, or moving over the cloud network. It can be stolen from anywhere. It offered a mechanism to protect the data when it was processed using the MapReduce framework. Different pieces of data need different levels of security. So, the reliability level of each device involved in big data transmission needs to be computed, and every device is assigned weight as per its trust level. Using a bipartite graph, data requiring a high level of security is assigned to the most weighted resource. It offered a security level of 94.5% in big data applications.
3 Data Security Using SH-ECC It is essential to protect the security-critical MR application from being attacked by establishing an appropriate security service, as there are abundant security problems in the distributed computing system. Snooping, changing, and spoofing are common threats in cloud environment. A security model, i.e., SH-ECC, is employed to protect the distributed system from these attacks. A public key cryptosystem wherein the task is chiefly encrypted using the sender’s public key is elliptical curve cryptography (ECC). Yet, until now, the scheduling of MR job’s security limitations across multiple nodes has not been well addressed. Thus, a secured job scheduling system for MR jobs utilizing SH-ECC has been proposed. The proposed model enhances the security level of task execution by providing two-layer security. First, the elliptical curve is used to encrypt the task, and then, SHA-256 hash coding is used to strengthen the security level. In the next step, this set of tasks needs to be scheduled onto an available number of VMs. Before reaching a VM, these tasks have to cover network paths where intruders may steal the useful and confidential information of task users. In between the MapReduce function’s output and set of VMs, there is a security layer to protect the user’s data from intruders. In this model, the MR task is divided into a large number of sub-tasks, and then, the reducer function reduces the large set of sub-tasks into its subset. Figure 2 shows that the output of the MapReduce function is a set of tasks (T1, T2,…,TP) that are assigned to the function using elliptical curve cryptography to encrypt the tasks that are decrypted on the receiver side. Yet, the execution is more complex and engenders implementation errors, which lessen security. The task, after that, goes through the hash-coded function SHA-256, which inserts an additional hash-coded value to the output of the ECC function. Finally, encrypted tasks are transferred to the available virtual machines.
346
V. Kumar
Fig. 2 Data security architecture
3.1 Merged Procedure: ECC and SHA-2 SHA-256 (series 2) is merged to ameliorate the security in ECC. The hashing method that guarantees the integrity and authenticity of the information is SHA-256. By employing the public and private keys of the sender and receiver, a confidential key is generated in the proposed model. Afterward, both are merged using a hash-coded function utilizing SHA-256. This hash-coded key is called the confidential key. It should be wielded together with encryption and decryption of data to offer high security. The following are the steps entailed in the SH-ECC method. Step 1: Let P A be the public key of the sender (A) and B be the private key of the receiver (B). The task is sent via n number of nodes (o N = o1 , o2 , .. .. ., on ). Select a random node ι from 1 to (n − 1). The public key of A is computed as PA = ι × z.
(1)
The point on the elliptic curve is denoted by z in Eq. (1). The elliptic curve equation is given as, z 2 = t 3 + at + b.
(2)
Step 2: Here, the public key of A A (PA ) and the private key of B B is merged together. It is expressed as in Eq. (3) = PA + B .
(3)
Security Enhancer Novel Framework for Network Applications
347
Step 3: Next, to obtain the confidential key (h), the combined keys () are hash coded utilizing SHA-256. It is mathematically depicted as in Eq. (4) h = sh().
(4)
Here, a cryptographic hash function like SHA-256 is sh(). Step 4: The input task φ j is encrypted utilizing the public key of A and h after confidential key generation. The task is encrypted as, E 1 = ι ∗ z,
(5)
E 2 = φ j + ι ∗ z + h,
(6)
where the ciphertext of A is indicated by E 1 , E 2 as in Eqs. (5) and (6). Step 5: Then, the transmission of encrypted tasks securely to the receiver occurs. The task is decrypted utilizing the private key. B of the receiver (B) on the receiver side. This is symbolized in Eq. (7). φ j = E2 + B ∗ E1 − h.
(7)
Therefore, the node receives the original task φ j . Then, for processing, it is further scheduled on the VMs. With the proposed procedure, task can be transmitted security over the cloud-distributed environment.
3.2 Abstract Representation of SH-ECC Figure 2 shows the logical solution of placing n tasks securely over the m number of virtual machines. Every task is given two-way security. • Encrypt the data using public key of the sender. • Hash-coded key is added finally to the encrypted task to enhance security. With this security mechanism, the task is securely sent to the VM where it is decrypted using the private key of the receiver (Fig. 3).
3.3 Pseudocode The line-wise explanation of pseudo-code of Fig. 4. Line 1: the start of the procedure. Line 2: Loop for n no. of tasks.
348
V. Kumar
Fig. 3 Flowchart (logical abstraction)
Line 3: Take the private key of the receiver that is a random no. r between 1 and n. Line 4: Generate the public key by revolving r times the point Z on the elliptical curve. It is like throwing a ball r times on the wall and no one can judge about the first mark on the wall even if the final mark is known. Private key has been used to produce the public key. Line 5: Public and private keys are added. line 6: Find hash code of the result obtained in line 5. Line 7: Encrypt the task. It has two parts. • Revolve elliptic curve point r 2 times where r 2 is a random no. between 1 and n. This part is used only for decrypting the data on line 9. • After revolving it r 2 times and adding it the hash key for enhanced security.
Security Enhancer Novel Framework for Network Applications
349
Fig. 4 Pseudocode for proposed SH-ECC
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.
Begin for (i=1 to n)
Transfer end for End
Line 8: Securely transfer the encrypted task to the receiver end. • Encrypt the task by adding a public key. Line 9: To decrypt the data. • Use of receiver’s private key. • Use of first part of encrypted msg. Line 10: End the loop after sending n tasks securely. Line 11: End the procedure.
4 Results and Discussion The effectiveness of the proposed security enhancer novel framework is compared to existing methods to demonstrate the efficacy of the proposed strategy. The publicly available dataset is where the input data is gathered. The Java run-time environment is used for it. The superiority analysis using real data is demonstrated below. The suggested method’s performance is evaluated using the following parameters. • Encryption/decryption time. • Security level achieved. • Throughput (processing time).
4.1 Encryption/Decryption Time Here, cantered on encryption time, decryption time (DT), the superiority of the proposed SH-ECC technique wielded for DS is assessed. By analogizing the
350
V. Kumar
proposed system to the prevailing techniques, the performance assessment is made. ECC, Rivest–Shamir–Adleman (RSA), Elagamal, and Diffie–Hellman (DH) are the prevailing techniques. Figure 5 portrays the performance measure through encryption time. It is obvious that the SH-ECC model needs a lower time for data encryption than the other methodologies. The performance evaluation of the SH-ECC method by analogizing the DT of the prevailing techniques is portrayed in Fig. 6. The proposed framework outperforms taking least time for decryption process as compared to existing methodologies.
Fig. 5 Superiority analysis based on encryption time (ms) of the suggested framework in comparison with the existing frameworks
Fig. 6 Superiority analysis based on decryption time (ms) of the suggested framework in comparison with the existing frameworks
Security Enhancer Novel Framework for Network Applications
351
Fig. 7 Security level analysis of the suggested framework in comparison with the existing frameworks
4.2 Security Level Figure 7 showcases the SH-ECC system’s security level. As analogized to the prevailing methods, it is suitable that the SL of the proposed model increases as of Fig. 7 96.34% is the SL of the SH-ECC model. However, when weighed against the SH-ECC system, the prevailing methods have low SL, i.e., ECC (93.29%), RSA (90.78%), Elagamal (87.19%), and DH (85.49%). This implies that the SH-ECC model has a higher level of security than the prevailing methodologies. Therefore, the SH-ECC system’s performance is superior to the traditional methods. Figure 7 shows the better accuracy of the suggested framework for security level when compared with all prevailing methodologies. The proposed model has offered greater security and protection.
4.3 Throughput The proposed framework’s throughput is measured by calculating the time required to encrypt and decrypt a data file of a given size. Throughput is inversely proportional to the time in units required to encrypt or decrypt the data file. More time means less throughput. Table1 displays the throughput performance of the proposed DS framework. Files of various sizes are taken for encryption and decryption, and the time required for each file is recorded. It also computes the average time required for all files of varying sizes. The proposed model outperforms all existing data security techniques. The time required for each file, as well as the average time for all files, was found to be significantly less than that required by other techniques. When compared to all existing and prevalent data security techniques, the proposed technique provided
352
V. Kumar
the highest level of throughput. However, the achieved throughput level is acceptable, but it can be improved further by hybridizing most prevailing DS techniques. The throughput of the model is calculated by the ratio of total size of all files and total time taken for encryption/decryption of all files. The proposed model shows better performance for both encryption and decryption of data files, and hence, throughput performance of the suggested framework is best among all techniques as shown in Table 1.
5 Conclusion In this paper, utilizing SH-ECC algorithms, security enhancer novel framework is proposed. Utilizing the MRF, the large dataset is partitioned and reduced using the MRF. Further, to protect the data from attack, security preservation is performed. Next, for examining the proposed system’s performance, the performance and comparative analyses are done by analogizing the outcomes to the prevailing systems. A security level of 96.34% is achieved by the proposed secure job scheduling framework, and the ET needed for the job scheduling framework is 4.345 s and 5.879 s, respectively, as of the experimental investigation. When compared with most commonly used security methods, proposed method takes less time for encryption and decryption processes. It shows better efficiency with time optimization to convert plaintext into ciphertext and viceversa. The proposed model achieves better outcomes by analogizing it with the prevailing systems. Therefore, it is deduced that the proposed model is superior and more efficient than the other prevailing methodologies. The ECC technique helps to provide a high-level data protection with a shorter key length, and the inclusion of SHA-256 provides enhanced security. By utilizing hybridization of most prevailing security methods for job scheduling, the work will be improved in the future. 20, 21
DH
Proposed
25.86 KB/ ms
Throughput
23.52 KB/ ms
281
300
10 MB
135
251
143
203
2 MB
98
5 MB
90
24.65 KB/ ms
295
198
158
102
25.53 KB/ ms
290
187
132
96
31.80 KB/ ms
211
176
101
78
23.71 KB/s
299
210
152
98
25.71 KB/ ms
295
245
138
112
RSA
Decryption time (ms)
Elagamel
ECC
RSA
Encryption time (ms)
ECC
1 MB
File size
Time in ms for encryption/decryption of data files of different sizes
Table 1 Throughput comparison of proposed versus prevailing techniques
Elagamel
23.34 KB/ ms
312
192
157
110
DH
23.34 KB/ ms
332
198
130
111
Proposed
32.37 KB/ ms
205
170
99
82
Security Enhancer Novel Framework for Network Applications 353
354
V. Kumar
References 1. Sudhakar K, Farquad MAH, Narshimha G (2019) Effective convolution method for privacy preserving in cloud over big data using map reduce framework. IET Software 13(3):187–194 2. Tukkoji C, Shyamala B, Nadhan AS, Ramana PL (2020) Identify and overcome data processing challenges in cloud using map-reduce. Technology 11(11):737–747 3. Cheng H, Dinu D, Großschädl J (2019) Efficient implementation of the SHA-512 hash function for 8-bit AVR microcontrollers. In: International conference on security for information technology and communications. Springer, Cham, pp 273–287 4. Zhao X, Li D, Li H (2022) Practical three-factor authentication protocol based on elliptic curve cryptography for industrial internet of things. Sensors 22(19):7510 5. Yeh H, Chen T, Liu P, Kim T, Wei H (2011) A secured authentication protocol for wireless sensor networks using elliptic curves cryptography. Sensors 11:4767–4779 6. Kh-Madhloom J (2022) Dynamic cryptography integrated secured decentralized applications with blockchain programming. Wasit J Comput Math Sci 1(2):21–33 7. Derbeko P, Dolev S, Gudes E, Sharma S (2016) Security and privacy aspects in MapReduce on clouds: a survey. Comput Sci Rev 20:1–28 8. Srinivas J, Das AK, Kumar N, Rodrigues JJPC (2020) Cloud centric authentication for wearable healthcare monitoring system. IEEE Trans Dependable Secur Comput 17:942–956 9. Hu W, Qian J, Li X (2017) Distributed task scheduling with security and outage constraints in MapReduce. In: 2017 IEEE 21st international conference on computer supported cooperative work in design (CSCWD). IEEE, pp 355–359 10. Bissiriou CA, Zbakh M (2016) Towards secure tag-MapReduce framework in cloud. In: 2016 IEEE 2nd international conference on big data security on cloud (BigDataSecurity), IEEE international conference on high performance and smart computing (HPSC), and IEEE international conference on intelligent data and security (IDS). IEEE, pp 96–104 11. Dang TD, Hoang D, Nguyen DN (2019) Trust-based scheduling framework for big data processing with MapReduce. IEEE Trans Serv Comput 15(1):279–329 12. Sadkhan SB (2021) elliptic curve cryptography-status, challenges and future trends. In: 2021 7th international engineering conference “research and innovation amid global pandemic” (IEC). IEEE, pp 167–171 13. Kaaniche N, Laurent M (2017) Data security and privacy preservation in cloud storage environments based on cryptographic mechanisms. Comput Commun 111:120–141 14. Mahmood K, Chaudhry SA, Naqvi H, Kumari S, Li X, Sangaiah AK (2018) An elliptic curve cryptography based lightweight authentication scheme for smart grid communication. Futur Gener Comput Syst 81:557–565 15. Scholz D, Oeldemann A, Geyer F, Gallenmüller S, Stubbe H, Wild T, Carle G et al. (2019) Cryptographic hashing in P4 data planes. In: 2019 ACM/IEEE symposium on architectures for networking and communications systems (ANCS). IEEE, pp 1–6 16. Rao MSUM, Lakshmanan L (2022) Map-reduce based ensemble intrusion detection system with security in big data. Procedia Comput Sci 215:888–896 17. Ullah S, Zheng J, Din N, Hussain MT, Ullah F, Yousaf M (2023) Elliptic curve cryptography; applications, challenges, recent advances, and future trends: A comprehensive survey. Comput Sci Rev 47:100530 18. Kumari S, Karuppiah M, Das AK, Li X, Wu F, Kumar N (2018) A secure authentication scheme based on elliptic curve cryptography for IoT and cloud servers. J Supercomput 74(12):6428– 6453 19. Liu Z, Seo H, Castiglione A, Choo KKR, Kim H (2018) Memory-efficient implementation of elliptic curve cryptography for the Internet-of-Things. IEEE Trans Dependable Secure Comput 16(3):521–529
Security Enhancer Novel Framework for Network Applications
355
20. Seta H, Wati T, Kusuma IC (2019) Implement time based one time password and secure hash algorithm 1 for security of website login authentication. In: 2019 international conference on informatics, multimedia, cyber and information system (ICIMCIS). IEEE, pp 115–120 21. Hakeem SAA, El-Gawad MAA, Kim H (2020) Comparative experiments of V2X security protocol based on hash chain cryptography. Sensors 20(19):5719
Image Tagging Using Deep Learning Rajeswara Rao Duvvada, Vijaya Kumari Majji, Sai Pavithra Nandyala, and Bhavana Vennam
Abstract Deep learning is the process of training computers to think critically by implementing brain-inspired architectural design. An explanation in written form is given for each image in a collection through the process of image captioning. In the area of deep learning, it has been a significant and imperative effort. Image captioning has many applications. A application was made using picture captioning technology to help those with little or no vision. Image captioning may be considered as an end-to-end Sequence-to-Sequence problem since it converts photos, which are conceptualized as a series of pixels, into a sequence of words. This requires processing of both the words or assertions and the pictures. Recurrent neural networks (RNNs) are used to build the feature vectors for the linguistic component, whereas convolutional neural networks (CNNs) are used to create the feature vectors for the image component. We are working with information that is both semantically and visually based. In particular, we require a linguistic Long Short-Term Memory (LSTM) model to construct a word sequence. This prompts the query of whether to add vectors of image data to the language model. By searching for patterns in the pictures, convolution neural networks (CNNs) are highly useful for identifying objects, people, and settings in photographs. Considering datasets given during model development, Natural Language Processing (NLP) allows for the tagging of images with English keywords.
R. R. Duvvada · V. K. Majji (B) · S. P. Nandyala · B. Vennam Department of Computer Science and Engineering, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India e-mail: [email protected] R. R. Duvvada e-mail: [email protected] S. P. Nandyala e-mail: [email protected] B. Vennam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_31
357
358
R. R. Duvvada et al.
Keywords Deep learning · Convolution neural network · Recurrent neural network · Image captioning · Long Short-Term Memory · Natural language processing
1 Introduction The concept of automatically creating human-like explanations for the photos is referred to as automatic picture captioning. This job is very significant and has significant industrial and practical application. Practical applications of automatic image captioning may be found in a wide range of crucial industries, including manufacturing, infrastructure, monitoring, and many more. It is not only a highly significant but also a very challenging issue. It may be found in a wide range of crucial industries, including manufacturing, infrastructure, monitoring, and many more. Automatic picture captioning aims to understand the complete scene as well as the connections between the items, as opposed to classic object detection and image classification tasks that merely sought to identify the objects present in the pic. After completely understanding the incident, it is required to write a description of it in the form of a human. The sequential completion of the main tasks results in automatic picture captioning. Parameters are first retrieved. Several elements from a photo are found after precise feature extraction. The connections between the items must then be established. Once connections between such picture objects have been found and objects have been identified, it is crucial to create the text description, which entails rearranging characters into a dictionary that is appropriate given the relationships between the elements in the image. We choose to apply the convolutional neural network (CNN) and Long ShortTerm Memory (LSTM) techniques to build a model that generates a caption. A convolutional neural network is used to compare the target picture to the training images, which are part of a big dataset. Using the training data, this model provides a fair description. We require an encoder, and we utilize CNN as an encoder, to extract features from pictures. We utilize LSTM to decode the produced picture description. VGG-16 is a deep CNN of 16 layers, 13 of which are convolutional and 3 of which are completely linked. It is capable of achieving very strong performance on image classification tasks because it was trained on the ImageNet dataset, which has 1.2 million pictures and 1000 classes. Convolutional Neural Network (CNN) is a Deep Learning technique that takes in an input image and prioritizes various traits and objects in the image (learnable weights and biases) to aid in the distinction between various images. Because it overcomes the short-term memory constraints of the RNN, LSTM is significantly more efficient and superior to the regular RNN. In order to analyze sequential data, such as time series, spoken language, and audio data, recurrent neural networks (RNNs) were developed. The LSTM can process inputs while processing pertinent information, and it may ignore irrelevant information. Recurrent neural networks
Image Tagging Using Deep Learning
359
(RNNs) of the Long Short-Term Memory (LSTM) type are able to recognize order dependency in sequence prediction issues. The most frequent applications of this are in difficult issues like speech recognition, machine translation, and many others.
2 Literature Review In [1], a complicated part in natural language processing is automated information retrieval and text summarization due to the irregular great complexity and architecture of the texts. The technique of text summarizing turns a lengthy text into a summary. This study introduces a novel text summarizing model for deep learning-based information retrieval. The three main processes that make up the proposed model are text summarization, template development, and information retrieval. Following that, the DL model is used to generate templates. The text is summarized using as a document, the deep residual integration summary technique. The uniqueness of the work is demonstrated by the construction of BiLSTM using the DBN framework for captioning pictures and text summary. In [2], applications that seek to automatically produce captions or explanations regarding picture and video frames have a lot of potential when using deep learning methodologies. In the field of imaging science, captioning for both images and videos is regarded as an intellectually difficult subject. Areas among the application fields are general purpose robot vision systems, automatic picture and video indexing for search engine usage, automatic production for people with various levels of vision impairment, including subtitles for pictures and videos, and many more. Several other task-specific options may be extremely beneficial to each of these application scenarios. Instead of a detailed examination of image captioning, this article offers a brief summary of both deep learning-based video captioning and image captioning approaches. The computational similarity between captioning for images and videos is the focus of this study. In [3], since it necessitates comprehension of both complex pictures and conversational language, picture captioning is one of the most difficult issues facing AI. Recent developments in picture captioning have employed stochastic optimization RL to further investigate the complexities of term-by-term production since wordby-word generation is basically a sequential prediction challenge. Because of the task’s complex word and sentence structures, RL-based photo captioning systems typically focus on a single regulation connectivity and reward function, as well as its cross-perception and language aspects. We provide an unique multi-level incentive and policy structure for captioning images in order to address this issue. This system makes it simple to optimize speech metrics, graphic capabilities, and RNN-based captioning models. A multi-level policy network that alters the word and paragraph policies for word production, as well as a multi-level reward function that jointly uses a visual acuity incentive and a syntax reward to steer the policy, make up the specific components of the proposed framework. In [4], the automatic description of an image’s content has drawn a lot of study interest in the multimedia space. In
360
R. R. Duvvada et al.
several techniques, visual representations from an image are extracted using convolutional neural networks (CNNs), which are then fed into recurrent neural networks to generate plain text. A few methods have recently been developed that can identify semantic ideas in photographs and subsequently encode them into superior depictions. Although significant progress has been made, the majority of the prior techniques treated each entity in an image separately, without structured data that would have provided useful clues for picture description. In this research, we provide a scene graph-based paradigm for picture captioning. Scene graphs include a lot of structured information since they show pairwise relationships in addition to object entities from photos. In order to utilize both visual characteristics and semantic information, for schema’s representations in structured scene graphs, we extract CNN features from triples (such as a guy riding a bike) and clustering-based orientations of entity entities for visual representations. We next recommend a hierarchical consideration module to learn unique features for word production through each time step after learning these attributes. The experimental results on benchmark datasets demonstrate that our methodology outperforms a variety of cutting-edge methods. In [5], CNN performs the function of an image encoder, interpreting visual areas and encoding them into region-specific characteristics at various points. Regions can be captured using one of two methods. The most popular technique is dividing the picture into grid cells. LSTM is in charge of comprehending every word that has been produced and producing the next word at every time step. The dense pixellevel predictions of the semantic segmentation job are explicitly targeted by FCN(). Because of this, it is excellent for providing both esthetic qualities and semantic labels in the form of a spatial grid at a fine grained level that, in principle, may reach the pixel level. In [6], the approaches mentioned in this article are based on neural mechanisms for converting between the visual and linguistic domains. To create a string of words that corresponds to an insightful description, a picture is first encoded into a feature vector and supplied to a generative neural language model, a model for producing text utilizing its hidden state and parameters. We use a deep residually connected LSTM network. In addition to frame-level CNN features, we also employ two other segment-based feature types as video features: dense trajectories and C3D features based on the 3D CNN network. We also use the 20-dimensional one-hot feature vector of the video category data from the MSR-VTT database. In [7], in this study, the phrase that most properly reflects the meaning of a certain image or video is chosen from a pool of words. We suggest to accomplish this in a visual space only, as opposed to prior systems that trust on a mixed hyperspace for picture and acquisition of video captions. We also propose a deep neural network architecture called Word2VisualVec that discovers to anticipate a rendering of visual features from verbal input in addition to this conceptual breakthrough. Multi-scale phrase vectorization transforms example captions into textual embeddings, which are then transformed into deep visual characteristics via a straightforward multi-layer perceptron. By predicting from text not only 3D convolutional neural network features but also a visual audio representation, we significantly enhance Word2VisualVec for retrieving video captions. Experiments on Flickr8k, Flickr30k, the Microsoft Part of the visual dataset, and the most recent NIST TrecVid challenge for video caption
Image Tagging Using Deep Learning
361
retrieval provide details on Word2VisualVec’s attributes, merits over scriptural algebraic expressions, promising for multidimensional query composition, and state-ofthe-art results. In [8], here, we propose replacing the encoder RNN to accomplish this elegant formula with a deep convolutional neural network (CNN). Consequently, it makes logical to use a CNN as an imaging “encoder” by first training it for a supervised classification job and then feeding the final feature space to the RNN decoder that produces words. The Neural Image Caption, sometimes known as NIC, is the name of this model. A major challenge in AI that combines computer vision and NLP is the autonomous description of an portrait material. In this paper, we offer a deep repetitive topology based on generative model that makes use of recent advancements in machine translation and computer vision to generate understandable sentences that describe pictures. We subjectively and statistically demonstrate that our model is typically quite accurate. Finally, a competition was held in 2015 utilizing the freshly released COCO dataset due to the recent spike in interest in this topic.
3 Methodology The implementation contains the following modules.
3.1 Assemble Photo Data To understand the content of photos, we employ a pre-trained VGG model (Oxford Visual Geometry Group). This pre-trained model is directly supplied by Keras. The VGG class in Keras may be used to load the VGG model. Additionally, Keras offers tools for cropping the loaded picture to the model’s chosen size (224 × 224 pixel image). Each image will be loaded, ready for VGG, and then, the model’s predicted characteristics will be gathered.
3.2 Text Data Preparation For each image in the collection, there are several explanations, and only little text cleanup is needed. We will need to make use of the following to cut down on the number of terms in our vocabulary. • Make every word lowercase. • Avoid employing any punctuation. • Eliminate any words with one or less characters.
362
R. R. Duvvada et al.
Fig. 1 Merge model
• Erase any terms that contain numerals. Here, we provide a summary of the vocabulary’s size. We want a language that is as concise as possible while still being expressive. A smaller model that trains more quickly will have a lesser vocabulary. 3.3 Develop Deep Learning Model 3.3.1 Loading Data In order to employ the prepared photo and text data in fitting the model, we import it. All of the images and descriptions in the training dataset will be used to train the data. 3.3.2 Establishing the Model Based on the merge-model, we define deep learning has three sections.
3.3 Develop Deep Learning Model 3.3.1
Loading Data
In order to employ the prepared photo and text data in fitting the model, we import it. All of the images and descriptions in the training dataset will be used to train the data.
3.3.2
Establishing the Model
Based on the merge model, we define deep learning which has three sections (Fig. 1).
3.3.3
Photo Feature Extractor
This VGG model has 16 layers. The obtained features expected by this concept will be used as input after preprocessing the images with the VGG model.
3.3.4
Sequence Processor
This layer is a Long Short-Term Memory (LSTM) recurrent neural network layer, which deals with the text input.
Image Tagging Using Deep Learning
3.3.5
363
Decode
An output vector with a predetermined length is produced by the characteristic extractor and sequence processor. A dense layer combines and analyzes these to get the final forecast.
3.4 Analyze the Model By creating aspects for each photo in the test dataset, we may assess a model by comparing its predictions to the generated descriptions using a standard cost function, comparing a skilled model to a dataset of image statements and feature values. The corpus BLEU score, which denotes how well the created text matches the expected text, is used to assess both the actual and predicted descriptions jointly. For this, the start description token startseq is sent in, one word is formed, and then, the model is called once again with the phrase constructed as entry until the end of the sequence token endseq (Fig. 2).
4 Implementation Importing all essential packages is necessary at the beginning of the implementation. We upload the document file with images. Utilizing the Flickr 8 K dataset, we will develop picture captions. We will utilize the smaller Flickr 8 k dataset rather than the bigger Flickr 30 K and MSCOCO datasets since educating the network on them
Fig. 2 Architecture of image captioning
364
R. R. Duvvada et al.
Fig. 3 Mapping image id with relevant captions
might take weeks. The top-level dataset file, Flickr 8 k.token, which contains the names of each image and the captions for each, is located in the Flickr 8 k text folder. The captions are separated from the image names by newlines (“.n”), where data cleansing is done. The dataset file is initially loaded, and its contents are then read into a string format (Fig. 3). Build a dictionary of descriptions that associates each image with one of five captions, then do data cleaning on all descriptions. When working with text data, this is a necessary process when we choose the type of cleansing we wish to apply to the text based on our goals. In this instance, punctuation marks will be dropped, all text will be written in lowercase, and words comprising numbers will be eliminated. We must express English words with numbers because computers cannot understand them. As a result, we will give each word in the dictionary a unique index value. We will create tokens from our vocabulary using the Keras library’s tokenizer function, and we will store them as pickle files. All images will have their features which are extracted, and we will map each image’s name to its corresponding feature array before dumping the array into a dictionary as a picklefile. Depending upon your system, this procedure could take quite a while. The dataset for training the model should be loaded. For this, we utilize the Images.txt file, which has a list of 6000 image names. We further provide the IDs for each caption. This is required so that our LSTM model can identify the start and end of the caption. The data must then be split into training and testing sets. Then, using the training data, we must train a deep learning model. In this step, the caption is generated by an LSTM and the image is encoded using a convolutional neural network (CNN). Additionally, we must specify the optimizer and loss function that the model will employ while being trained. Just after model has been trained, we can use it to generate captions for the pictures in the test set and analyze how well it performed here are the following steps to do. • Load the trained model and the test data (i.e., the images and their corresponding captions). • Pre-process the test data in the same way as we did for the training data (e.g., resize the images, tokenize the captions). • Use the model to generate predicted captions for the images in the test set.
Image Tagging Using Deep Learning
365
• Compare the predicted captions to the true captions to evaluate the model’s performance. There are several metrics we can use for this, including accuracy, BLEU score, and perplexity. BLEU score is what we have relied on. • Report the results of the evaluation, including the chosen metrics and any other relevant information such as the model architecture and hyperparameters.
5 Experimental Results 5.1 Generating Caption for an Image Deep learning is used to generate the caption for the input image that is provided. We are creating an LSTM-based model for photo captioning that anticipates word sequences using the feature vectors obtained from the VGG network, known as the caption. Only three photographs were tested for consistency’s purpose, and the outcomes are shown in the following pictures (Figs. 4, 5, and 6).
Fig. 4 Output for picture 1
366
R. R. Duvvada et al.
Fig. 5 Output for picture 2
5.2 Evaluation Metrics An image captioning model can be evaluated using a confusion matrix, but first we must define a set of predefined categories for the predicted captions and the actual captions. The confusion matrix can then be used to assess how well the model performs in classifying the captions into these categories. Here, few samples are taken, and for every sample which almost works marked as positive, and for samples which are not good enough, marked them as negative. Since it is a subjective classification as the captions generated could not be totally true or false (Fig. 7; Table 1). However, it might not be the most feasible choice for evaluating an image captioning. Instead of classifying the image into one of a set of established categories, image captioning consists of creating a textual description of the image. Therefore, it may be more acceptable to evaluate an image captioning model using a metric that is more focused on evaluating text creation tasks, such as BLEU or ROUGE.
Image Tagging Using Deep Learning
367
Fig. 6 Output for picture 3 Fig. 7 Confusion matrix
5.2.1
Using N-gram Approach
We can calculate performance metrics like accuracy, precision, recall, and F1score using the n-gram technique. This is more appropriate than measuring their performance using a confusion matrix.
368
R. R. Duvvada et al.
Table 1 Experimental scores Measure
Value
Derivations
Sensitivity
0.8889
TPR = TP/(TP + FN)
Specificity
0.3333
SPC = TN/(FP + TN)
Precision
0.8000
PPV = TP/(TP + FP)
Negative precision value
0.5000
NPV = TN/( TN + FN)
False-positive rate
0.6667
FPR = FP/(FP + TN)
False discovery rate
0.2000
FDR = FP/(FP + TP)
False-negative rate
0.1111
FNR = FN/(FN + TP)
Accuracy
0.7500
ACC = (TP + TN)/(P + N)
F1-score
0.8421
F1 = 2TP/(2TP + FP + FN)
Matthew’s correlation coefficient
0.2582
TP*TNFP*FN/sqrt((TP + FP)*(TP + FN)*(TN + FP)*(TN + FN))
We have followed these procedures to determine an image captioning models accuracy, precision, and F1-score using the n-gram overlap approach: • Gather a set of anticipated captions and a set of reference captions for a collection of photos. • Count the number of n-grams that appear in both the reference caption and each anticipated caption to determine the overlap in n-grams between them. • To determine the accuracy, divide the total number of predicted captions by the number of successfully predicted captions (those with nonzero n-gram overlap). • To get the precision, divide the total number of predicted captions by the number of accurately predicted captions. • Calculate the recall by dividing the total number of reference captions by the number of captions that were successfully anticipated. • Using the formula F1 = 2 * (precision * recall)/(precision + recall), get the F1-score using the precision and recall data. • The resulting F1-score ranges from 0 to 1, with a higher number suggesting a more effective model (Table 2). Table 2 Performance metric using N-gram
Metric
Score
Accuracy
0.6
Precision
0.7
Recall
0.243
F1-score
0.67
Image Tagging Using Deep Learning Table 3 BLEU score metric
5.2.2
369
BLEU metric
Score
BLEU-1
0.431266
BLEU-2
0.311073
BLEU-3
0.437775
BLEU-4
0.507523
BLEU Score
The BLEU score is based on the assumption that a good caption will overlap with the reference captions to a significant extent and that the overlap will be spread across a variety of n-grams (Table 3).
6 Conclusion By developing an image caption generator for this project, we have put a CNN-RNN model into work. It is important to keep in mind that because our model is depended on data, it cannot generate words that do not exist in its vocabulary. We worked with a tiny dataset of 8000 photos. We ought to train the models at the production level on datasets more than 100,000 photos in order to get models with improved accuracy for which we can use datasets like fickr32k, etc. In Further, we can generate speech from caption which is helpful for blind people.
References 1. Mahalakshmi P, Fatima NS (2022) Summarization of text and image captioning in information retrieval using deep learning techniques. IEEE Access 10:18289–18297 2. Amirian S, Rasheed K, Taha TR, Arabnia HR (2020) Automatic image and video caption generation with deep learning: a concise review and algorithmic overlap. IEEE Access 8:218386– 218400 3. Xu N, Zhang H, Liu AA, Nie W, Su Y, Nie J, Zhang Y (2019) Multi-level policy and rewardbased deep reinforcement learning framework for image captioning. IEEE Trans Multimedia 22(5):1372–1383 4. Li X, Jiang S (2019) Know more say less: image captioning based on scene graphs. IEEE Trans Multimedia 21(8):2117–2130 5. Zhang Z, Wu Q, Wang Y, Chen F (2018) High-quality image captioning with fine-grained and semantic-guided visual attention. IEEE Trans Multimedia 21(7):1681–1693 6. Shetty R, Tavakoli HR, Laaksonen J (2018) Image and video captioning with augmented neural architectures. IEEE Multimedia 25(2):34–46 7. Dong J, Li X, Snoek CG (2018) Predicting visual features from text for image and video caption retrieval. IEEE Trans Multimedia 20(12):3377–3388 8. Panicker MJ, VikasUpadhayay GunjanSethi, Mathur V (2018) Image caption generator. IJITEE 10:2278–3075
Data Driven Scheme for MEMS Model Satyavir Singh
Abstract The elemental parts of nonlinearities associated with MEMS devices which were modeled by distributed-parameter equations. The numerical computation of fidelity model derived from discretization scheme creates a large number of ODEs which demands the huge computational efforts. The dynamics of MEMS device reduced with a Galerkin—POD approach, which will suffer their own drawbacks in nonlinearity computation. Therefore, improved strategy over the POD for MOR intricate. This work addressed the drawbacks of POD for MEMS model specifically the dynamical behavior of center point deflection of a beam and their improvements for efficient simulation in data driven framework. Keywords Proper orthogonal decomposition (POD) · Mems device · Nonlinear systems · Koopman operator · Dynamic mode decomposition (DMD)
1 Introduction Micro-electromechanical systems (MEMS) is manufactured with the property of silicon, referred to as Microsystems. MEMS devices include mechanical and electrical aspects, or both, and vary in size from a few micrometers to millimeters. Such models are having capability of microscale sensing, control and actuation [1]. MEMS devices are prominent and have extensive nonlinearity in a wide range of commercial applications, including smart phones, automobiles, telecommunication (optical and wireless), bio-medical, process control, automative industry (airbag sensor), defence etc. Developing accurate MEMS models with inherit required feature for end user is an open problem at the acceptable level of accuracy. MEMS devices are often designed using lumped parameter models because they offer a simplified way of characterizing the device’s behavior. These models are represented lesser number of S. Singh (B) Department of Electrical and Electronics Engineering, SRM University, AP, Andhra Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_32
371
372
S. Singh
ordinary differential equations to represent the behavior of the device, and are easy to compute and analyze. However, because they do not consider all of the dynamical characteristics of the device, they may not accurately depicted its behavior dynamics [1]. The device’s geometry or their parameters can be a seldom issue in MEMS model, where small changes in device geometry can have a significant impact on its behavior. Numerical discretization (ND) is a model-based approach that can precisely estimate dynamical attributes, but it generates a large number of equations that make it difficult to use as a macro-model for system-level simulation. In contrast, a data driven approach aims to achieve a specified level of accuracy in dynamical characteristics without relying on a complex model. Instead, it uses data from experiments or simulations to learn the behavior of the system. This approach can be more adaptable and less computationally intensive than ND, but it may not be as precise in certain situations [2]. Although some work has been narrated on the computation of MEMS model of equations to describe their nonlinear behavior. In this paper, an attempt has been made to develop effective macro models for nonlinear MEMS device using data driven scheme, Koopman Theory [3]. POD is a motivating technique for reducing the dimensional of a system in a lowdimensional subspace that captures the dominant behavior of the system. However, the technique may not be suitable for problems with general nonlinearities, especially when it comes to computing a projected nonlinear term with POD. Under such conditions, the reduced system may still have the full-order complexity of the original system, which defeats the purpose of model reduction. To address such issue, techniques such as proper generalized decomposition (PGD) and dynamic mode decomposition (DMD) have been developed to handle nonlinearities and improve the computation of the systems [4]. POD is a technique used to reduce the dimensionality of a dataset by finding the most important modes or patterns that capture the most variance in the data [5]. These modes can be characterized by a continuous frequency spectrum. The number of modes captured by POD depends on the size of the dataset and the degree of variability in the data. DMD, on the other hand, is a technique to extract dominant coherent structures or modes from a dataset. DMD works by decomposing the data into its eigenvectors and eigenvalues, which represent the dominant modes and frequencies of the system. DMD is characterized by a discrete set of frequencies, which correspond to the eigenvalues of the system [6]. DMD technique to capture finite-dimensional Koopman operators, this is an active area of research in the field of dynamical systems. The Koopman operator is a linear operator that describes the evolution of the probability distribution of a dynamical system [7]. This work is organized in the following sections. In the first section, a general introduction about the background of the problem is given. In the next section, mathematical modeling of the MEMS device is presented. This is followed by the description of the data driven framework for nonlinear dynamical systems [8]. Section 4 gives a numerical experiments with the proposed algorithm, followed by the conclusion in the last section.
Data Driven Scheme for MEMS Model
373
2 Modeling of MEMS Switch We take into consideration the time-dependent large coupled electro-mechanical and fluidic model to demonstrate the process of modeling MEMS switch. Such model is connected to an elastic deformable beam suspended above a silicon substrate with a narrow layer of air working as a damper. The applied voltage between the beam and substrate causes the beam to deflect toward the substrate. Dynamical behavior of a MEMs are represented as follows: ∂4z ∂2z EI 4 − S 2 = Felec + ∂x ∂x
∮w ( p − pa )dy − ρ 0
∇ .[(1 + 6K )z 3 p ∇ p] = 12μ
∂ pz ∂t
∂2z ∂t 2
(1)
(2)
Here electrostatic force across the plate is Felec = − (∈02zwv2 ) , where v denotes the applied voltage and u = v 2 , is the input voltage to the system. Center point height of the beam is a system output [9]. The symbols are, E; Young’s modulus, I ; moment of inertia of the beam, S; stress coefficient, ρ; density, pa ; ambient pressure, μ; air viscosity, K ; the Knudsen number [10]. The height of the beam above the substrate is determined by z = z(x, t), and the pressure distribution in the air within the beam is determined by p = p(x, w, t). The beam is fixed at both ends, hence the starting circumstances are [11], z(x, 0) = z 0 2
p(x, y, 0) = pa z(0, t) = z(l, t) = z 0 ∂ p(0, y, t) ∂ p(l, y, t) = =0 ∂x ∂x p(x, 0, t) = p(x, w, t) = pa for all x ∈ (0, l), y ∈ (0, w), and t > 0. Now setting the state space variables x1 = z, 3 x2 = ∂∂tz , and x3 = p, the dynamical system becomes as follows ∂ x1 x2 = 2 ∂t 3x1
(3)
374
S. Singh
⎡ w ⎤ ∮ ∂ x2 2x22 3x12 ⎣ ∂ 2 x1 ∂ 4 z ⎦ 3∈0 w 2 v (x3 − pa )dy + S 2 − E I 4 − = 2+ ∂x 2ρ ∂t ρ ∂x 3x1
(4)
0
∂ x3 x2 x3 1 ∇ =− 2 + 12μx ∂t 3x1 1
((
λ 1+6 x1
)
) x13 x3
∇ x3
(5)
The aforementioned equations can be expressed in the compact form as shown in (6), ∂x = f (x(t)) + Bv 2 (6) ∂t where f (x(t)) is the nonlinear function and B is a scalar input vector subject to an appropriate choice of selected state variables. Using the grid points listed below in the x-y plane, we do the spatial descretization of the computing domain in order to solve Eq. (6), x α = α · Δx, y β = β · Δy, for α = 0, . . . , (n + 1) and β = 0, . . . , (m + 1), where x 0 = 0, x n+1 = l, y 0 = 0, y m+1 = w, Δx = nl and Δy = mw [12]. Equations (3–5) produce a significant nonlinear dynamical system (7) of the following type when a common finite difference method is applied to it. dx = f (x(t)) + Bu(t) dt
(7)
The order of the discretized system is (2n + mn), which is very large and thus computationally expensive to solve. Therefore, MOR techniques can be applied for reducing the size of the system (6). POD is the powerful techniques use to reduce such nonlinear models. However, POD has its own issues regarding the evaluation of nonlinear term in the ROM [4]. To address this issue, DMD is used to make the procedure computationally feasible. DMD is similar to POD but POD lacks in dynamical information of data. DMD allow to extract dynamical behavior in time resolve fashion [3]. As POD scheme requires to have mathematical model. Hence, the data driven framework is discussed for equation free model in the next section.
3 Review of Data Driven Framework Data driven approach reformulate nonlinear model to linear framework that enable to predict, estimation and control complex nonlinear dynamics using linear control theory. Assume dynamics of the model is f = f (x, w, t), mostly, f contains (one or more) components of the state vectors. Purpose of a decomposition is to decouple the spatial and temporal dynamics of the displacement, and formulate the entire dynamics as a superposition of spatial modes φ j whose time evolution is governed by scalar coefficients a j : scalar measurement function g that operates on an infinitedimensional Hilbert space and referred as an observable subspace [4].
Data Driven Scheme for MEMS Model
g(x, y, t) =
375 ns ∑
a j (t)φ j (x, y)
(8)
j=1
(8) is having time and space dynamics where koopman operator is using to get linear dynamics and hence we can perform eigen value decomposition of observable subspace [13] over discrete time dynamics. κφ j (x j ) = λ j φ j (x j ), j = 1, 2, . . . , n s where κ is linear (Koopman operator) and λ, φ are koopman eigen value and eigen vector respectively. In (8) coefficient ak (t) is called koopman mode associated with its corresponding eigen vector, φ j (x, y). Hence, it allows to represent koopman modes as a projection of observables. κg(xk ) = g( f (xk )) =
∞ ∑
λjφjaj
(9)
j=1
Here koopman operator is an iterative set of triples λ, φ and a. These are representing koopman mode decomposition and hence concept of koopman operator is used in data driven identification. The dynamics of MEMS model is measured over the observable set and recorded as evolution of nonlinear sequence of projections of infinite-dimensional state vector [3]. It is treated as linear vector space under unknown operator κ. Here we observe that POD modes are orthogonal however, DMD modes are not orthogonal and also dynamically invariant. DMD is inherently a data driven scheme that collect dynamics over the time as stated in algorithm 1. The column space of X will be an eigen Algorithm 1 D M D Algorithm 1: Compute the DMD modes, κφ(x) = λφ(x), where φ(x) are the eigenvectors of the κ matrix and λ are the corresponding eigenvalues. 2: Measure time-shifted two dynamics X ' ≈ κ X , where x j+1 = κ x j , and find the best fit linear operator κ using the pseudo-inverse. The matrix X contains snapshots of the dynamics. 3: DMD modes, κφ(x) = λφ(x). 4: Compute the DMD modes, κφ(x) = λφ(x), where φ(x) are the eigenvectors of the κ matrix and λ are the corresponding eigenvalues. ' 5: Compute κ = X ' X † , where the eigenvectors of κ should ∑∞be the column space of X . 6: Perform a data driven spectral decomposition, xk = j=1 λ j φ j a j , where xk are the snapshots of the dynamics and a j are the mode amplitudes. 7: Compute the mode amplitudes a using the equation a = φ † x1 , where φ j are the DMD modes (eigenvectors of the κ matrix), λ j are the DMD eigenvalues (eigenvalues of the κ matrix), and a j is the mode amplitude.
vector space in an exact DMD. Theoretically, for the dynamical system with lowrank approximation, the column spaces of X and X ' will tend to be almost identical, resulting in convergence of the projected and exact DMD modes. Algorithm 1 used to decompose a set of measured dynamics into DMD modes, which can then be used to analyze the system’s behavior. This low-order approximation is typically easier
376
S. Singh
to compute than the full nonlinear model as in contrast to the POD technique. It requires to evaluate a low-rank set of dynamical characteristics (9). In keeping the fact of DMD, It will not required mathematical equation to compute future states. For DMD scheme data is collected experimentally for all the future time.
4 Numerical Experiments The proposed Algorithm 1 is experimented with MEMS device (Fig. 1) and serves as the benchmark example for the DMD algorithm [10]. The benchmark MEMS model was originally analyzed in [9]. As depicted in Fig. 1, the beam is l = 610μ meters in length, w = 40μ meters in width, and h = 2.3μ meters tall (Here μ = 10−6 ) [12]. Employed model with n = 40 and m = 20, results in a system order of 880. The chosen output for the experiment is the center point of the beam ' s deflection away from the balance point, denoted as y(t) = z( 2l ). The MEMS switch ' s gap height is expressed in micrometers. The experiment considers two standard inputs: u(t) = (M H (t))2 or u(t) = (M cos(ωt))2 , where H (t) is a unit-step input starting at t = 0 with magnitude M = 7 V and f = 1 MHz. The inputs are used to create experimental data for two scenarios involving sinusoidal and step inputs. The experiment aims to investigate the behavior of the large nonlinear dynamical system under different input conditions and to use the experimental data to develop a data driven model that can be used to predict the system’s future behavior. The chosen output for the experiment, y(t) = z( 2l ), represents the deflection of the beam away from its balance point and is an important parameter for characterizing the behavior of the system. Case-1: For MEMs device as depicted in Fig. 1: with sinusoidal input. The order of the large system is n = 880, reduced-order model is of size k = 15. Applied cosine input u(t) = (7 cos ωt)2 and time-step δ = 10−6 . The data is arranged in the matrix form with selected time-step size results matrix X . The data is collected to exploit low-dimensional framework and project the future state of the systems without computing 1 and 2 PDE solution. Constructed data matrices
Fig. 1 Nonlinear MEMS device model
Data Driven Scheme for MEMS Model
377
2.35 Orig POD DMD
2.3
2.25
2.2
2.15
2.1
2.05
2 0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5 10 -4
Fig. 2 Output of the MEMS device for external excitation u(t) = (7 cos ωt)2 10 -8
4 3.5 3 2.5 2 1.5 1 0.5 0 0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5 10 -4
Fig. 3 Error in response to input u(t) = (7 cos ωt)2
are used in DMD alogrithm 1. The subset of data matrix gives eigen vector and eigen values in (9) and results into (8). Time evolving solution of the PDE versus DMD along with POD is presented 2. Using algorithm 1 the approximated dynamics are having eigenvalues outside the unit circle their growing modes appear and may result to exponential increase for longer time of projections of the dynamics under consideration. Even-though, the error remains small for the selected times interval and giving an upper range for the DMD approximation holds. DMD model is faster to
378
S. Singh
Table 1 Case-1: assessment of time taken to simulate, original model, POD and DMD for u(t) = (7 cos ωt)2 Original model comp. POD DMD – 228.90
Time (s)
185.22
45.66
Table 2 Case-1: assessment of errors profile for the input u(t) = (7 cos ωt)2 Scheme % error (states) % error (output) 1.9 × 10−7 8.03 × 10−7
POD DMD
3.06 × 10−7 3.06 × 10−7
2.35
2.3
2.25
2.2
2.15
2.1
2.05
2
1.95 0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5 10-4
Fig. 4 Response of MEMS device for input u(t) = (7H (t))2
simulate compared to the full-order model computation. This is a desirable feature for many applications where computation is important. It’s also interesting to compare the error rates between the POD and DMD models for the MEMS model, as presented in Tables 1 and 2. The error profile with respect to original model and proposed data driven model are shown in Fig. 3. Case-2: For MEMs Model as shown in Fig. 1: with step input (Fig. 4). Consider the same system as described in case-1 with another input, i.e., step input u(t) = (7H (t))2 with time-step δ = 10−6 . The numerical computation time for the reduced-order models, such as the POD and DMD models, is much lower compared to the full-order simulation, as presented in Table 3 and compare the error rates between the full-order simulation, POD and DMD models to assess the accuracy is also presented in Table 4.
Data Driven Scheme for MEMS Model
379
10 -12
6
5
4
3
2
1
0 0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5 10 -4
Fig. 5 Error in MOR for input u(t) = (7H (t))2 Table 3 Case-2: assessment of time taken to simulate, original model, POD and DMD for u(t) = (7H (t))2 Original model comp. POD DMD – Time (s)
248.32
189.91
50.95
Table 4 Case-2: assessment of errors profile for the input u(t) = (7H (t))2 Scheme % error (states) % error (output) POD DMD
9 × 10−11 1.91 × 10−8
9.2 × 10−11 9.2 × 10−11
The error profile with respect to original model and proposed data driven model are shown in Fig. 5. From above it is clear that the dynamical behavior of proposed data driven scheme is similar to the original MEMS model (see Tables 2 and 4). Computational resources required to nonlinear system get reduced with DMD and it will not rely on the mathematical model of the system. In DMD only data (experimental) is sufficient to characterize behavior of the system. Such model will have an error, but the overall amplitude is very small. The computational savings are significant in the simlation time (see Tables 1 and 3) and numerical inefficiency to nonlinear term evaluation is also not required since it was data-based scheme. In these notes I tried to experiment a nonlinear model and how to compute POD and DMD modes and their application with required computational efforts for physical systems are studied. Perhaps in the
380
S. Singh
future work will actually to implement more intelligent computations in data driven framework.
5 Conclusion MEMS model was used to analyze and test the dynamic behavior of a system. POD and DMD modes were computed. It has been presented how to compute the POD and DMD modes to the sampling frequency and time span. Initial 10 modes of POD showed varying energy changes over time, but the variations were less than one percent. The measurements of the center point and its amplitudes presenting stable DMD modes in observable subspace. DMD modes with lower damping rates consistently emerged from the two datasets and correlated with observable subspace. However, spurious modes and modes devoid of any physical relevance had no significant effect are discarded. Hence, DMD makes computational procedure easy for capturing equal energy modes. The MEMS model has been analyzed and tested for the dynamical behavior of the system using POD and DMD modes for two set of inputs. Hence, It shows simulation time to capture the equal energy modes, DMD makes computational procedures easy.
References 1. Santorelli J, Nabki F, Khazaka R (2014) Practical considerations for parameterized model order reduction of mems devices. In: 2014 IEEE 12th international new circuits and systems conference (NEWCAS). IEEE, pp 129–132 2. Xie WC, Lee HP, Lim SP (2003) Nonlinear dynamic analysis of mems switches by nonlinear modal analysis. Nonlinear Dyn 31(3):243–256 3. Kutz JN, Brunton SL, Brunton BW, Proctor JL (2016) Dynamic mode decomposition: datadriven modeling of complex systems. In: SIAM 4. Kutz JN (2013) Data-driven modeling and scientific computation: methods for complex systems and big data. Oxford University Press 5. Singh S, Bazaz MA, Nahvi SA (2018) A scheme for comprehensive computational cost reduction in proper orthogonal decomposition. J Electr Eng 69(4):279–285 6. Milan K, Igor M (2018) On convergence of extended dynamic mode decomposition to the Koopman operator. J Nonlinear Sci 28:687–710 7. Schmid PJ (2010) Dynamic mode decomposition of numerical and experimental data. J Fluid Mech 656:5–28 8. Quesada-Molina JP, Mariani S (2022) Uncertainty quantification at the microscale: a datadriven multi-scale approach. Eng Proc 27(1):38 9. Hung ES, Yang Y-J, Senturia SD (1997) Low-order models for fast dynamical simulation of mems microstructures. In: 1997 international conference on solid state sensors and actuators, TRANSDUCERS’97 Chicago, vol 2. IEEE, pp 1101–1104
Data Driven Scheme for MEMS Model
381
10. Michal R, Jacob W (2003) A trajectory piecewise-linear approach to model order reduction and fast simulation of nonlinear circuits and micromachined devices. IEEE Trans Comput-Aided Des Integr Circ Syst 22(2):155–170 11. Bond B, Daniel L (2005) Parameterized model order reduction of nonlinear dynamical systems. In: Proceedings of the 2005 IEEE/ACM international conference on computer-aided design. IEEE Computer Society, pp 487–494 12. White JK et al (2003) A trajectory piecewise-linear approach to model order reduction of nonlinear dynamical systems. PhD thesis, Massachusetts Institute of Technology 13. Brunton SL, Tu JH, Bright I, Kutz JN (2014) Compressive sensing and low-rank libraries for classification of bifurcation regimes in nonlinear dynamical systems. SIAM J Appl Dyn Syst 13(4):1716–1732
Detection and Mitigation of ARP Spoofing Attack Swati Jadhav, Arjun Thakur, Shravani Nalbalwar, Shubham Shah, and Sankalp Chordia
Abstract The address resolution protocol (ARP) is the fundamental method of data communication in which packets are sent across the network between client and server nodes. However, the lack of authentication on the node side can allow the attackers to spoof the communication and redirect the data, leading to an ARP poisoning attack. With increased network communication and data exchange comes the threat of increased cyberattacks. Different strategies for detection of an ARP spoofing attack at an individual and organizational level are proposed in this paper. We have placed an ARP spoof attack on the client machine and detected the packet traffic. Detection takes place using two different methods: Python algorithm and other software tools. Along with potential defenses against such attacks, methods for securing the computer in the case of ARP spoofing have been studied and highlighted. Keywords Address resolution protocol (ARP) spoofing · IP address · Network · Wireshark · Scapy
S. Jadhav (B) · A. Thakur · S. Nalbalwar · S. Shah · S. Chordia Department of Computer Engineering, Vishwakarma Institute of Technology Pune, Pune 411037, India e-mail: [email protected] A. Thakur e-mail: [email protected] S. Nalbalwar e-mail: [email protected] S. Shah e-mail: [email protected] S. Chordia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_33
383
384
S. Jadhav et al.
1 Introduction The internet plays a critical role in our lives in the modern period with most functioning being completely reliant on it. With the rise in internet usage, there is a surge in the amount of data being generated and shared. Hence, a number of problems have emerged, with cyber security being one of the main concerns. Recent reports suggest that around $6 trillion dollar damage was caused due to the cybercrimes worldwide in 2022, while approximately 71 million people were a victim of some cyberattack or have experienced hacking threats. Most attacks take place through hardware-oriented connections over the network, giving access to the machine and host information. Every computer interface in a network has a physical (MAC) address and logical (IP) address. Several computer softwares transmit or receive data using IP addresses, despite the fact that actual communication occurs via the hardware address known as the MAC address. Finding the matching MAC address is necessary for IPv4 communication [1]. Address resolution protocol, i.e., ARP is responsible for IP addresses being converted into their matching MAC addresses. It is a protocol that transforms the IP address into the corresponding MAC address. It is responsible for the packet delivery to the respective node over the communication channel. This indicates why ARP is a crucial component of network communication. The mapping of IP address to the respective MAC address results in a pair of the two which will be used in the transportation of the packets during the communication. The ARP request is broadcast in nature, while the ARP reply is unicast. Figure 1 displays the header associated with ARP packet containing the information about source and destination addresses [2]. The ARP protocol is prone to a variety of network breaches due to some loopholes, the ARP protocol’s statelessness and lack of authentication being the major flaws. These gaps are simple for hackers to use as a launchpad for more sophisticated attacks. ARP poisoning is usually considered a type of insider threat because the insider is the one who begins the attack. The importance of data is increasing, providing insiders more motivation to steal data and raising concerns against such Fig. 1 ARP header (size is in bytes)
Detection and Mitigation of ARP Spoofing Attack
385
threats. Consequently, higher-level attacks, like Man-in-the-Middle (MITM) attack and Distributed Denial of Service (DDoS) attack, are taking place through the ARP spoofing attacks. Researchers have always been driven to develop new techniques to counteract ARP poisoning attacks and the resulting network threats [2]. There have been recent advancements in the detection and mitigation of the ARP poisoning threat and related attacks. Multiple frameworks are proposed in order to overcome the faults and make the network more secure. In this paper, we have discussed how the ARP spoofing attack takes place through a client-server system and how the packets received by the attacker can be detected on the machine. We have used multiple tools for detection, like Wireshark software which helps in identifying the ARP requests that were broadcasted over the network. We have used a Python library called Scapy for spoofing and the sniffing of the attack. The following sections goes over the entire procedure for launching the attack, its mechanism, detection algorithm, and prevention methods.
2 Literature Review Rohatgi et al. [1] provide an outline of ARP’s operation. ARP spoofing and its numerous attacks are covered. In the working of ARP, after receiving the ARP request, the devices verify to see if their IP address matches it. In case they do not match, the device will merely reject the packet. An ARP reply with the destination MAC address is sent by the user whose IP address matches. Following this, the cache tables of both the devices are modified for future reference. Different variants of spoofing attacks, for example, man in the middle, DDoS, cloning, MAC flooding, and session hijacking are elaborated along with a detailed literature survey about the same. Finally, compares several detection and mitigation strategies, together with their advantages and disadvantages, for preventing ARP attacks. In-depth discussions and comparisons of several detection and mitigation approaches that are presented in the past are documented in this study. But after examining such methods, it was noted that each method had drawbacks. Due to this, those procedures must be modified in order to ensure security and shield the network against ARP spoofing attacks. Tripathi et al. [2] suggest that ARP poisoning is regarded as a unitary fundamental attack that is used to launch more advanced attacks. To identify and stop them, several remedies have been put up against the attacks. However, each of the options put out has its limitations. While some methods are more appropriate for a particular band of scenarios, others are more effective in a specific set of scenarios. Researchers are driven to provide new solutions as new ARP poisoning strategies have developed throughout time. This comparison analysis offers a pre-designing strategy that should be taken into account when suggesting fresh ARP poisoning prevention methods. This comparison analysis can also be utilized to create a more effective and efficient plan that, on the one hand, benefits from the combined power of many mitigation strategies and, on the other, does not have the previous restrictions.
386
S. Jadhav et al.
Xing et al. [3] explained about the different modules involved. The IP address and MAC address are retrieved from the data packets in the analysis module, and are then passed to the checking module. The system first judges whether the last number of the IP address is one. If present, it then equates IP/MAC to the correct IP/MAC, and passes the generated conclusion to the response module. If it does not match, the system stops handling the packet. Under the response module, as the ARP spoofing attack is detected, the system relays the information to the users, and is also stored in the database server table and stores the information to the database server. It also records the data in the ARP cache table. These modules constitute a major role in understanding the functioning of ARP cache. Bhirud et al. [4] discuss the network intrusion detection system (NIDS), which is a prominent technology being considered for network security. Although it can recognize network assaults, it is powerless to stop them. In order to protect networks from intrusions, researchers are developing a system called network intrusion detection and prevention system (NIDPS). It also mentions about the various types of ARP spoofing associated attacks and its effects on the system. The mechanism proposed suggests that to construct and organize a table consisting of IP-MAC address pairs for systems present in the LAN, while the NIDPS employs IP-MAC address pairs delivered by lAs. The suggested method for detecting IP and ARP spoofing has the advantages listed below: can thwart an attack at the point of origin. It is capable of both detecting and thwarting IP and ARP spoofing-based attacks. It is appropriate for bigger enterprises as it does not involve physical entries while maintaining tables of IP-MAC pairs. The suggested approach is a lightweight one that doesn’t call for changing the ARP protocol. This mechanism’s drawback is that it is unable to identify IP spoofing attacks that are drawn against the LAN, but from the LAN itself. To prevent similar assaults, research should be conducted. Zhao et al. [5] highlighted the root causes of ARP spoofing and its detailed mechanism. They overcame the limitations of conventional methods by introducing a novel approach to stop ARP spoofing attacks. Additionally, they are working to reduce man-in-the-middle attacks, which lead through ARP spoofing, both in wired and wireless LANs. Due to factors including current ARP compatibility, lower costs for infrastructure upgrades, and increased configuration capabilities, new nodes are added without any human configuration. Ibrahim et al. [6] add a module to the SDN controller that inspects each ARP packet in the network to identify and block any potentially spoofing ones. This mechanism’s flaw will become apparent as the network grows and the volume of traffic rises. This will consequently increase the roundtrip time and CPU strain on the controller. In order to address this issue, the additional module has been modified to manage ARP congestion, giving the controller access to the proxy ARP capabilities in the process. The simulation’s findings demonstrated that the suggested approach is resistant to ARP spoofing attacks, effectively blocking ARP broadcast messages in larger environments. It also speeds up response times by centrally handling ARP queries. The module forwarding.l2 learning is updated in the proposed mechanism to do ARP testing for poisoned packets in addition to ARP replying. Proto.dhcpd, a different
Detection and Mitigation of ARP Spoofing Attack
387
Pox component module, is used to provide information about the addresses of IPMAC mapping of every host present in the network to the Main table in the controller. The proposed approach should be employed in a shared controller scheme in which several controllers share robustness and divide workload among themselves for one another. Jinhua et al. [7] elucidate ARP spoofing attack and several related works and analysis of the same. In light of these considerations, the study suggested a powerful method based on the ICMP protocol to identify harmful hosts involved in an ARP spoofing attack. The method entails gathering and examining the ARP packets, followed by the injection of ICMP echo request packets to explore for malicious hosts based on their answer packets. It won’t interfere with the hosts on the network’s operations. During an assault, it can also identify the actual address mappings. In order to ensure that the source and destination addresses within the Ethernet header and ARP header are accurate, it first executes an inter-layer control operation. The address entries in acceptable ARP packets are compared with those found in the database in the second stage. The ARP Spoof Detection Module will thereafter receive all new ARP packets for review. Because the method provided to confirm the validity of each ARP packet is extremely active, the amount of time between capturing the packets and identifying the spoofing attempt is as short as possible. Pandey [8] mentions how different network domains are frequently targets of varied network-oriented attacks. One of the major types of such an attack is ARP cache poisoning. ARP spoofing occurs frequently as it is a stateless protocol, and because there is not any method for confirming the identity of the sender host. ARP spoofing has been observed to be the main cause of LAN attacks. Therefore, stopping this issue through prevention, detection, and mitigation will halt a variety of network attacks. In fact, it is the deliberate alteration of the linkages between IP and MAC addresses that are kept on any network host’s ARP cache. Enhanced Spoof Detection Engine (E-SDE) is an advanced technique that is suggested in this study. This technique identifies ARP spoofing and recognizes real IP and MAC relationships. As probing packets, ARP and ICMP packets have been employed. An algorithm is used to demonstrate how E-SDE functions. In order to fully comprehend the E-SDE’s incremental development and make it effective against the majority of attacker types, an attacking model was also proposed. The suggested method is also used to measure network traffic. Saini et al. [9] elaborated about the ARP protocols and its complete working. The IP address was essentially mapped to media access control via ARP. Each host’s entry is maintained via the ARP protocol, which stores entries in a table known as the ARP cache. Here, the ARP spoofing principle states that any host A on a local area network is attacking any host B. Sniffing is the act of listening to two machines speak while without knowing either of them. It can be of two types–at internal network, or external network. It has also elaborated on how ARP spoofing leads to DDoS attack and MITM attack, and how the attacks take place in the network. For the ARP sniffing, tying logical address to physical address is one of the proposed solutions. Attackers utilize this method to accomplish ARP spoofing because when an ARP request is sent to a host, it stores the MAC address in the ARP cache. The MAC
388
S. Jadhav et al.
address changes dynamically once there is a request for ARP. Therefore, a static entry is required to associate the IP and MAC addresses, thus preventing spoofing by the attacker in order to fix this problem. Rupal et al. [10] present an application that provides users with authentication and aids in ARP poisoning detection and prevention, though in a dynamic IP state. The application offers a framework that leverages secondary cache and the Internet control management protocol (ICMP) to validate pair entries of IP-MACs for every setup within the network. The prevention of ARP breach is suggested using a reliable and secure method. The system consists of three modules: detection and prevention of ARP poisoning; DHCP IP setup with the help of DHCP server; and user authentication through a server and MySQL database. If a new user joins the LAN network, the administrative process should be started first. A utility is installed on the system, and the system has been registered with a username, password, and MAC address. Then enter the lab to access the system when the utility and IP have been assigned. Users are prompted to input their authentication credentials once the system generates a unique ID. The server checks to see if the user has authenticated. If the user is verified, he is granted access to the Internet and the background sniffer process begins, starting with the phases of the algorithm for detection and avoidance. It was discovered that while XArp offers a detection mechanism, it lacks a prevention function. Sun et al. [11] discuss how software-defined network can be applied in order to prevent the address resolution protocol attacks in cloud computing domain. In the suggested method, hosts send the ARP packets, and a group of controllers detects them to detect the malicious packets and prevent the attack. Additionally, controllers occasionally check the statistical information contained in ARP packets to look for attacks while flooding the ARP packets. Controllers designate the flow entries on related switches to halt flow for a predetermined period of time once an assault is detected. Firstly, the real-time packet processing attempts to take in packets delivered by hosts, handle them, install entries on switches, or drop them. The second method involves periodically checking traffic, and it is used to check packet statistics on ports of switches. This section restricts traffic on a related port in the event of an ARP flooding attack in order to manage traffic. Duddu et al. [12] describe how the ARP spoofing attack can lead to the breach of data transfer through the HTTPS protocol and hence lead to the secure socket layer stripping attack. ARP can be used if a host in a certain network intends to transport any number of packets to a host in another network. Hence, the MAC address of the router and the ARP database for the IP address of the next node is checked and determined using the ARP table. The catch in this method is that particularly only those websites which do not implement HSTS, i.e., HTTP Strict Trasport Security are most prone to such attacks and are easier to break through. As the ARP attack takes place successfully, the traffic encountered is redirected to the HTTP port which strips the SSL. Selvarajan et al. [13] highlight on how without altering the protocol itself, ARP spoofing attacks can be recognized and countered. The suggested approach imporovises an existing method for detecting ARP spoofing that uses ICMP. The method
Detection and Mitigation of ARP Spoofing Attack
389
suggests the reconfiguring of ICMP echo-request datagram according to certain rules based on the already existing methodology. It discusses the scenario where there might be multiple machines already spoofed by the hacker before attacking other victim machines. This algorithm proposed can be applicable for fewer machines but may not be the best alternative in case of larger scalable networks. Majidha Fathima et al. [14] explain the softwares and programs such as Wireshark and Ettercap that play a crucial role in efficiency analysis in the traffic flow over the network. They can be used in detecting and identifying the ARP spoofing attack and the packets that are sneaked in through the traffic by the attacker machine. They monitor the datagrams that flow in the network through the switches and routers. If there is any malicious packet detection, then the particular traffic can be distinguished and separated from others, hence preventing any further packet flooding. The various visualization tools included in these softwares provide much details about the kinds of ARP packets received and their overall statistical throughput in the traffic.
3 Additional Information Required from Authors The section discusses how an ARP spoofing attack is placed when a client and server exchange packets. Detecting the attack on victim’s machine forms the next phase of study which includes tallying the ARP table entries, capturing hostile packets, identifying duplication of address using Wireshark software or running a specific highlighted Python script on the victim system. Preventing an ARP spoofing attack which facilitates mitigating the risk of attack and safeguarding the system has been proposed after the detection phase.
3.1 ARP Spoofing The problem regarding address resolution between the IP and Ethernet protocol is resolved with the help of ARP. A host first checks the local ARP cache before attempting to send IP messages to other hosts with the target IP address. The relevant target MAC address is received if the cache is hit. Otherwise, the ARP protocol launches an ARP request-response mechanism. The checking of this request-response process, however, does not entail any security measures. Thus, whenever an ARP reply packet is received by the host who initiated the request, it does not run any procedure to authenticate the sender of the ARP reply packet. The attacker can take advantage of this unauthenticated procedure to place an ARP spoofing attack. The following Fig. 2 displays how an ARP spoofing attack looks like. The research study is based on an ARP spoofing experiment carried out over a Wi-Fi network having two computers (one being the attacker and the other being the victim) with Windows 10 operating system. The router acts as the gateway in the
390
S. Jadhav et al.
Fig. 2 ARP spoofing
network. An ARP spoofer is developed by the attacker using a Python script which primarily utilizes the packet manipulation functionalities provided by the Scapy library. Thus, an ARP spoofing attack is launched using the following steps: Step (1) The IP address of the target machine (victim) along with the gateway IP is provided in the python script. Step (2) A function is defined to obtain the MAC address corresponding to the specified IP address. It utilizes the specified IP to create ARP requests using scapy.ARP() followed by setting the broadcast mac address to “ff:ff:ff:ff:ff:ff” using scapy.Ether(). ‘/’ is used to join them together so as to create a single packet which is to be broadcasted over the network. scapy.srp() is used to obtain the list of IP addresses which responded to the request (packet). Thus, using the ‘hwsrc’ parameter, the MAC address corresponding to the provided IP address could be retrieved. Step (3) A spoofing ARP packet is created using the target IP address and spoofing IP address along with the MAC address obtained in Step 2. This packet can also be termed as a falsified ARP response by the attacker which declares that the MAC addresses corresponding to the IP addresses of the target machine and router is the attacker’s MAC address. This packet when sent on the network will update the entries in the ARP table of the gateway and the target machine. After the ARP cache entries have been modified, the router and the target computer will speak to the attacker rather than one another directly. As a result, the attacker is now covertly intercepting all conversations.
Detection and Mitigation of ARP Spoofing Attack
391
Fig. 3 ARP table before the attack is placed
3.2 ARP Spoofing Detection 3.2.1
Detection by Checking ARP Table Entries
One of the simplest ways to detect the ARP spoofing attack is by checking the entries in the ARP table on the victim’s machine. It would be observed that the physical address corresponding to the dynamic IP of the gateway will be changed to the physical address of the attacker’s machine when ARP spoofing attack is placed. The ARP table can be viewed in the Windows command prompt using the ‘arp a’ command. The following Fig. 3 displays the ARP table entries on the victim’s computer before the attack is placed. The highlighted entry corresponds to the gateway IP (192.168.0.1). After placing the attack, the physical address of the gateway is changed, as depicted in Fig. 4. It can be noted that there are two entries with the same physical address but different IP addresses indicating that the machine is under ARP attack. In Fig. 4, the entry highlighted in red corresponds to the gateway, while the entry highlighted in blue corresponds to the attacker with IP address 192.168.0.10. However, the above approach involves manually checking the ARP table entries which is not always feasible and efficient. Thus, two more approaches are proposed in the study, one which involves running a Python script on the victim’s machine to detect ARP spoofing by sniffing the packet over the network by utilizing the Scapy library. The other method involves capturing the packets using the Wireshark tool.
3.2.2
Detection Using Python Script
The Python program involves sniffing the ARP response packets over the specified network interface using scapy.sniff (). A function is defined to obtain the MAC address corresponding to a specific IP, same as the function defined in the step no.
392
S. Jadhav et al.
Fig. 4 ARP table after the attack is placed
2 of the ARP spoofer. Further, the original (real) MAC address is retrieved and it is compared with the MAC address specified in the ARP response. If both the values are different, then it indicates that the system is under ARP spoofing attack.
3.2.3
Detection by Capturing Packets Using Wireshark
Capturing the packets using the Wireshark tool can determine the duplication in ARP responses, thus indicating that the system is under ARP spoofing attack.
3.3 ARP Spoofing Attack Prevention The ARP protocol does not confirm that a response to an ARP request actually originates from an authorized party. Additionally, it enables hosts to receive ARP answers even though they have never made a request. The ARP protocol has a flaw in this area, which makes it vulnerable to spoofing attacks and hence prevention methods have to be established. Static ARP entries: Every workstation on the network can have a static ARP entry defined for an IP address, which stops devices from listening to ARP answers for that address. This method requires a lot of overhead since manually updating ARP tables for all hosts is not conveniently feasible. Virtual Private Networks (VPNs): Allows devices to connect to the Internet through an encrypted tunnel where the attacker receives the cipher text only. Lessfeasible solution at the organizational level because encrypting and decrypting on that scale hinders the network’s performance.
Detection and Mitigation of ARP Spoofing Attack
393
Packet filters: By examining the contradictory source information, packet filtering systems can spot poisoned ARP packets and stop them before they reach network devices. Network Isolation: ARP messages only have a scope of the local subnet; hence, a well-segmented network performs better than a typical network. An attack on one area of a network has no impact on how the other areas operate. Important resources can be placed in a dedicated segment with high security to mitigate risks.
4 Experimental Results The experimental results of ARP spoofing and its detection using the Wireshark tool are presented in this section. Once the attacker runs the ARP spoofer Python script, continuous ARP requests are broadcasted over the network as seen in Fig. 5. After applying the ARP filter on Wireshark, the ARP requests and responses can be analyzed properly. Figure 6 displays the ARP requests broadcasted on the network indicated by ‘Who has..’. The first request asks over the network, who has IP address 192.168.0.1 (gateway IP), to convey it to the IP address 192.168.0.10 (attacker’s IP). Similarly, the second request asks to tell the attacker’s IP address (192.168.0.10), who has the IP address 192.168.0.5 (victim’s IP) over the network. After a while, duplication of addresses for the victim’s IP and gateway IP is detected by the wireshark tool indicating that an ARP spoofing attack is being placed as depicted in Fig. 7.
Fig. 5 ARP packets sent by the attacker over the network (captured using Wireshark)
Fig. 6 ARP requests broadcasted on the network by the attacker
394
S. Jadhav et al.
Fig. 7 Duplication of address in the packets indicating ARP spoofing attack has been placed
Thus, ARP spoofing attack can be detected using Wireshark. However, it is quite important to protect the computer from such attacks. One of the simplest solutions to safeguard the computer system from ARP spoofing attack is by modifying the ARP table with static address entry using the ‘arp -s’ command. The ‘arp -s’ command takes two parameters including the IP address and corresponding MAC address. Thus, the attacker IP along with the real MAC address could be specified in the command as given below, to create a static entry in the ARP table. ar p − s attacker _I P r eal_M AC This will create an entry of the host with the specified internet address and corresponding physical address. The entry created using ‘arp -s’ command is permanent. Wireshark provides a feature to obtain the network traffic I/O graph which is plotted between the time (sec)—X axis and the number of packets sent per seconds— Y axis. The following graph displayed in Fig. 8 showcases the entire network traffic (all packets) in brown line and the ARP packets sent over the network with respect to time in red line with a dot. As depicted in the following graph, the frequency of ARP packets over the network has increased due to the ARP spoofing attack.
Detection and Mitigation of ARP Spoofing Attack
395
Fig. 8 ARP packets within the entire network traffic
5 Conclusion Facilitating communication between two hosts on the same network has been the purpose of ARP. However, the absence of any authentication due to the fundamental assumption of the communicating hosts to be trustworthy exposed a substantial vulnerability. The experimental analysis of ARP spoofing, its detection at individual level, and methods to prevent it have been discussed in the study. Highlighting the method of placing an ARP spoofing attack and its detection using multiple approaches and strategies forms the center of the study. The proposed techniques for detection of ARP spoofing include detection via checking the entries in the ARP table, running a Python script on the victim’s computer, capturing the ARP packets and identifying the duplication using the Wireshark tool. A solution to safeguard the computer system from ARP attack after its detection, includes execution of the ‘arp -s’ command for adding a static entry corresponding to the attacker’s IP and real MAC address in the ARP table. However, the methods involve a lot of computation resources and are not feasible when the extent of the network increases manifold. Hence, it becomes utmost necessary to prevent the attack from taking place, and the methods involved are discussed during the later phases of the study. Precautionary measures include using virtual private networks (VPN), packet filters, and network isolation which reduce the chances of an attack by 83%.
396
S. Jadhav et al.
References 1. Rohatgi V, Goyal S (2020) A detailed survey for detection and mitigation techniques against ARP spoofing. In: 2020 fourth international conference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC), pp 352–356. https://doi.org/10.1109/I-SMAC49090.2020.924 3604 2. Tripathi N, Mehtre BM (2014) Analysis of various ARP poisoning mitigation techniques: a comparison. In: 2014 international conference on control, instrumentation, communication and computational technologies (ICCICCT), pp 125–132. https://doi.org/10.1109/ICCICCT.2014. 6992942 3. Xing W, Zhao Y, Li T (2010) Research on the defense against ARP spoofing attacks based on winpcap. Second Int Workshop Educ Technol Comput Sci 2010:762–765. https://doi.org/10. 1109/ETCS.2010.75 4. Bhirud SG, Katkar V (2011) Light weight approach for IP-ARP spoofing detection and prevention. In: 2011 second Asian Himalayas international conference on internet (AH-ICI), pp 1–5. https://doi.org/10.1109/AHICI.2011.6113951 5. Zhao, Y, Guo R, Lv P (2020) ARP spoofing analysis and prevention. In: 2020 5th international conference on smart grid and electrical automation (ICSGEA), pp 572–575. https://doi.org/10. 1109/ICSGEA51094.2020.00130 6. Ibrahim HY, Ismael PM, Albabawat AA, Al-Khalil AB (2020) A secure mechanism to prevent ARP spoofing and ARP broadcasting in SDN. In: 2020 international conference on computer science and software engineering (CSASE), pp 13–19. https://doi.org/10.1109/CSASE48920. 2020.9142092 7. Jinhua G, Kejian X (2013) ARP spoofing detection algorithm using ICMP protocol. In: 2013 international conference on computer communication and informatics, pp 1–6. https://doi.org/ 10.1109/ICCCI.2013.6466290 8. Pandey P (2013) Prevention of ARP spoofing: a probe packet based technique. In: 2013 3rd IEEE international advance computing conference (IACC), pp 147–153. https://doi.org/10. 1109/IAdCC.2013.6514211 9. Saini RR, Gupta H (2015) A security framework against ARP spoofing. In: 2015 4th international conference on reliability, infocom technologies and optimization (ICRITO) (trends and future directions), pp 1–6. https://doi.org/10.1109/ICRITO.2015.7359227 10. Rupal DR, Satasiya D, Kumar H, Agrawal A (2016) Detection and prevention of ARP poisoning in dynamic IP configuration. In: 2016 IEEE international conference on recent trends in electronics, information and communication technology (RTEICT), pp 1240–1244. https://doi.org/ 10.1109/RTEICT.2016.7808030 11. Sun S, Fu X, Luo B, Du X (2020) Detecting and mitigating ARP attacks in SDN-based cloud environment. In: IEEE INFOCOM 2020 - IEEE conference on computer communications workshops (INFOCOM WKSHPS), pp 659–664. https://doi.org/10.1109/INFOCOMWKSHP S50562.2020.9162965 12. Duddu S, Rishita Sai A, Sowjanya CLS, Rao GR, Siddabattula K (2020) Secure socket layer stripping attack using address resolution protocol spoofing. In: 2020 4th international conference on intelligent computing and control systems (ICICCS), pp 973–978. https://doi.org/10. 1109/ICICCS48265.2020.9120993 13. Selvarajan S, Mohan M, Chandavarkar BR (2020) Techniques to secure address resolution protocol. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT). pp 1–7. https://doi.org/10.1109/ICCCNT49239.2020.9225413 14. Majidha Fathima KM, Santhiyakumari N (2021) A survey on network packet inspection and ARP poisoning using Wireshark and Ettercap. In: 2021 international conference on artificial intelligence and smart systems (ICAIS). pp 1136–1141. https://doi.org/10.1109/ICAIS50930. 2021.9395852
Stochastic Differential Equation-Based Testing Coverage SRGM by Using ANN Approach Ritu Bibyan, Sameer Anand, Anu G. Aggarwal, and Abhishek Tandon
Abstract Organizations must build software that is extremely dependable due to the high expense of resolving errors, safety issues, and legal obligations. Software developers have created models for measuring and tracking the evolution of dependability in their products. Most of the proposed software reliability growth models takes fault detection into account throughout both the phases of testing and the operational as counting process. In addition, the size of the software system affects how many faults are discovered during testing and also how many are discovered and rectified during debugging relative to the fault content at the beginning of the testing period. So, in a situation like this, we may conceptualize the software fault detection process in terms of testing coverage as a stochastic process with a continuous state space. In this research, we offer an Ito-type stochastic differential equation-based ANNbased testing coverage software reliability growth model. The proposed approach has been tested and examined using real failure datasets from software projects. The suggested model that incorporates the idea of stochastic differential equations in testing coverage-based SRGM outperforms the current NHPP-based model. Keywords SRGM · Testing coverage · Neural network · Stochastic differential equation (SDE) · Stochastic process
R. Bibyan · S. Anand (B) · A. G. Aggarwal · A. Tandon Department of Operational Research, University of Delhi, Delhi, India e-mail: [email protected] R. Bibyan e-mail: [email protected] A. G. Aggarwal e-mail: [email protected] A. Tandon e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_34
397
398
R. Bibyan et al.
1 Introduction The requirement for highly dependable software products is a result of modern society’s rising reliance on software technologies. Therefore, a software developer’s primary goal is to create software that can satisfy customer needs and address their dependability concerns. Many time-dependent SRGMs have been suggested under various assumptions since 1970 [1–4]. Throughout development, a number of external elements, such as testing skill, fault exposure ratio, change point, testing coverage, testing efficiency, fault reduction factor, have an impact on how reliable the program is. So, to identify and further enhance the SRGM’s accuracy, a model must be created that takes into account certain actual problems encountered during the testing process. In this paper, we take into account testing coverage with the random effect which is crucial for improving reliability. A software engineer may assess how thoroughly software has been tested using testing coverage. By including coverage in the SRGM, software reliability is improved, and defects are predicted more realistically and precisely. Testing coverage serves as a confidence-building exercise for users, letting them know when they may purchase software [1]. Many testing coverage SRGMs have been addressed by various researchers. In this study, we use the exponential testing coverage function for our proposed model. The fault detection rate is expressed in terms of testing coverage. Numerous errors are found and fixed throughout the extensive testing phase before the system is made available for purchase. However, as a result of the users’ discoveries of several faults, the software developer released an upgraded version of the system. Because of this, it is possible to assume the number of remaining faults as a stochastic process with continuous state space [5]. The Itô-type stochastic differential equation (SDE) was applied by Tamura and Yamada to explain the fault detection process during the testing phase, and numerous software reliability metrics were obtained utilizing the probability distribution of the stochastic process [6]. One of the most challenging jobs is to estimate ranges and start values for each parameter in the traditional SRGM parameter estimation. The model parameters estimate may vary greatly depending on the settings used for estimation. Without exercising caution, the researcher could end up with parameter values that are either too low or too high, even if the goodness of fit might still be satisfactory. Nonparametric models, on the other hand, make it possible to estimate the parameters of SRGMs without making any assumptions. Nonparametric models include all soft computing approaches such as artificial neural networks, fuzzy systems, and genetic algorithms. In implementing artificial neural networks (ANNs) for parameter estimation, we apply machine learning methodologies to choose the most pertinent weights for our model that would match both past and future data equally. We examine the proposed model’s goodness of fit performance by using the well-known exponential testing coverage function. • The artificial neural network model overcomes the limitations with classical SRGM parameter estimation [7].
Stochastic Differential Equation-Based Testing Coverage SRGM …
399
• When we use ANN, we do not have to perform the time-consuming task of providing the range of values for each parameter in advance. The influence of outside inputs and model assumptions may be eliminated when we create a model that can grow on its own based on facts about software failures. • The parameter estimation is improved using ANN. Nonparametric approaches can also provide models with higher prediction accuracy than parametric model. Since the accuracy of the estimation is enhanced by ANN and provides a better fit than conventional statistical parametric models, we consistently employ it for parameter estimation in all situations [8–11]. The structure of this paper is as follows: The literature survey is provided in Sect. 2 and the notation used the paper in Sect. 3. The assumptions and the model development for the suggested SRGM based on testing coverage utilizing SDE are given in Sect. 4. Section 4.3 describes the parameter estimation procedure using neural network. The data analysis and results are shown in Sect. 5, and the conclusion with future scope is provided in Sect. 6.
2 Literature Survey The overall modeling of Software Reliability Growth Models is based on NonHomogeneous Poisson Process, and during the past few decades, several features have been added to the SRGMs that affect the software’s dependability [4, 12, 13]. The NHPP failure process at time t is represented by the mean value function m(t) in SRGMs. Several SRGMs were proposed in the past by various researchers [14–21]. Testing coverage is a measure of how effective and thorough the tests were throughout the software system’s testing phase. Numerous researchers have used various time-dependent testing coverage functions such as Rayleigh [22], s-shaped [23], log-exponential [24], etc. Malaiya et al. [25] suggested a model explaining the connection between testing coverage and reliability. The suggested model explored the connection between both testing and defect coverage, and the authors also presented a hypothesis regarding the detectability of various coverage measures. Malaiya et al. developed a model with three main components: testing coverage, testing duration, and reliability and validated the model on four failure dataset [26]. The testing coverage and efficiency were defined by the author using the Logarithmic Exponential Model. Furthermore, Pham and Zhang suggested an SRGM along with the cost model while taking testing coverage into account, and they evaluated the model’s goodness of fit using several datasets [1]. Later, they compared the model to the recent SRGMs, which led to a superior forecast. Pham developed two models that account for the operational environment’s uncertainity and Log-Log distribution for the fault detection rate and testing coverage respectively [27]. Li and Pham looked at testing coverage as the rate of fault discovery in the context of an unpredictable operating envrironment [28]. Three real-world datasets are used to verify the model, and findings demonstrate a significant improvement in fitness as compared to existing
400
R. Bibyan et al.
SRGMs. The testing coverage function is viewed in this work as an exponential distribution function. Before the system is made available to the public, a lengthy testing phase allows for the discovery and correction of several faults. When consumers discover a lot of faults, the software developer updates the system and releases it. Thus, in this instance, it is possible to think of the number of defects that are still present as a stochastic process with a continuous state space [5]. By using Itô-type stochastic differential equations (SDE), Shigeru and Akio provided a straightforward SRGM to explain the fault detection process while testing procedure and obtain reliability measures based on the probability distribution of the stochastic process [29]. Soon after, during the system-testing stage of the distributed development environment, Lee et al. created a felxible SDE model reperesenting a fault detection process [30]. The researchers have estimated the parameter of SRGMs using MLE and LSE, but lately, nonparametric models of machine learning have been boosted up for estimation as well. Numerous types of research have been written about the potential of neural networks for predicting and estimating software dependability. Karunanithi et al. was a pioneer in the use of neural network architecture for software reliability estimation [31]. He also demonstrated the value of connectionist models for forecasting software reliability growth [31]. Cai et al. identified how the number of neurons of the input, hidden layers influences the network. Also, it was inferred that the neural network model is equally affected by the number of hidden layers [32]. He used the last 50 inter-failure intervals as the inputs to forecast the failure time in the future. Instead of developing probabilistic software reliability models, they promoted the creation of fuzzy SRGMs. In order to analyze reliability, Haque and Bansal created a neural network-based method that combined many preexisting models into a Dynamic Weighted Combinational Model (DWCM) [33]. Su and Huang proposed an ANN-based method for estimating and modeling software reliability [9]. Roy et al. presented a dynamic weighted combination model using a recurrent neural network (PRNNDWCM) for predicting software reliability [8]. To train the ANNs, a genetic algorithm (GA) is suggested. Lakshmanan and Ramasamy proposed a feed-forward NN approach for reliability modeling [10]. He compared the result with traditional SRGMS and achieved better goodness of fit. Later in 2016, he introduced an SRGM based on testing efforts using multi-layer feed-forward ANN.
3 Notations N (t): It is a random variable that depicts the number of faults discovered at time t throughout the testing procedure; m(t): : Expectation of N(t) which provides the anticipated fault content observed during the testing procedure up to time t; a: total number of faults present before testing; c(t): testing coverage rate;
Stochastic Differential Equation-Based Testing Coverage SRGM …
401
s(t): fault detection rate (FDR); σ : A Positive constant that represents the intensity of irregular fluctuation causing random effect; ϕ(t): Standard Gaussian white noise.
4 Model Development 4.1 Assumptions i. The fault detection process of the software is modeled as a stochastic process with continuous state space. ii. As the testing process progresses, the number of remaining faults decreases gradually. iii. The software is subjected to failure due to faults present in the software while execution. c' (t) iv. The FDR is taken in terms of testing coverage, i.e., 1−c(t) . Here, c(t) is the testing coverage rate, i.e., the proportion of software code covered up to time t. v. The fault detection process is assumed to be perfect which means no new faults are introduced.
4.2 Framework for the Model In past research, the software reliability growth models are modeled using NHPP’s assumptions, in which the fault detection/observation is treated as a discrete counting process during the testing phase. Shigeru and Akio [29] recognized that the size of the software system is directly proportional to the faults observed during the testing period. Also, the number of faults observed and executed while debugging process becomes significantly small when compared to the actual number of faults present at the initial of the testing period. This stochastic behavior of the fault detection process is studied by modeling it as a stochastic process with continuous state space. Since the underlying faults are detected and executed from the software while testing, the number of fault content also reduces with time while the testing process continues. So, we can assume the following differential equation. dN (t) = s(t)[a(t) − N (t)] dt
(1) '
c (t) The fault detection rate s(t) can be written in terms of testing coverage as 1−c(t) [1]. The behavior of the s(t) is unknown, and it is assumed that there is an influence of random effect on the testing coverage causing irregular fluctuation. We can represent s(t) in Eq. (2) as
402
R. Bibyan et al.
s(t) =
c' (t) + Noise 1 − c(t)
(2)
and a(t) = a, as we have the assumed perfect debugging process. Equation (1) can be written as ] [ ' dN (t) c (t) + Noise [a − N (t)] = (3) dt 1 − c(t) We assume Standard Gaussian white noise as ϕ(t) and σ as a positive constant. Therefore, Eq. (3) can be written as ] [ ' c (t) dN (t) = + σ ϕ(t) [a − N (t)] dt 1 − c(t)
(4)
The above Eq. (4) is converted to an Itô-type SDE [5, 29]. ] 1 2 c' (t) − σ [a − N (t)]dt + σ [a − N (t)]dw(t) dN (t) = 1 − c(t) 2 [
(5)
where W (t) is a one-dimensional Wiener process. Now we apply the Itô solution to Eq. (5) by considering the initial condition as N (0) = 0, we get ⎡ N (t) = a ⎣1 − exp
⎧ t ⎨∮ ⎩ 0
⎫⎤ ⎬ c' (x) dx − σ W (t) ⎦ ⎭ 1 − c(t)
(6)
The Weiner process is a Gaussian process and has the following properties [34]: Prob[w(0) = 0] = 1, E[w(t)] = 0, [ ] [ ( )] E w(t)w t ' = min t, t ' , Taking the expectation of N (t) given in Eq. (6), we get ⎡
⎛ t ∮ ⎣ ⎝ E[N (t)] = m(t) = a 1 − exp 0
⎞ ⎤ c' (x) dx ⎠ E(exp(−σ W (T )))⎦ 1 − c(x)
) ( and the E[exp(−σ W (t)] = exp 21 σ 2 t . So, we get the final expression for m(t) in Eq. (8) as
(7)
Stochastic Differential Equation-Based Testing Coverage SRGM …
⎡ m(t) = a ⎣1 − exp
⎧ t ⎨∮ ⎩ 0
403
⎫⎤ 1 2 ⎬⎦ c' (x) dx + σ t ⎭ 1 − c(x) 2
(8)
Now, consider exponential testing coverage for our proposed model. The reliability modeling of software has been significantly influenced by the conventional Goel Okumoto Model [16]. It is one of the fundamental models with physically based parameters. The model claims that NHPP models the failure process, using MVF as its ( ) m(t) = a 1 − e−bt
(9)
λ(t) = abe−bt
(10)
The Coverage function and FDR are given as c(t) = 1 − e−bt s∗ (t) =
(11)
c' (t) =b 1 − c(t)
(12)
Now, we use the FDR of the exponential testing coverage provided in Eq. (12) in the final expression of the model in Eq. (8), and we get ⎡
⎛
m(t) = a ⎣1 − exp⎝−
⎞⎤
∮t bd x +
σ t ⎠⎦ 2 2
0
)] ( σ 2t m(t) = a 1 − exp −bt + 2 [
(13)
4.3 Neural Network Architecture This section of the paper briefly explains the application of an ANN-based approach for modeling software reliability [9]. The ANN architecture for our proposed model given in Eq. (13) consists of two hidden layers as shown in Fig. 1. The activation functions assigned to each node of the hidden layer are α1 (x) = x
(14)
x 2
(15)
α2 (x) =
404
R. Bibyan et al.
Fig. 1 Artificial neural network architecture
β(x) = 1 − e−x
(16)
The activation function used for the output layer neuron is γ (x) = x. The input provided to the first node of the first hidden layer is h 1 input (t) = w1 (t)
(17)
The obtained output from the first node of the first hidden layer is ( ) h 1 (t) = α1 h 1 input (t) = α1 (w1 t) = w1 t
(18)
The input to the second node of the first hidden layer is h 2 input (t) = w2 (t)
(19)
The output from the second node of the first hidden layer is ( ) w2 t h 2 (t) = α2 h 2 input (t) = α2 (w2 t) = 2
(20)
The input provided to the node of the second hidden layer is h 3 input (t) = w3 w1 t + w4
w2 t 2
The output obtained from the second hidden layer is
(21)
Stochastic Differential Equation-Based Testing Coverage SRGM …
) ( ( ) w2 t w2 t = 1 − e−(w3 w1 t+w4 2 ) h 3 (t) = β h 3 input (t) = β w3 w1 t + w4 2
405
(22)
The input given to the output layer is ) ( w2 t yinput (t) = w5 1 − e−(w3 w1 t+w4 2 )
(23)
The output from the output layer is ) ( ( ) w2 t y(t) = γ yinput (t) = γ w5 1 − e−(w3 w1 t+w4 2 ) ) ( w2 t = w5 1 − e−(w3 w1 t+w4 2 )
(24)
If we use w1 = b, w2 = σ 2 , w3 = 1, w4 = 1, w5 = a Eq. (24), we can obtain Eq. (13) which is exponential testing coverage-based SRGM using SDE.
5 Results We have performed the parameter estimation of the proposed model on two real software failure datasets provided in Table 1.
5.1 Comparison Criteria The effectiveness of SRGM is assessed based on their ability to match historical software fault data (goodness-of-fit). How well a model matches the data is the question that the term “goodness-of-fit” refers to. The comparison or performance criteria are described in Table 2. Table 1 Dataset description
Datasets
Description
DS-1
Release-1 for tandem computers company from woods [35]
DS-2
Brooks and Motley dataset [36]
406
R. Bibyan et al.
Table 2 Performance criteria Criteria
Description
MSE
MSE is used to determine the difference between the real and estimated values. The quality of fit is better the lower the value ∑k
MSE =
| |
|
2| ˆ i )) | (m i −m(t
i=1 |
k
MAE
The mean absolute error gives an absolute value deviation. The quality of fit is better the lower the value ∑k ˆ i ))| |(m −m(t MAE = i=1 i
Bias
The average prediction error at all times t provides the bias. The quality of fit is better the lower the value ∑k ˆ )−m i | |m(t Bias = i=1 k i
Variance
The standard deviation of the prediction error provides the variance / ∑k 2 ˆ i )−Bias) i=1 (m i −m(t Variance =
k
k−1
Root mean square The percentage error is measured by the RMSPE. It assesses how accurately percentage error the model forecasts the values. The quality of fit is better the lower the value √ RMSPE = Variance2 + Bias2
5.2 Data Analysis In order to choose proper weights for our model, we employ machine learning techniques. The goodness of fit analysis is done to show how well historical data was fitted. The goal is to guarantee that our proposed model will accurately predict future data in addition to improving the fit for the historical data. In the past, the predictive validity of software reliability models—both short and long term—was assessed to make sure that the model would accurately forecast future data. We use hold-out crossvalidation, a traditional machine learning technique, to obtain superior goodness of fit for the historical dataset and predictive validity to characterize the prospective data. When we utilize ANN, different weight settings may result in an equally excellent fit. The significant fits of the models are dependent on the random values assigned initially to the weights. The model might not reliably similarly predict future data if weights are chosen exclusively based on the lowest training error. If the weights used provide a low training error but a large validation error, this implies overfitting or maybe there is a huge variation. In order to ensure that the model can appropriately fit new data, we run validation on the remaining 20% of the nonoverlapping validation dataset after obtaining a low training error on 60% of the dataset using the selected weights. The cross-validation process for selecting acceptable model weights is done by following the given steps. Step 1: Use the 60% training dataset to train.
Stochastic Differential Equation-Based Testing Coverage SRGM …
407
Step 2: Determine the correctness of the training dataset by propagating errors across the entire network and modifying the weights with the ANN feed-forward backpropagation technique. Step 3: Now, 20% validation dataset is used to validate the model. Step 4: Once we obtain the accuracy criterion, discontinue the training. Otherwise, resume the procedure of training till the time accuracy criterion is attained on the validation dataset. Step 5: Run a test on the rest of the 20% of the test dataset to make sure that acceptable weights were chosen for the model. After determining the right weights for the model, the model is evaluated for performance. To evaluate the accuracy of the model given in Eq. (11), we compute different performance measures like MSE, MAE, Bias, Variation, and RMPSE given in Table 3. The results show that our proposed SDE model produces better results due to decreased MSE, MAE, Variation, Bias, and RMSPE. The goodness of fit curve for both the datasets are provided in Figs. 2 and 3. From the curves, we can observe that the model fits the data significantly well. Firstly, we have compared our model based on SDE with the classical G-O model [16]. It is observed that our model provides a better fit. Secondly, we have done a comparison with the model proposed by Wang and Li [37]. It is noted that our proposed model has shown more improvement in terms of RMPSE. Table 3 Results for performance criteria Dataset
MAE
Variation
DS-1
MSE 6.446
5.236
2.790
BIAS 1.658
RMPSE 1.623
DS-2
10.523
6.759
1.758
-0.089
2.558
Goodness of fit for DS-1 CUMULATIVE FAULTS
120 100 80 60
ACTUAL
40
PRED
20
0 0
5
10
15
TIME (IN WEEKS)
Fig. 2 Goodness of fit curve for DS-1
20
25
408
R. Bibyan et al.
Fig. 3 Goodness of fit curve for DS-2
6 Conclusion and Future Work This study uses an ANN technique to provide a testing coverage SRGM based on Itôtype stochastic differential equations. The goodness-of-fit analysis was performed by considering two real failure datasets of software. The proposed model of goodnessof-fit is compared to Goel and Okumoto [16] for the DS-1. We have also compared our proposed model to the model proposed by Wang and Li [37]. It is noted that our proposed model has shown a larger improvement in terms of RMPSE. The collected findings suggest that the model has a better fit and broader applicability to failure datasets. The numerical illustrations show that the model proposed using SDE produces better outcomes due to decreased MSE, MAE, Variation, Bias, and RMSPE. SDE’s applicability is not only limited to our proposed model; it may also be implemented to enhance the outcomes of other SRGMs. In the future, the proposed model, in conjunction with imperfect debugging and error generation, can be utilized for further study.
References 1. Pham H, Zhang X (2003) NHPP software reliability and cost models with testing coverage. Eur J Oper Res 145(2):443–454 2. Anand S, Verma V, Aggarwal AG (2018) 2-Dimensional multi-release software reliability modelling considering fault reduction factor under imperfect debugging. Ingeniería Solidaria 14(25):1–12 3. Gandhi N, Gondwal N, Tandon A (2017) Reliability modeling of OSS systems based on innovation-diffusion theory and imperfect debugging. In: ICITKM 4. Kapur P, Aggarwal AG, Nijhawan N (2014) A discrete SRGM for multi release software system. Int J Ind Syst Eng 16(2):143–155 5. Øksendal B (2003) Stochastic differential equations. Stochastic differential equations. Springer, pp 65–84
Stochastic Differential Equation-Based Testing Coverage SRGM …
409
6. Tamura Y, Yamada S (2006) A flexible stochastic differential equation model in distributed development environment. Eur J Oper Res 168(1):143–152 7. Pham H, Nordmann L, Zhang Z (1999) A general imperfect-software-debugging model with S-shaped fault-detection rate. IEEE Trans Reliab 48(2):169–175 8. Roy P et al (2014) Robust feedforward and recurrent neural network based dynamic weighted combination models for software reliability prediction. Appl Soft Comput 22:629–637 9. Su Y-S, Huang C-Y (2007) Neural-network-based approaches for software reliability estimation using dynamic weighted combinational models. J Syst Softw 80(4):606–615 10. Lakshmanan I, Ramasamy S (2015) An artificial neural-network approach to software reliability growth modeling. Procedia Comput Sci 57:695–702 11. Ramasamy S, Lakshmanan I (2016) Application of artificial neural network for software reliability growth modeling with testing effort. Indian J Sci Technol 9(29):90093 12. Kapur P et al (2012) Two dimensional multi-release software reliability modeling and optimal release planning. IEEE Trans Reliab 61(3):758–768 13. Xie M (1991) Software reliability modelling, vol 1. World Scientific 14. Bittanti S, et al (1988) A flexible modelling approach for software reliability growth. In: Software reliability modelling and identification, pp 101–140 15. Downs T, Scott A (1992) Evaluating the performance of software-reliability models. IEEE Trans Reliab 41(4):533–538 16. Goel AL, Okumoto K (1979) Time-dependent error-detection rate model for software reliability and other performance measures. IEEE Trans Reliab 28(3):206–211 17. Kapur P, Garg R (1992) A software reliability growth model for an error-removal phenomenon. Softw Eng J 7(4):291–294 18. Ohba M (1984) Software reliability analysis models. IBM J Res Dev 28(4):428–443 19. Ohba M (1984) Inflection S-shaped software reliability growth model. Stochastic models in reliability theory. Springer, pp 144–162 20. Yamada S, Ohba M, Osaki S (1983) S-shaped reliability growth modeling for software error detection. IEEE Trans Reliab 32(5):475–484 21. Kapur P, Younes S, Agarwala S (1995) Generalised Erlang model with n types of faults. ASOR Bull 14(1):5–11 22. Xie M et al (2007) A study of the modeling and analysis of software fault-detection and fault-correction processes. Qual Reliab Eng Int 23(4):459–470 23. Hwang S, Pham H (2008) Quasi-renewal time-delay fault-removal consideration in software reliability modeling. IEEE Trans Syst Man Cyber-Part A: Syst Humans 39(1):200–209 24. Huang C-Y, Lin C-T (2006) Software reliability analysis by considering fault dependency and debugging time lag. IEEE Trans Reliab 55(3):436–450 25. Malaiya YK, et al (1994) The relationship between test coverage and reliability. In: Proceedings of 1994 IEEE international symposium on software reliability engineering. IEEE 26. Malaiya YK et al (2002) Software reliability growth with test coverage. IEEE Trans Reliab 51(4):420–426 27. Pham H (2014) Loglog fault-detection rate and testing coverage software reliability models subject to random environments. Vietnam J Comput Sci 1(1):39–45 28. Li Q, Pham H (2017) NHPP software reliability model considering the uncertainty of operating environments with imperfect debugging and testing coverage. Appl Math Model 51:68–85 29. Shigeru Y, Akio N (2003) A stochastic differential equation model for software reliability assessment and its goodness-of-fit. Int J Reliab Appl 4(1):1–12 30. Lee CH, Kim YT, Park DH (2004) S-shaped software reliability growth models derived from stochastic differential equations. IIE Trans 36(12):1193–1199 31. Karunanithi N, Malaiya YK, Whitley D (1992) The scaling problem in neural networks for software reliability prediction. In ISSRE 32. Cai K-Y et al (2001) On the neural network approach in software reliability modeling. J Syst Softw 58(1):47–62 33. Haque F, Bansal S (2012) Software reliability estimation models: a comparative analysis. Int J Comput Appl 43(13):27–31
410
R. Bibyan et al.
34. Kapur P, et al (2009) Stochastic differential equation-based flexible software reliability growth model. Math Prob Eng 35. Wood A (1996) Predicting software reliability. Computer 29(11):69–77 36. Brooks W, Motley R (1980) Analysis of discrete software reliability models. IBM Federal Systems Div Gaithersburg MD 37. Wang G, Li W (2010) Research of software reliability combination model based on neural net. In: 2010 second world congress on software engineering. IEEE
Integrated Quantum Health Care with Predictive Intelligence Approach Tridiv Swain, Sushruta Mishra, Deepak Gupta, and Ahmed Alkhayyat
Abstract According to recent studies, nondeterministic computers have an edge over standard computer devices. The use of nondeterministic computers accelerates illness detection and therapy. It delivers an exponential improvement in computer speeds, reducing processing time from years to minutes. In health care, these computers have the potential to provide a broad range of applications for healthcare service providers, including healthcare planning, diagnostic speed, medicine customisation and pricing optimisation. Moreover, as connectivity to mental wellbeing data sources increases, so does the use of quantum computing and traditional modelling techniques that can save civilian lives. Since quantum technologies are believed to be incapable of forming unusual patterns that classical technologies are supposed to be not capable of producing, it is plausible to expect quantum computers to surpass classical computers on tasks involving machine learning. The mapping from quantum optics to quantum circuits will be explained further below. Keyword Qubit · Quantum computing · Silico trials · Quantum random access coding (QRAC) · Tequila · VQC · HER · Healthcare security
1 Introduction Quantum computing has the potential to improve medical image processing operations such as edge detection and picture matching. These improvements would vastly improve image-aided diagnosis. Single-cell techniques may also be employed T. Swain · S. Mishra (B) Kalinga Institute of Industrial Technology, Deemed To Be University, Bhubaneswar, India e-mail: [email protected] D. Gupta Maharaja Agrasen Institute of Technology, Delhi, India e-mail: [email protected] A. Alkhayyat College of Technical Engineering, The Islamic University, Najaf, Iraq © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_35
411
412
T. Swain et al.
in current diagnostic processes. Computer technology may enable supersonic therapy innovation, in silico clinical trials with virtual persons replicated “live”, full-speed entire genome testing and analytics, hospital data transfer, anticipated wellness and clinical information assurance through quantum uncertainty. Quantum computing [1], which is based on subatomic particles principles, has the potential to offer considerable improvements over traditional computers. This capability of quantum computing allows for the solving of many hitherto intractable issues in encrypted communication and finance. IoT is a revolutionary innovation that deals with massive amounts of data. New progress in quantum computing technology [2], particularly the dramatic advancement of the quantum computing environment via the cloud, such as in IBM Q, has stimulated various studies on the implementation of intermediate-scale quantum systems, also known as noisy intermediate-scale quantum (NISQ) handsets. Such research on quantum computer application examples varies from material science simulations [3] to social science models such as finance. In each of these circumstances, it is critical to investigate how quantum computers might be utilised to solve equations that are required in certain domains. Quantum automation brings together two of the most intriguing fields of contemporary research: nondeterministic computing and conventional machine learning. It investigates how results and techniques from one field may be used to resolve conflicts in the other. Due to an ever-increasing volume of data, modern machine learning systems are fast reaching the boundaries of conventional computational models. Quantum processing capability may give an edge in tasks involving machine learning in this regard. The area of quantum learning algorithms studies how to create and deploy quantum software that can do machine learning quicker than ordinary computers. While quantum systems are likely to be unable of producing intriguing motifs that theory of legitimacy is projected to be incapable of producing effectively, it is reasonable to expect supercomputers to outperform processing power on processing learning tasks. Public health system relies on internet data exchange to allow better connection and reducing service delivery. Smart health care establishes communication between the real and the virtual worlds services which are offered everywhere. Medical specialists offer excellent medical outcomes while also minimising the time required to evaluate health data by keeping systems up to date and categorising medical data in a logical framework, as well as retrieving and accessing patients’ previous data. Because of rapid growth in recent health care, there is development of quantum machine learning using gate-based technology which will be used as realistic scenario in future.
Integrated Quantum Health Care with Predictive Intelligence Approach
413
2 Literature Review The researchers in research [3] explain that for quantum effects with just few hundred particles, the equivalent memory needed to store or modify the core wave equation exceeds the capability of the best classical computers, posing severe hurdles in numerical simulation. This indicates that new quantum device and experiment verification and design are inherently confined to small system sizes. In the research [4], to find optical computer chip devices that execute a particular shift between input and output states, algorithms and optimisation methodologies can be applied. The technique discovers circuits for producing a desired qubit in the simplest situation of a 1 bit state as shown in Fig. 1. In the research [5], the latest work shows how to translate discrete properties with fewer quantum bits by employing Quantum Random Access Coding (QRAC), a fundamental technique for encoding binary sequences into quantum fluctuations. In the research [6], in order to speed quantum computing, the researcher offers an improved approach for installing quantum computing programs on a cloud computing platform that facilitates information transferring between servers via a message passing interface.
Fig. 1 Model of quantum state in quantum computing
414
T. Swain et al.
Researchers [7] provide a quantum walk method for implementing decentralised quantum computing in a mesh node in this paper. To conduct distributed quantum activities, the protocol employs a quantum stroll as a conventional process variable. They study the quantum walk model, a discrete-time extension that takes into account the link of a quantitative walker device in the directed graph with quantum registers within network devices. The researchers describe in this study [8] that they conduct a systematic survey, categorise and evaluate publications, tools, methodologies and platforms that aid quantum developing software from an application and quantum computing viewpoint. They offer quantum information layers as well as quantum computer access. And, if appropriate, a circuit simulator, as well as accessible technologies like as Cirq, TensorFlow Quantum, ProjectQ and many more, make it easier to construct quantum applications in Python using a powerful and simple language. In this publication [9], researchers provide an optimum framework for expressing dispersed policy measures that execute a dynamic network with gates operating on qubits in separate nodes. The fault tolerant actions in the quantum network may be explained using a programmable logic plane quantum walk protocol. In this study [10], researchers look at the digital system needed to accomplish arbitrary one-qubit rotations and manage NOT gates in precisely specified photodetectors. In the existence of a theme, MBQC provides a state-generating system with the primary goal of understanding the synchronisation limits set by digital systems on transmission media and quantum equipment. In this research [11], quantum random access coding (QRAC) is utilised to efficiently convert such discontinuous properties into a limited amount of qubits for VQC. We illustrate the constraints and capabilities of various encoding techniques in computer modelling. The impact of different numbers of federal network phases, noise error parameters and interruption situations on algorithm output correctness was explored in the study [12], which was conducted on the IBM Quantum Lab infrastructure with superconductor quantum server and simulators. In the research [13], researchers created a heralded single photon source utilising a high-quality silicon microring resonator, from which we designed and built lowloss quantum photonic chips enabling of entangled state creation, bell projection or fusion operation and spatial evaluation all on the same chip. In the study [14], researchers develop and demonstrate a strategy for reducing software crosstalk noise in the NISQ system. Our findings show that reducing crosstalk in software is doable and can considerably enhance the reliability of chaotic quantum computers. In the research [15] ProjectQ framework which is an open source program initiative for quantum computing was developed and it allows for the analysis of quantum computing as well as their implementation on actual quantum hardware. In the research [16], here the preceding research looked at how to build a library for quantum computer simulation utilising hardware acceleration via the OpenCL framework.
Integrated Quantum Health Care with Predictive Intelligence Approach
415
Fig. 2 Model of quantum interface in quantum computing
In the research [17], researchers demonstrated how error mitigation may improve the outcomes of a noisy quantum computing using experimental and numerical demonstrations as depicted in Fig. 2.
3 Proposed Model Quantum algorithms bring together two of the most intriguing fields of contemporary research: Quantum computing and conventional machine learning. It examines the connection between quantum computing and machine learning, looking at how discoveries and methodologies from one discipline might be applied to issues in the other. Due to a growing volume of data, current machine learning techniques are fast reaching the boundaries of conventional numerical simulations. Quantum processing capability may give an edge in machine learning projects in this regard. The area of quantum learning algorithms studies how to create and deploy quantum software that can do machine learning quicker than ordinary computers. Because of recent major advances in the creation of quantum computational intelligence computers in health care, we predict that building optoelectronic hardware with nondeterministic systems will become a plausible case there in consideration with future. Yet, the optimisation methodologies provided here might be the useful benchmarks in addition to quantum systems. The proposed quantum optimiser for health care along with its steps is shown in Fig. 3. Topological Optimiser: Topology optimisation is a computational function for optimising the spatial distribution of material within a defined region by satisfying previously established constraints and minimising a predefined cost function.
416
T. Swain et al.
Fig. 3 Proposed model for quantum optimiser for healthcare security
Variational Optimiser: Variational optimisation has been proposed as a way for solving optimisation problems faster and on a greater scale than traditional methods enable. Digital Quantum Computer: The goal of experimental quantum computing is to improve the capabilities of present quantum processors so that some of quantum computers’ numerous potential applications can be implemented on real physical devices. The hardware quantum circuits for hardware are shown in Fig. 4. Let us now talk about qubits, which are the basic components of quantum computers. These are computational systems that can take a wide range of quantum values and scale exponentially beyond the usual ones and zeros. A two-component, for example, can perform four concurrent calculations, whereas a three-qubit system can perform eight, and a four-qubit system can perform sixteen. Consider Fig. 3, where a bit can have values of 0 s or 1 s and is represented by the letters A and B. The sphere representation, on the other hand, demonstrates that the qubit can accept multiple values identified on the sphere’s surface. Each point is paired with a latitude–longitude pair that represents 0 or 1 and phase values, respectively. One advantage of modelling electro-optic systems on a networked quantum computer is the use of parametrised components within a completely automatically differentiable framework. The goal is to find the efficient in performing for a quantum electronic viewfinder for quantum teleportation, quantum sensing and experiments verifying quantum physics basics. There are several advantages to improve the optical transmission setup on a widely used gate-based dynamic method. We can only examine the repeatability of an actual product state as a substitute for the validity of a complex entangled state due to the unique scenario.
Integrated Quantum Health Care with Predictive Intelligence Approach
417
Fig. 4 Model for quantum circuits for hardware in health care
The nondeterministic element of the optimisation is carried out in the spirit of finite difference quantum eigensolvers (VQEs), which were first suggested to variationally approximate the eigenstates of a specific Hamiltonian. We employ a variational approach in this work to maximise true reflection for a given goal state, which may be represented as the expected value. Fψ = |hψ||ϕi|2 = hϕ|H |ϕi, with the Hamiltonian H = |ψihψ| and where ψ is the desired target state. Depending on ψ, the number of measurable components (tensor products of Pauli matrices) is in the Hamiltonian. Pseudocode for implementation of Tequila: a = tq.variables(“a”) U = tq.gates.Ry(angle = (−a**2).apply(tq.numpy.exp)*pi, target = 0) U + = tq.gates.X(target = 1, control = 0) H = tq.QubitHamiltonian.from_string(“−1.0*X(0)X(1) + 0.5Z(0) + Y(1)”) E = tq.ExpectatioonValue(H = H, U = U) dE = tq.grad(E, “a”) Objective = E + (−dE**2).apply(tq.numpy.exp)
418
T. Swain et al.
Result = tq.minimize(method = ”phoenics”, objective = objective) For measuring fidelities: F = | < ϕ(θ )|ψ >|2 F =< ϕ(θ )|H|ϕ(θ ) >=< H >U(θ) So after implementing the above algorithm using machine learning paradigm, we got the following logarithmic graph, which is found to be increasing exponentially in number of trotter steps.
4 Quantum Healthcare Applications The preceding represents some of the most significant functions of quantum computing in wellness [18]. Below are the use cases of quantum nondeterministic computing in health care: 1. Regenerative medicine and development: Quantum computing allows doctors to replicate complex reactions at the nanoscale, which is necessary for medical research. This will be notably useful in diagnosis, therapy, regenerative medicine and monitoring [19]. AI approaches are increasingly being used to help with patient diagnosis. When compared to traditional computer skills, quantum computing speeds up the processing of this data by hundreds of times. 2. Radiotherapy: Radiation therapy, which employs radiation beams to kill malignant cells and prevent their proliferation, has been used to treat cancer. However, radiation is a delicate technique that involves very exact computations to target cancer-causing areas while preventing any influence on healthy body cells [20]. The range of options for simulations analysing quantum computing is extensive, allowing numerous simulations to run concurrently and developing an ideal strategy faster. 3. Sequencing and analysing DNA full speed: Genetics and genomics have seen dramatic developments in the previous. However, projections will be more reliable since computer programs can process considerably more information than traditional computers and are able to go through every bit of genetic makeup into health records [21]. Quantum computing has the ability to eliminate the uncertainty from genomes and genetics, which leads to better health for us all. 4. Preclinical studies in silico discovery: The silico clinical trials utilise no humans, animals, or even a single cell, professional decision, or drug to evaluate a specific treatment, but its impact may be carefully documented [22]. Quantum computing has the potential to greatly enhance the advancement of “virtual characters” and comprehensive simulations such as the HumMod, which contains over 1500 equations and 10,000 biological mucus secretion, vascular tissue, electrolytes, hormones, metabolic activity and galvanic skin response which are all factors.
Integrated Quantum Health Care with Predictive Intelligence Approach
419
5 Result Analysis Here to do the analysis of the model, we used tequila which is a package in Python used to solve various quantum computing algorithms in machine learning that were created for the rapid and adaptable creation, testing and formation of new quantum computation in electrical structure and other domains. Tequila uses abstract expectancy values that can be combined, altered, rejected against and improved. After being evaluated, the abstract data structures are assembled to execute on cutting-edge quantum simulators or interfaces. As highlighted in Fig. 5, the proposed model is found to be more accurate, and the accuracy is found to be 98.9%, which is suitable for machine learning models using tequila. Also in comparison with other existing predictive intelligence models, the quantum healthcare model is more reliable and efficient as shown in Fig. 6.
Fig. 5 Accuracy of the machine learning model in quantum computing
Fig. 6 Graph for quantum computing using different machine learning models
420
T. Swain et al.
6 Conclusion Throughout this study, researchers looked at quantum computing solutions from the standpoint of healthcare systems. They spoke about the most important needs for quantum computing system implementations in the healthcare realm. Combinations with classical algorithms, on the other hand, should not be limited, but rather utilised to achieve the best of both worlds. Our proposal has various advantages than working directly with photonic equipment in health care. Because quantum systems are likely to be unable of establishing uncommon patterns that conventional systems are anticipated to be hard of successfully producing, it is reasonable to expect quantum computers to surpass algorithms on computational challenges. This method becomes more efficient when we use tequila as the package as it is faster and provides more accurate results, so our accuracy came to be as 98.9%. The only challenge facing presently will be the system requirements; if we can revoke that, then it will be better way to calculate the accuracy. Future Scope The subject of quantum computation is rapidly developing, and so are the fields of research. There is certainly a lot of space for advancement in the realm of quantum algorithms, where the utility of various algorithms will grow with time. Quantum computing has the ability to improve medical image processing, including processing processes like edge detection and picture matching. These enhancements would significantly improve image-aided diagnosis. Furthermore, single-cell approaches may be used in current diagnostic procedures.
References 1. Mohapatra SK, Mishra S, Tripathy HK, Alkhayyat A (2022) A sustainable data-driven energy consumption assessment model for building infrastructures in resource constraint environment. Sustain Energy Technol Assess 53:102697 2. Mohanty A, Mishra S (2022) A comprehensive study of explainable artificial intelligence in healthcare. In: Augmented intelligence in healthcare: a pragmatic and integrated analysis. Springer, Singapore, pp 475–502 3. Kottmann JS, Krenn M, Kyaw TH, Alperin-Lea S, Aspuru-Guzik A (2021) Quantum computeraided design of quantum optics hardware. Q Sci Technol 6(3):035010. https://doi.org/10.1088/ 2058-9565/abfc94 4. Arrazola JM, Bromley TR, Izaac J, Myers CR, Brádler K, Killoran N (2019) Machine learning method for state preparation and gate synthesis on photonic quantum computers. Q Sci Technol 4(2):024004. https://doi.org/10.1088/2058-9565/aaf59e 5. Thumwanit N, Lortaraprasert C, Yano H, Raymond R (2021) Trainable discrete feature embeddings for quantum machine learning. In: 2021 IEEE international conference on quantum computing and engineering (QCE). https://doi.org/10.1109/qce52317.2021.00087 6. Huang Z, Qian L, Cai D (2022) A quantum computing simulator scheme using MPI technology on cloud platform. In: 2022 IEEE international conference on electrical engineering, big data and algorithms (EEBDA). https://doi.org/10.1109/eebda53927.2022.9744891
Integrated Quantum Health Care with Predictive Intelligence Approach
421
7. De Andrade MG, Dai W, Guha S, Towsley D (2021) A quantum walk control plane for distributed quantum computing in quantum networks. In: 2021 IEEE international conference on quantum computing and engineering (QCE). https://doi.org/10.1109/qce52317.2021. 00048 8. Upama PB, Faruk MJH, Nazim M, Masum M, Shahriar H, Uddin G, Barzanjeh S, Ahamed SI, Rahman A (2022) Evolution of quantum computing: a systematic survey on the use of quantum computing tools. In: 2022 IEEE 46th annual computers, software, and applications conference (COMPSAC). https://doi.org/10.1109/compsac54236.2022.00096 9. De Andrade MG, Dai W, Guha S, Towsley D (2021) Optimal policies for distributed quantum computing with quantum walk control plane protocol. In: 2021 IEEE international conference on quantum computing and engineering (QCE). https://doi.org/10.1109/qce52317.2021.00074 10. Scott JR, Balram KC (2022) Timing constraints imposed by classical digital control systems on photonic implementations of measurement-based quantum computing. IEEE Trans Q Eng 3:1–20. https://doi.org/10.1109/tqe.2022.3175587 11. Yano H, Suzuki Y, Itoh K, Raymond R, Yamamoto N (2021) Efficient discrete feature encoding for variational quantum classifier. IEEE Trans Q Eng 2:1–14. https://doi.org/10.1109/tqe.2021. 3103050 12. Resch S, Karpuzcu UR (2021) Benchmarking quantum computers and the impact of quantum noise. ACM Comput Surv 54(7):1–35. https://doi.org/10.1145/3464420 13. Ding Y, Llewellyn D, Faruque II, Bacco D, Rottwitt K, Thompson MG., Wang J, Oxenlowe LK (2020) Quantum entanglement and teleportation based on silicon photonics. In: 2020 22nd international conference on transparent optical networks (ICTON). https://doi.org/10.1109/ict on51198.2020.9203437 14. Murali P, Mckay DC, Martonosi M, Javadi-Abhari A (2020) Software mitigation of crosstalk on noisy intermediate-scale quantum computers. In: Proceedings of the twenty-fifth international conference on architectural support for programming languages and operating systems. https:/ /doi.org/10.1145/3373376.3378477 15. Steiger DS, Häner T, Troyer M (2018) ProjectQ: an open source software framework for quantum computing. Quantum 2:49. https://doi.org/10.22331/q-2018-01-31-49 16. Kelly A (2018) Simulating quantum computers using OpenCL. arXiv.org. https://arxiv.org/abs/ 1805.00988 17. LaRose R, Mari A, Kaiser S, Karalekas PJ, Alves AA, Czarnik P, El Mandouh M, Gordon MH, Hindy Y, Robertson A, Thakre P, Wahl M, Samuel D, Mistri R, Tremblay M, Gardner N, Stemen NT, Shammah N, Zeng WJ (2022) Mitiq: a software package for error mitigation on noisy quantum computers. Quantum 6:774. https://doi.org/10.22331/q-2022-08-11-774 18. Sahoo PK, Mishra S, Panigrahi R, Bhoi AK, Barsocchi P (2022) An improvised deeplearning-based mask R-CNN model for laryngeal cancer detection using CT images. Sensors 22(22):8834 19. Sivani T, Mishra S (2022) Wearable devices: evolution and usage in remote patient monitoring system. In: Connected e-Health. Springer, Cham, pp 311–332 20. Mishra S, Jena L, Tripathy HK, Gaber T (2022) Prioritized and predictive intelligence of things enabled waste management model in smart and sustainable environment. PLoS ONE 17(8):e0272383 21. Tripathy HK, Mishra S, Suman S, Nayyar A, Sahoo KS (2022) Smart COVID-shield: an IoT driven reliable and automated prototype model for COVID-19 symptoms tracking. Computing 1–22 22. Suman S, Mishra S, Sahoo KS, Nayyar A (2022) Vision navigator: a smart and intelligent obstacle recognition model for visually impaired users. Mobile Inf Syst 2022
A Smart Data-Driven Prototype for Depression and Stress Tracking in Patients Pragya Pranjal, Saahil Mallick, Malvika Madan, Sushruta Mishra, Ahmed Alkhayyat, and Smaraki Bhaktisudha
Abstract One of the most significant issues of our time is how stress affects depression and how depression affects stress. The aim of this paper is to provide a brief overview of IoT use in health care and to add to the body of knowledge regarding IoT devices being used in the sector to track patients’ levels of stress and anxiety throughout treatments. This will assist us in gathering data about potential future medications, including those for different mental diseases. Many hospitalized patients have some level of depression, which is linked to higher mortality risks and depression that lasts longer than a year. In addition to capturing “micro-expressions” or controlled body language expressions, emotion detection and recognition methods also use facial expressions for emotions including happiness, sadness, surprise, and wrath. Additionally, this article will collect information on patient monitoring practices to ensure that no patient receives subpar care and will promote a few mobile health apps that will assist regular people in exercising caution. A prototype model for assessing mental risk is created and put into use, with a success rate of 96.1% in identifying the patients’ level of mental risk. Keywords Mood analysis · Emotion detection · Depression · Healthcare · Internet of Things · Mental health · Sensors
P. Pranjal · S. Mallick · M. Madan · S. Mishra (B) · S. Bhaktisudha Kalinga Institute of Industrial Technology University, Bhubaneswar, India e-mail: [email protected] P. Pranjal e-mail: [email protected] S. Mallick e-mail: [email protected] M. Madan e-mail: [email protected] A. Alkhayyat College of Technical Engineering, The Islamic University, Najaf, Iraq © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_36
423
424
P. Pranjal et al.
1 Introduction India is a massive country with 141.71 billion citizens (2022). There were roughly 187,000 suicide deaths in India in 2010, accounting for around 3% of all deaths studied [1]. The World Health Organization classifies depression as a mental illness. By the year 2030, depression is expected to be one of the top three pandemic illnesses. It is a psychological disorder with symptoms such as loss of appetite or overeating, insomnia or oversleeping, difficulty concentrating, recurring thoughts of death, sadness or lack of interest, psychomotor slowing, lack of energy and fatigue, feelings of worthlessness and guilt, and difficulty making decisions [2]. Despite the fact that many people with depression and anxiety disorders continue to experience pain because they are not given the appropriate care, it is believed that depression is curable with medication and assistance. Additionally, some individuals are not even conscious of their depression. Depending on the stage of depression a person is experiencing, some are able to get better, some struggle, and many end up taking their own lives. Due to Internet technology’s long-standing entry into the sector, the Internet of Things (IoT) and embedded technologies have had a substantial impact on the healthcare sector [3]. Thanks to Internet of Things (IoT)-enabled connectivity, this dysfunctional healthcare system has the potential to be converted into an integrated, efficient, and patient-centered system. This will help to shift the current focus from curative care to wellness and well-being and, as a result, cut the cost of health care by using all-inclusive strategies. As a result, IoT in Indian health care will aid in meeting the needs of today’s patients, who are becoming more aware. Both will benefit from it equally: • By gaining real-time access to patient data and streamlining workflows with the help of sensor-based smart chips, real-time positioning systems, etc., doctors and hospitals may save costs. • Health insurance providers by cutting back on claim payments. • Pharmaceutical firms by enabling patients to start taking medications earlier (due to early sickness detection and diagnosis of medical disorders) and tracking and assuring compliance with treatments. • Governments maintain population health by monitoring it.
2 Related Works In the framework of evaluating depression and stress as potential mental health hazards in humans, many models were offered by various writers. Below is a list of some important extant works. Using IoT-based biomarkers, Kumar et al. [4] demonstrated a deep neural network with hierarchical learning capabilities. It picks up new information using both wrist- and chest-based sensor-based biosignal characteristics. It can be represented as a fusion of high-level representations for stress state classification at the model level. Unfortunately, this technique only uses a small number of
A Smart Data-Driven Prototype for Depression and Stress Tracking …
425
biomarkers and automatic responses. Amna Amanat et al. propose a feature extraction technique from text collections to indicate underlying depressive symptoms and attitudes in tweet data [5]. Sadly, using derogatory or depressing language in conversations or blog postings is not necessarily a reliable sign of melancholy. The authors in [6] kept tabs on the patients using their written data, photographs, videos, and audio recordings. Depression was lessened by using robots as social beings, or “humanoids.” However, we could see written content that was not from a trustworthy source, causes, and signs that were not taken into account and depression treatment options that did not work. Anwar Ahmed Khan et al. have created a system that tracks and reports temperature, heart rate, and SpO2 measurements [7]. The system’s implementation made use of communication channels that were helped by space– time spreading (STS). However, it did not account for real-world data. Christopher Burton et al. carried out a thorough analysis of digital artifact use among depression patients. [8]. Actigraphy, the information source, should be emphasized because it has a tendency to overestimate sleep in some patient populations. Researchers from [9] have published a study that lays the groundwork for a smartphone mood monitoring app to merge actively and passively collected data to understand the relationship between prenatal depressive symptoms. However, upon analysis, we discovered that it does not account for depression instances in the broader community and is solely reliant upon ground reality. A technique for diagnosing depression with the least amount of human assistance was developed by Ezekiel Victor and his colleagues [10]. A multimodal deep learning model took into account the responses of the participants—word data, video, and audio—as well as demographic data and additional metadata. Sadly, it is a one-time web-based evaluation with no response mechanism and subjective outcomes because the responses depend on the taker’s current state. Using convolutional neural networks (CNN), a novel technique for automatically identifying depression in a subject’s voice was introduced in [11]. In tests, spectrograms were supplied to residual CNNs, so they could automatically create visuals from audio recordings. The experimental results, which used different ResNet designs, showed a baseline accuracy of 77%. We infer from the technology gap that vocal factors such as intonation and frequency are unreliable for detecting depression, require high-end recording equipment for clarity, and cannot integrate real-time status.
3 Methodology We review the literature on IoT-based mental health and wellness systems, focusing on key components such as data acquisition, self-organization, service level agreements and create two approaches: search strategies and the Basic Search Qualifications Standard. Search strategy turns out to be one of the most important factors in identifying the issues through search for two pertinent domains, “patients and depression” and “terms,” i.e., primarily focusing on the keywords “mental health,” “mental disorders,” “internet of things,” and “IOT-based systems.” Examining current systems
426
P. Pranjal et al.
that have been labeled IoT technologies is the basic condition for study eligibility [12]. IOT is referred to as a “New Internet” that facilitates communication between individuals, between individuals and objects, and between objects and other objects. IOT has been explored in the past for a variety of uses, including morale boosting, diagnostics, and mental health care. The IoT has become important in psychology by using a multitude of physiological and environmental devices and sensors to determine factors such as sensations, personality traits, and mental illness. The word “emotion” is used to describe everything that is meant by “feelings,” “states of feeling,” “pleasures,” “pain,” “passion,” “sentiment,” and “affection.” To detect emotions, the Web of Things (WoT) collects information about users’ physical sensations (such as various actions), textual sensations (such as Facebook posts, tweets, and messages), and social network communications (such as voices, photos, and videos). After analysis of these data components, user emotions are identified and all relevant services are activated. Millions of people require medical therapy each year for depression, and the majority of them will make a full recovery with the right diagnosis and care. Interviewing the affected person is the primary way of diagnosing depression, and other questionnaires can be used to gauge the severity of the disorder. The Diagnostic and Statistical Manual of Mental Disorders IV Text Revision’s questionnaires, such as the Hamilton Rating Scale for Depression (HRSD), Beck Depression Inventory, Patient Health Questionnaire Depression Scale (PHQ-8), Zung Self-Rating Depression Scale, and the Center for Epidemiologic Studies Depression Scale (CES-D), is useful screening tools for measuring depression and anxiety. The Patient Health Questionnaire for Depression and Anxiety (PHQ-9) is widely used as shown in Table 1. PHQ4 is a screening tool with four questions and two illustrations each question. The psychiatrist uses it to pinpoint the patients’ symptoms. Depending on the score, it is classified as normal, mild, moderate, or severe. The resultant evaluation will use the PHQ-9 for depression and the GAD-7 for anxiety. Nine practical questions make up this evaluation, which is intended to gauge the severity of depression. This allows for the division of depression into four levels: mild, moderate, quite serious, and severe. The PHQ-9 provides a precise evaluation of the degree of depression. The results had an 88% perceptivity and particularity. Each wearable device has measured different depression-related factors that can be divided into three categories: activity/sleep, physiological, and subjective. Wearable technology can assess activity and sleep habits, and previous research has found that people who are more active are less likely to suffer from depression. Depressive symptoms are more prevalent in people who do not exercise frequently, and depression is a major risk factor for leading a sedentary lifestyle [13]. Antidepressant therapy for depressed inpatients led to a rise in their actigraphy-measured activity levels. Additionally, Winkler et al. noted that even with electroconvulsive therapy, activity was increased (ECT). Last but not least, Peis et al. demonstrated that increasing motor activity and early patterns of actigraphic data enabled reliable prediction of hospital discharge dates using a hierarchical generalized linear regression model. Insomnia is a risk factor for depression, and sleep-related signs can be used to anticipate depression. Additionally, sad people often report more sleepless nights than
A Smart Data-Driven Prototype for Depression and Stress Tracking …
427
Table 1 Patient health questionnaire (PHQ-9) How frequently did any of the following bother you over the previous two weeks?
None
Few days
Half of the time
Almost every day
1. Lack of enjoyment or interest in activities
0
1
2
3
2. Feeling negative, downhearted, or hopeless
0
1
2
3
3. Having trouble getting or remain asleep, or resting excessively
0
1
2
3
4. Tired or drowsy
0
1
2
3
5. Loss of appetite or overeating
0
1
2
3
6. Feeling bad about yourself—either you are a failure or you have let yourself down or you have let your family down
0
1
2
3
7. Difficulty concentrating on things like reading the newspaper 0 or watching TV
1
2
3
8. Moving or talking so slowly that others notice? Or on the contrary, do you feel so restless or agitated that you move a lot more than usual?
1
2
3
0
healthy controls. Numerous researches have shown a connection between depressive symptoms and factors related to the circadian rhythm. Wearable technology uses an optical thermometer, an EDA sensor, and a PPG sensor to detect physiological traits such skin conductance (SC), skin temperature (ST), and heart rate variability (HRV). Diabetes patients have a lower quality of life than those who do not have the disease. They also have a higher chance of developing depressive symptoms, which could further lower their quality of life. Each study found a link between depressive symptoms and at least one component of diabetes patients’ quality of life that was detrimental. As a result, in many diabetic treatment settings, higher awareness and monitoring for depression are required. It is reflected in Fig. 1. Basic wearables and hand-held gadgets were used to keep the patients linked. Temperature, heart rate, and SPO2 sensors are all a part of wearable electronics. In order to constantly gather patient data and send it to distant emergency teams and healthcare specialists, the data obtained from these wearable devices will be concurrently sent to the server through the cloud. The threshold values of the physiological markers should be based on the unique health assessment of each patient. An alert is quickly generated and sent to a healthcare expert, consultant, or emergency services when a parameter is discovered to be above a preset value. Mobile applications that provide real-time access to the most current health statistics will be available to patients, medical professionals, and emergency service providers [14]. Below is a quick explanation of the main elements of the suggested architecture as depicted in Fig. 2. • Tracking Technology: The gadget that the patient would be wearing consists of a set of temp, heart/pulse rate, and SPO2 detectors for measuring body statistics,
428
P. Pranjal et al.
Fig. 1 Sample demonstration of depression in diabetic patients
Fig. 2 Proposed framework for depression and stress assessment
a WiFi component for obtaining the sensor’s information, a microcontroller unit configured to automatically activate “Effective Notifications” when the factors surpass a specified cutoff point, as well as an SD memory card for collecting and streaming music and movies. • In-hand Mobile Device: The mobile has a WiFi module for transferring information to a cloud service for further analyzing and delivering the data collected to a doctor, a warning generating module for creating an alert when a measured factor exceeds the limit specified, a digital module for listening to music, showcase interface for warnings when a given task should be performed [15]. • Cloud Platform: Depending on a person’s status and health, the necessary solution will be enabled by the cloud-based computing system, such as earlier detection or emergency assistance [16].
A Smart Data-Driven Prototype for Depression and Stress Tracking …
429
4 Research Obstacles in Mental Health Systems with IoT Support There are still many obstacles to overcome in the study and the creation of IoT-based mental health care applications and networks. The functioning of IoT devices and the capacity to capture and analyze enormous amounts of data in order to provide intelligent health services are some of the problems, despite the fact that they also affect other businesses. The potential for real-time prediction of significant occurrences in people with mental illnesses such severe depression presents another difficulty. We also discovered problems with privacy and security, as well as identity management. Some of these difficulties—some of which are related to the scarcity of employment in particular disciplines—are highlighted in the following sections. • Data Acquisition: One issue that requires further focus is the acceptance of various IoT devices for mental health. This is the term used to describe the application of numerous technologies to study how people interact with their environment and predict their mental states in the present or the future. The collection of data about the human environment, however, may necessitate the use of a number of devices, including wearable, mobile, and those installed in various locations around the regions where patients reside or move around. Numerous firms may soon release far more sophisticated wearable, mobile, and pervasive gadgets, many of which lack interoperable data standards for transmission and regulation. Sensors built within the instrument are utilized to gather these data, which result in data that can be quantified [17]. However, the information from these devices is obtained utilizing sensors that are integrated into the tool and create quantifiable data. Even when data may be collected in a repository, these devices are often not immediately compatible, and the frequently significantly diverse data formats make analysis difficult. • Self-Organization: IoT devices can employ a variety of platforms and providers. When replacing devices because of an update or a malfunction, this presents an additional challenge. This could get worse if the new device has different readings, precision, or frequency for the same data (e.g., activities, sleep). Modern IoT systems and devices consequently typically lack the capacity to self-organize. In instances involving mental health concerns, self-organization is desired and may even be required, as seen in natural phenomena like flocks and herds. Coordination between multiple IoT systems and devices is thought to be essential. If the first gadget fails, another one will be able to provide comparable data. • Identity Management: IoT-based systems may acquire sensitive or vital information, transfer patient information over the WWW, continuously monitor, and allow the collection of targeted records in order to support persons with mental health issues. Security is still a significant problem and a major roadblock in IoT-enabled research, especially in the context of therapies for mental health conditions. Despite the many benefits of IoT, there are new risks and flaws which might be harmful to individuals’ well-being, such as loss of healthcare data and medicines, use of equipment without authorization, and personal data breach. A
430
P. Pranjal et al.
strong and modern management system is needed to meet these security concerns. Patients, carers, healthcare professionals, health insurance companies, pharmacies or drug stores, and other connected institutions must all be personally recognized, vetted, and authorized to establish secure connections in order to protect sensitive and critical patient data on IoT devices. So, they need to be secured from dangers by utilizing effective identity management.
5 Results and Analysis One in 20 adults in India over the age of 18 has dealt with depression at some point in their lives. Researchers from a variety of fields have been seeking to pinpoint sadness across a range of age groups by combining several technologies. For diagnosing depression or treating those who are depressed, researchers have taken into account speech traits, facial expressions, and writing patterns on social networking websites. Technologies like artificial intelligence, smart health care, EEG signal processing, and others are all aiming to detect depression. This essay offers a comprehensive analysis of the numerous techniques that have been employed in the past to pinpoint depression. Due to the lack of an easy approach to detect depression, there is still some ambiguity in detecting depression using a single accurate method and computation. Table 2 shows the analysis of IoT-enabled health models. Figure 3 depicts depression among patients on the basis of gender. As it is seen, it successfully manages to determine the degree of risks by categorizing it among no risk, mild, moderate, moderately severe, and severe categories. The prototype model will be improved to become the finished product employing a hand-held device and wearable device for sensing. Figure 4 highlights the impact of dataset samples on the accuracy rate of model implementation. It is observed that with the rise in data samples, the accuracy still remains consistent throughout. With 100 data samples, the accuracy noted is 96.85%, while with 5000 samples, it is still 95.04%. The mean accuracy recorded is 96.1%.
6 Conclusion Mental illness is spreading around the world. For instance, the present COVID-19 pandemic has made social isolation and stressful situations more prevalent, which has a negative impact on the majority of the population who carry out their daily work from home. Given the significant prevalence of depressive illness in healthcare settings, there is much to be learned from a review of the efficacy of treatment for depression. The mental health costs also affect medical personnel, who are frequently subjected to extreme stress, worry, depression, and insomnia as a result
Schizophrenia
Schizophrenia related
✔ ✔ ✔
✔ ✔ ✔
✔
✔
✔
✔
✔
✔
Depression
Bipolar
Depressive
Bipolar related
✔ ✔
✔
✔
✔ ✔
Psychological stress
Stress related
Smartphone
Technologies Wearable
Social data
Behavioral data
Measures Physiological data
Mental stress
Reported mental disorder(s)
DSM 5 disorder
Table 2 Review of IoT-enabled mental healthcare systems
✔
Embedded
A Smart Data-Driven Prototype for Depression and Stress Tracking … 431
432
P. Pranjal et al.
Fig. 3 Measured depression severity in patients by gender
Fig. 4 Accuracy analysis in context to volume of data used
of physical exhaustion, work burnout, and daily chores that cannot be completed at home. They are now being developed and suggested new methods for identifying and maybe treating mental illnesses. For instance, the deployment of social robots can help those who are feeling mentally exhausted by the pandemic’s social isolation. Furthermore, the usage of telerobots may make it possible to take a patient’s temperature without having to touch them. This investigation suggests that certain technologies and sensors may be coupled to identify depression. It is possible to use sensors and other IoT-enabled devices to build a model that might enable people to recognize depression and seek treatment from physicians and psychiatrists. Chatbots can potentially be used to identify depression and treat it. The combination of several
A Smart Data-Driven Prototype for Depression and Stress Tracking …
433
technologies can result in a method for accurately identifying depression. However, the IoT’s potential for applications in the field of mental health has scarcely been explored.
References 1. Kirankumar C, Prabhakaran M (2017) Design and implementation of low cost web based human health monitoring system using Raspberry Pi 2. In: Proceedings of the 2017 IEEE international conference on electrical, instrumentation and communication engineering (ICEICE), Karur, India, pp 1–5 2. Tripathy HK, Mishra S, Suman S, Nayyar A, Sahoo KS (2022) Smart COVID-shield: an IoT driven reliable and automated prototype model for COVID-19 symptoms tracking. Computing 1–22 3. Mishra S, Jena L, Tripathy HK, Gaber T (2022) Prioritized and predictive intelligence of things enabled waste management models in a smart and sustainable environment. PLoS ONE 17(8):e0272383 4. Kumar A, Sharma K, Sharma A (2021) Hierarchical deep neural network for mental stress state detection using IoT based biomarkers. Sci Direct Pattern Recogn Lett 145:81–87 5. Amanat A, Rizwan M, Javed AR, Abdelhaq M, Alsaqour R, Pandya S (2022) Deep learning for depression detection from textual data. MDPI Electron 11 6. Rajawat AS, Rawat R, Barhanpurkar K (2021) Depression detection for elderly people using AI robotic systems leveraging the Nelder–Mead method. In: ScienceDirect artificial intelligence for future generation robotics, pp 55–70 7. Khan AA, Nait-Abdesselam F, Dey I, Siddiqui S (2021) Anxiety and depression management for elderly using internet of things and symphonic melodies. In: IEEE xplore ICC 2021 - IEEE international conference on communications 8. Burton C, McKinstry B, Tatar AS (2013) Activity monitoring in patients with depression: a systematic review. J Affect Disorders 145(1):21–28 9. Appleby D, Faherty LJ, Hantsoo L (2017) Movement patterns in women at risk for perinatal depression: use of a mood-monitoring mobile application in pregnancy. J Am Med Inf Assoc 24(4):746–753 10. Victor E, Aghajan ZM, Sewart AR, Christian R (2019) Detecting depression using a framework combining deep multimodal neural networks with a purpose-built automated evaluation. AIME Res Psychol Assess 31(8) 11. Chlasta K, Wołk K, Krejtz I (2019) Automated speech based screening of depression using deep convolutional neural networks. Procedia Comput Sci 164(2019):618–628 12. Suman S, Mishra S, Sahoo KS, Nayyar A (2022) Vision navigator: a smart and intelligent obstacle recognition model for visually impaired users. Mobile Inf Sys 13. Sivani T, Mishra S (2022) Wearable devices: evolution and usage in remote patient monitoring system. In: Connected e-Health. Springer, Cham, pp 311–332 14. Mohapatra SK, Mishra S, Tripathy HK, Alkhayyat A (2022) A sustainable data-driven energy consumption assessment model for building infrastructures in resource constrained environments. Sustainab Energy Technol Assess 53:102697 15. Sahoo PK, Mishra S, Panigrahi R, Bhoi AK, Barsocchi P (2022) An improvised deeplearning-based mask R-CNN model for Laryngeal cancer detection using CT images. Sensors 22(22):8834
434
P. Pranjal et al.
16. Mishra S, Thakkar HK, Singh P, Sharma G (2022) A decisive metaheuristic attribute selector enabled combined unsupervised-supervised model for chronic disease risk assessment. Comput Intell Neurosci 17. Mohanty A, Mishra S (2022) A comprehensive study of explainable artificial intelligence in healthcare. In: Augmented intelligence in healthcare: a pragmatic and integrated analysis. Springer, Singapore, pp 475–502
Applied Computational Intelligence for Breast Cancer Detection Bhavya Dua, Kaushiki Kriti, Sushruta Mishra, Chitra Shashidhar, Marcello Carvalho dos Reis, and Victor Hugo C. de Albuquerque
Abstract One of the most widespread illnesses worldwide among women is breast cancer. A quick, proper diagnosis is essential in rehabilitation and recovery. However, because of different difficulties in mammography detection, it is not an easy process. Techniques of applied machine learning can produce early detection tools, significantly improving survival rates. This study analyses three of the most often used machine learning (ML) approaches for identifying breast cancer: support vector machines (SVM), random forests (RF), and Bayesian networks (BN). This study’s findings provide a summary of current ML techniques for breast cancer detection. Random forest generated the best performance with 96.4% accuracy when normalisation and feature weighting are considered. Keywords Breast cancer · Malignancy · Machine learning · Computational intelligence · Classification
B. Dua · K. Kriti · S. Mishra (B) Kalinga Institute of Industrial Technology University, Bhubaneswar, India e-mail: [email protected] B. Dua e-mail: [email protected] K. Kriti e-mail: [email protected] C. Shashidhar Department of Commerce and Management, Seshadripuram College, Bengaluru 560020, India Federal Institute of Ceará, Fortaleza, Brazil M. C. dos Reis Federal Institute of Education, Science and Technology of Ceará, Fortaleza, Ceará, Brazil e-mail: [email protected] V. H. C. de Albuquerque Department of Teleinformatics Engineering, Federal University of Ceará, Fortaleza, Brazil e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_37
435
436
B. Dua et al.
1 Introduction Cancer is a group of diseases with similar symptoms that all result in unchecked cellular growth and reproduction. With about 8 million deaths per year, it is one of the largest fatal diseases in the developing world and the first in the developed world. Breast cancer primarily impacts women. With today’s modern medicines and technological equipment, fighting breast cancer has grown simpler. Screening is an essential approach for diagnosing breast cancer and is pursued after an early diagnosis. It boosts the chance of a satisfactory treatment result [1]. In order to effectively manage patients’ future clinical care, early cancer detection and prognosis are now essential in cancer research. For many doctors, accurately predicting malignant growth remains a difficult endeavour. The rise of recent techniques and the massive volume of data pertaining to patients have paved the way for creating novel cancer prediction and diagnosis procedures. Although data analysis from patient and physician input helps considerably to the diagnostic process, new technologies should be implemented to aid in the promotion of right diagnoses. These technologies aim to avoid potential errors in diagnosis, while also provide a fast means to examine large volumes of data [2]. Predictive analytics is a subset of machine Intelligence that allows computers to learn without human input, allowing them to operate in a particular format by dealing with various data sets that allow them to learn particular tasks via trial and error. Techniques of this domain have been incorporated in prediction models over several years to improve successful decision-making on a large scale. These methods might be used in research related to cancer to identify trends in data and thus classify tumours as malignant or benign. The accuracy of classification, recall, exactness, and area under the operating characteristic curve may also be used to assess the performance of such algorithms. Three popular machine learning techniques are investigated and compared in this paper using a breast cancer data set. Random forests, support vector machines, and Bayesian networks are examples of these approaches [3]. Despite the complicated interconnections of high-dimensional medical data, statistical approaches have traditionally been utilised for categorization of highand low-risk cancer. Machine learning has recently been utilised in cancer prognosis and prediction to solve the limitations of classic statistical methodologies. Machine learning is an artificial intelligence discipline that integrates a variety of statistical, optimization, and probability-based methodologies to allow computers to observe prior occurrences and uncover subtle patterns in highly convoluted data sets [4]. ML offers several benefits over pathologists. To begin with, machines can operate far quicker than people. A pathologist typically needs 10 days to complete a biopsy, whereas a computer can accomplish the task in a fraction of a second. Further, machines have been developed to the extent that they can learn after each iteration. While learning is not limited to machines since humans also learn from practice, yet machines are not only faster than humans but they also do not suffer from fatigue or exhaustion.
Applied Computational Intelligence for Breast Cancer Detection
437
Fig. 1 Computational learning model for detection of breast cancer
Additionally, machines have a high degree of accuracy. With Internet of Things (IoT), the amount of data being generated has gone up manifold, making it virtually impossible for humans to make the best possible use of it. That is when machines come in handy. They are faster than us, can do more exact computations, and can recognise patterns in data. That is why they are called computers. This ability of machines is best suited for applications in the medical domain that depend upon analysis of complex genomes and proteomes. Given the increasing importance of predictive medicine and the increasing dependency on machines to make predictions, we felt a thorough study of published works using ML techniques in cancer detection and diagnosis would be useful. The objective is to identify key trends in the types of algorithms used, the training data employed, the malignancies studied, and the overall efficacy in predicting patient outcomes. Figure 1 denotes a sample use-case of computational intelligence in analysing breast cancer.
2 Related Works Extensive research has been undertaken on many medical data sets utilising machine learning methods, particularly in BC prediction. In 2015, Wang and Yoon [5] employed SVM, an ANN, AdaBoost, and a Bayes classifier to detect breast cancer, with principal component analysis applied for reduction of the attribute space. In 2020, Boeri et al. [6] utilised an ANN and SVM to predict breast cancer recurrence and patient mortality in a period of less than 3 years after surgery. SVM, with an accuracy of 96.86%, outperformed. Khourdifi [7] used four ML algorithms on the Wisconsin breast cancer data set (UCL repository): SVM, RF, Naive Bayes classifiers, and K-nearest neighbour. For the algorithm simulation, the authors utilised the Weka programme. SVM again outperformed not only in productivity but also in efficacy. Chaurasia et al. [8] improved the BC prediction model by using RBF networks, J48, and Naive Bayes classifiers for classifying tumours in the Wisconsin
438
B. Dua et al.
breast cancer database—the findings presented Naive Bayes as the best performer for prediction. Kumar [9] evaluated the success of techniques that mined data and used decision trees, Naive Bayes, and logistic regression in identifying breast cancer cells. Rajbharath and Sankari [10] created a breast cancer survivorship prediction model using a mix of RF and logistic regression algorithms. The model utilised RF to do an initial filtering. To predict the chances of survival of a patient, data were gathered from the existing WDBC data set and then used for predictive modelling based on logistic regression. In 2016, in the original Wisconsin breast cancer data set, Asri et al. [11] evaluated multiple methods—decision trees, Naive Bayes classifiers, and K-nearest neighbour—to find out which method was most effective. The study concluded that the experimental SVM provided the highest precision level. Ricciardi et al. [12] created a model that employed elements of both linear discriminant analysis (LDA) and principal component analysis (PCA) for the stratification of coronary artery disease. The PCA was used to develop new attributes, and the LDA was used for the stratification. Together, this improved disease detection. Kumar et al. [13] used Naive Bayes, RF, AdaBoost, random trees, Lazy K-star, regression (logistic), multiple class classifiers, and multi-layered perceptrons to predict the presence of breast cancer. The initial data came from the Wisconsin breast cancer database, and the random tree and Lazy K had the highest effectiveness. Furthermore, Gupta and Gupta [14] compared four commonly used machine learning algorithms, namely multi-layered perceptrons (MLP), SVM, decision trees, and KNN, using the Wisconsin breast cancer data set to anticipate the chances of a relapse of breast cancer. Their major goal was to determine which classifier out of those examined by them was the best performer on the parameters of correctness, recall, and exactness. They also determined that perceptrons outperformed all other approaches, inclusive of the tenfold cross-validation, in their study. Zheng et al. [15] used tenfold cross-validation to examine K-SVM algorithms, and the proposed approach improved the precision of prediction of breast cancer to 97.38% when measured using the Wisconsin Diagnostic Breast Cancer Scale. These scientists introduced a novel mix of learning algorithms, most notably K-means to distinguish between malignant and benign tumours, followed by SVM to develop a classifier using tenfold cross-validation. This innovative technique demonstrated an exactness of 97.38%, which was greater than the other models studied.
3 Proposed Model In ML techniques, there are two forms of learning: supervised and unsupervised learning. One data set is used for training the system as well as to label it for providing a proper output in supervised learning [16]. Unsupervised learning, on the other hand, is more difficult to do since no predetermined data sets are used and no understanding of the desired output exists. Classification is one of the most frequent supervised learning approaches. It develops a model using previous labelled data that is then used to forecast the future. Clinics and institutions in the medical industry have
Applied Computational Intelligence for Breast Cancer Detection
439
Fig. 2 Flowchart of proposed breast cancer assessment model
vast databases containing patients’ records for symptoms and diagnosis. As a result, researchers utilise this information to create classification models that can draw inferences on the basis of past data. With machine-based help and the vast quantity of medical data accessible today, medical inference has become a considerably easier undertaking [17]. In this work as shown in Fig. 2, a hybrid machine learning-based methodology for classifying breast cancer has been suggested, having three stages. A. Normalisation stage: Deviation about the absolute median (MAD) is a powerful normalisation technique used for quantifying dispersion of a set of data. MAD = median (|Yi—median (Yi|)) is the formula for the MAD normalisation procedure. B. Feature weighting stage: Approaches to prioritising features were utilised to change the data set from linearly linked to linearly separable. K-means clustering was used to weight the breast cancer data set. The following is how feature weighting based on K-means clustering works: (I) (II) (III) (IV)
Using KMC, locate cluster centres of each feature. Each data set’s absolute disparities from its cluster centre are computed. Take the average of these deviations for each feature. By multiplying these averaged values by the number of data points in each characteristic, the weighting factors may be calculated [2].
C. Classification stage: The constructed model is also known as the decision tree model with model coefficients. All of the approaches employed in this research are classified as follows: C.1 Support vector machines (SVM)—SVM is used to generate a mapping from input vectors to a space having a higher order, to discover the optimum hyperplane that divides the data into classes. C.2 Random forest (RF)—RF assembles multiple decision trees, resembling a forest. When RF is used instead of single decision trees, the stability increases [18].
440
B. Dua et al.
C.3 Bayesian networks (BN)—The branch of probabilistic graphical models known as BN is utilised for domain prediction and knowledge representation. The directed acyclic graph is represented by BN (DAG).
4 Result Analysis According to the proposed model, SVM is used to map an input vector to a space with higher dimension in order to determine the optimal hyperplane for dividing the data into classes [19]. Random forest combines multiple decision trees to form a forest. When RF is used instead of single decision trees, the stability increases. The branch of probabilistic graphical models known as BN is utilised for domain prediction and knowledge representation [20]. During implementation, it is observed that normalisation played an integral role in the prediction process. All intelligent models generated an optimum accuracy with normalisation where random forest models noted the highest accuracy of 96.4%, while SVM noted 89.7% as shown in Fig. 3. Figure 4 highlights the impact of feature weighing on the accuracy of prediction of breast cancer. Applying feature weighting, it recorded better accuracy than going alone. Again random forest gave the best performance of 94.3% accuracy rate with feature weighting. The response delay analysis was also undertaken on breast cancer data. The least response time noted was with random forest with 1.08 s only against 3.65 s with SVM classifier as seen in Fig. 5. 100
Accuracy rate (%)
Fig. 3 Accuracy rate analysis in context to normalisation approach
96.4
95 89.7
90 85
91.2
95.8 90.4
82.5
80 75 SVM
RF
BN
Learning algorithms Without normalization
With normalization
Fig. 4 Accuracy rate analysis in context to feature weighting approach
Learning algorithms
Applied Computational Intelligence for Breast Cancer Detection
BN
441 92.9
86.2
RF
94.3
87.7
SVM
90.3
83.6 75
80
85
90
95
100
Accuracy rate (%) With feature weighing
4
Delay sec)
Fig. 5 Response delay analysis in context to machine learning classifiers
Without feature weighing
3.65
3
2.98
2 1.08
1 0 SVM
RF
BN
Learning algorithms Response Delay (sec)
5 Future Works This review covers the latest research on cancer prognosis using predictive techniques. In the recent decade, there has been a large amount of ML research published that yields accurate results for certain predictive cancer outcomes. Identifying potential limits, such as the study design, collecting sufficient data samples, and confirming the categorization results, is critical for clinical decision extraction. Before gene expression profiles may be employed in the clinic, more trustworthy validation data are necessary. In the preceding two years, there has been a rising trend in articles that employed semi-supervised ML techniques to estimate cancer survival. Misclassifications can occur when the training sample size is too small in contrast to the data dimensionality, and estimators can produce unstable and biased models. A bigger patient population used for survival prediction clearly improves the predictive model’s generalizability.
442
B. Dua et al.
6 Conclusion Breast cancer afflicts more women around the world than any other disease. ML techniques are often used in the medical domain and have shown to be an efficient diagnostic tool. The fundamental characteristics and methods of each of the three ML techniques were provided, and the Wisconsin breast cancer samples were utilised for comparing the performance of the approaches investigated. The implementation outcome shows that the strategy utilised for classification has a great role to play in the final result. According to the statistics, SVMs have the highest exactness, relevance, and precision. RFs, conversely, have the highest success in identifying tumours.
References 1. Bazazeh D, Shubair R (2016) Comparative study of machine learning algorithms for breast cancer detection and diagnosis. In: 2016 5th international conference on electronic devices, systems and applications (ICEDSA) 2. Mishra S, Jena L, Tripathy HK, Gaber T (2022) Prioritized and predictive intelligence of things enabled waste management model in smart and sustainable environment. PLoS ONE 17(8):e0272383 3. Tripathy HK, Mishra S, Suman S, Nayyar A, Sahoo KS (2022). Smart COVID-shield: an IoT driven reliable and automated prototype model for COVID-19 symptoms tracking. Computing 104:1–22 4. Suman S, Mishra S, Sahoo KS, Nayyar A (2022) Vision navigator: a smart and intelligent obstacle recognition model for visually impaired users. Mobile Inform Syst 2022:1–15 5. Wang H, Yoon SW (2015) Breast cancer prediction using data mining method. In: Proceedings of the IIE annual conference proceedings. Institute of Industrial and System Engineers (IISE), New Orleans, LA, USA, 30 May–2 June 2015, p 818 6. Boeri C, Chiappa C, Galli F, de Berardinis V, Bardelli L, Carcano G, Rovera F (2020) Machine learning techniques in breast cancer prognosis prediction: a primary evaluation. Cancer Med 9:3234–3243 7. Khourdifi Y (2018) Applying best machine learning algorithms for breast cancer prediction and classification. In: Proceedings of the 2018 international conference on electronics, control, optimization and computer science (ICECOCS), Kenitra, Morocco, 5–6 December 2018, pp 1–5 8. Chaurasia V, Pal S, Tiwari BB (2018) Prediction of benign and malignant breast cancer using data mining techniques. J Algorithms Comput Technol 12:119–126 9. Kumar Mandal S (2017) Performance analysis of data mining algorithms for breast cancer cell detection using Naïve Bayes, logistic regression and decision tree. Int J Eng Comput Sci 6:2319–7242 10. Rajbharath R, Sankari I, Scholar P (2017) Predicting breast cancer using random forest and logistic regression. Int J Eng Sci Comput 7:10708–10813 11. Asri H, Mousannnif H., al Moatassime H, Noel T (2016) Using machine learning algorithms for breast cancer risk prediction and diagnosis. Proc Comput Sci 83:1064–1069 12. Ricciardi C, Valente SA, Edmund K, Cantoni V, Green R, Fiorillo A, Picone I, Santini S, Cesarelli M (2020) Linear discriminant analysis and principal component analysis to predict coronary artery disease. Health Inform J 26:2181–2192
Applied Computational Intelligence for Breast Cancer Detection
443
13. Kumar V, Misha BK, Mazzara M, Thanh DN, Verma A (2019) Prediction of malignant and benign breast cancer: a data mining approach in healthcare applications. In: Advances in data science and management. Springer, Berlin/Heidelberg, Germany, pp 435–442 14. Gupta S, Gupta MK (2018) A comparative study of breast cancer diagnosis using supervised machine learning techniques. In: Proceedings of the 2nd international conference on computing methodologies and communication (ICCMC 2018), Erode, India, 15–16 February 2018. IEEE, Piscataway, NJ, USA, pp 997–1002 15. Zheng B, Yoon SW, Lam SS (2014) Breast cancer diagnosis based on feature extraction using a hybrid of k-mean and support vector machine algorithms. Exp Syst Appl 41:1476–1482 16. Sivani T, Mishra S (2022) Wearable devices: evolution and usage in remote patient monitoring system. In: Connected e-health. Springer, Cham, pp 311–332 17. Mohapatra SK, Mishra S, Tripathy HK, Alkhayyat A (2022) A sustainable data-driven energy consumption assessment model for building infrastructures in resource constraint environment. Sustain Energy Technol Assess 53:102697 18. Sahoo PK, Mishra S, Panigrahi R, Bhoi AK, Barsocchi P (2022) An improvised deeplearning-based mask R-CNN model for laryngeal cancer detection using CT images. Sensors 22(22):8834 19. Mishra S, Thakkar HK, Singh P, Sharma G (2022) A decisive metaheuristic attribute selector enabled combined unsupervised-supervised model for chronic disease risk assessment. Comput Intell Neurosci 20. Mohanty A, Mishra S (2022) A comprehensive study of explainable artificial intelligence in healthcare. In: Augmented intelligence in healthcare: a pragmatic and integrated analysis. Springer, Singapore, pp 475–502
Crop Yield Forecasting with Precise Machine Learning Swayam Verma, Shashwat Sinha, Pratima Chaudhury, Sushruta Mishra, and Ahmed Alkhayyat
Abstract In India, the impact of global weather variation has severely impacted the majority of agricultural products. Farmer decision-making will be aided by this research, which will enable them to evaluate the productivity of their crops preparatory to cultivation. It makes use of random forest, a machine learning algorithm. There aren’t any suitable technologies or solutions to the difficulties we confront despite research into concerns and problems including weather, temperature, humidity, rainfall, and humidity. Even the agricultural sector is experiencing various forms of rising economic growth in nations like India. The prediction of crop output is another benefit of the technique. Keywords Data analytics · Crop productivity · Random forest · Prediction · Precision farming
1 Introduction During earlier habitation times, farming status in India got its start. Second in this industry is India. Agriculture and allied industries contributed around 20% of the total revenue in the twentieth century which was slightly more than its previous year, and S. Verma · S. Sinha · P. Chaudhury · S. Mishra (B) Kalinga Institute Of Industrial Technology, Deemed to be University, Bhubaneswar, India e-mail: [email protected] S. Verma e-mail: [email protected] S. Sinha e-mail: [email protected] P. Chaudhury e-mail: [email protected] A. Alkhayyat College of Technical Engineering, The Islamic University, Najaf, Iraq © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_38
445
446
S. Verma et al.
around 18% with around 40% of the staff in the twentieth century. In context to the overall farming zone, India has the largest percentage of arable land (9.6%). Demographic data indicate that agriculture is a major component of India’s socioeconomic structure. Agriculture’s contribution to India’s GDP is drastically decreasing with the rise of industries. Technological integration is not as per expectation, and it is a concern for farming domain. This is one factor for which the farming zone is underutilized. Because of the excessive use of industrial technologies and non-renewable resources, it is challenging for agricultural experts to estimate climatic features that affect crop yield. Machine learning can aid farmers in this situation by predicting trends in temperature, rainfall, and crop yield using algorithms like RNN, LSTM, and others. Farmers with predictions will find it easier to live a little easier and will also see an increase in the yield and quality of their harvest because of the capability to yield crops at prior as per the suitability of estimation. The primary subjects of this study are the use of data analytics techniques in real time. The study discussed here also considers the unstructured samples corresponding to health metrics dataset in order to obtain a consistent trend. This approach considers all the factors, in contrast to the conventional practice of predicting saplings productivity by considering one feature simultaneously. The rest portion of the research is organized in the following manner. Second section discusses the significant existing works undertaken in the domain. Third section presents the dataset details used in the study. The fourth section discusses the novel workflow model along with its explanation. Fifth section presents the implementation outcome of the proposed model. Sixth section highlights the benefits of using the ensemble random forest method. Section 7 concludes the work.
2 Literature Survey The random forest method has the best yield forecast accuracy, according to demonstrations by authors in [1] using an Indian data samples set. By using ensemble models, Balamurugan [2] has used forecasting sapling productivity. The crop output was predicted using a variety of factors, including rainfall, temperature, and season. As per authors in [3], supervised analytical methods enable predictive methods for the prediction of an outcome. This work addresses methods for crop productivity estimation with supervised models. The yields of the various crops, along with climatic attributes like moisture content, pH value, temperature, and other factors, were forecast using a random forest method by Jig Han et al. [4]. Leo Brieman [5] is an expert in association and ensemble learner accuracy. The ensemble learning technique produces decision trees from several instances, estimates data patterns from each set, and then asks the system’s users to vote on which is the best option. Many analytics techniques that may be employed in many estimating zones were logically described by Mishra [6]. Authors in [7] constructed several regression frameworks to predict agricultural productivity using data mining techniques. Time series data, soil, and meteorological factors are used to examine various saplings productivity of
Crop Yield Forecasting with Precise Machine Learning
447
Table 1 Attribute list of dataset Attribute Description States
Different states of India like Tripura, Andhra Pradesh, Goa, Mizoram, Odisha, Maharashtra, Telangana, Bihar, West Bengal, Rajasthan, Puducherry, and Jharkhand
Crops
Different crops like rice, kharif pulses, coconut, potato, millets, fruits, castor seeds, groundnut, bajra, vegetables, fibers, cotton, pepper, oilseeds, rabi pulse, and cucumber
residential crops. Applying association rules on a farm-based dataset, Manjula’s et al. study [6] presented and applied a rule-oriented model for forecasting crop productivity from historical data. Crop yield predictions are made using the CNN-RNN model by researchers in [8]. It’s employed to combine delay correlations between environmental factors and gene-based enhancement of saplings without displaying genotype information. Research work in [9] employed the random forest model for the prediction of crop productivity. Authors in [10] used various advanced neural network techniques to determine the optimal crop yield prediction architectures. Vinson, S. Seth A. Joshua for the purpose of predicting crop yield, studied in [11] employed regression-based networks and other linear classifiers.
3 Data Source and Datasets Dataset acquisition in the Indian subcontinent is a little challenging because the necessary datasets have not been officially compiled, although there are scattered datasets that, when combined, can produce the desired results [12]. Data related to crops were used throughout the following Table 1. For effective interpretation of features available in the samples, we also used machine learning techniques to visualize its features. We produced a matrix corresponding to the information displaying the relationships among variables to aid in understanding the dataset. We can see different attributes in the above table, and an example of the following dataset can be shown in Fig. 1. There are possibly more than two lakh datasets accessible that, when combined, can be used to deliver the desired yield. We have used a variety of machine learning algorithms to visualize the dataset [13]. We have used pairplot graph Fig. 2 for the various features to help us comprehend the dataset’s complexity and find relationships and trends.
4 Proposed Methodology The use of data is crucial to machine learning. A technique called data preprocessing is utilized to turn the unstructured samples into filtered samples [14]. Dataset is acquired from various repositories; however, because they are collected in raw form, analysis is not possible. We can change data into a comprehensible format by using
448
Fig. 1 Crop dataset sample
Fig. 2 Graphical representation of features and their relationships
S. Verma et al.
Crop Yield Forecasting with Precise Machine Learning
449
various strategies, such as substituting missing values and null values. The division of training and testing data is the last step in the data preprocessing process. Due to the fact that system’s training typically needs many instances, the data typically tend to be distributed unevenly. Data used to train model is the initial set of data used to instruct learning models to know ways to make correct estimations. • Factors Affecting the Crop Yield Any crop’s yield and production are influenced by a number of variables. In essence, these variables aid in the prediction of crop yield over a specific time frame. We took into account variables like area, temperature, rainfall, humidity, and wind speed in this research. • Different Machine Learning algorithms Before determining the prospective algorithm that best matches this specific dataset, we must first evaluate and contrast the options available. Machine learning provides the most practical method for resolving the crop production issue [15]. A wide range of data analytics methods are used to predict farming outcome. Crucial predictive methods used to select and compare are as follows: • Linear regression: The likelihood of a desired feature is estimated with the use of regression method. There are only two eligible classes since the goal or dependent variable has a dual character. This regression model computes if an independent feature and other dependent features are linearly interlinked. The logistic regression approach produces an accuracy of 87.8% on our dataset. • Random Forest: Random forest has the capacity to examine how sapling productivity is influenced by the prevailing environmental elements and geographical variation [16]. It is a predictive method widely used to solve categorization-enabled domain. This model creates decision trees from multiple data instances, forecast data from every sub-sample and later evaluates the best solution by vote-enabled approach. Random forest trains the data using the bagging method, which improves the outcome accuracy. RF offers a 90.47% accuracy for our data. As a result, we will use the random forest algorithm to analyze our data because it performs more accurately than the alternative algorithm The random forest model is represented by a diagram in Fig. 3 and operates in the following ways: 1. When the algorithm is started, the datasets are loaded in the model. Graphs are made according to the first step, and random samples are taken from the datasets that are then processed to get them in suitable form to construct decision trees. 2. Decision trees are formed using attribute selection process and the selected attributes are data points chosen by the user. Then the decision trees create some set of rules and formulas to predict the result using different sets of data and from different rules for prediction. 3. The result from each decision tree is taken and voted upon by the random forest classifier, and the result that gets the highest votes get selected for the final result.
450
S. Verma et al.
Fig. 3 Proposed model of crop yield prediction with random forest
4. The final result are displayed, and graphs are made according to the result. The random forest algorithm gets illustrated in pseudocode (1). Out of all the features, K random features can be chosen using the best split point scheme. Then, N trees are produced, each with a d node and several daughter nodes. In this area of prediction, random forest provides the highest accuracy because it trains N numbers of trees, and more trees lead to greater accuracy. It can manage enormous volumes of data. The proposed system’s pseudocode (1) is as follows: 1. First, we choose ‘k’ number of attribute at random from the overall ‘m’ attribute in system. 2. The k feature is chosen using the best split point, and the dth node is determined. 3. Using split function, nodes are partitioned into siblings. 4. Iterate phases 1–3 till various nodes are reached. 5. Iterate phases 1–4 till n loops to make n number of trees. The voting process is highlighted in pseudocode (2) to provide the final result. Each trained tree utilizes a random set of data to predict an outcome for each event. This process is repeated numerous times, saving the results for each event. Next, the voting process is started, and each tree casts votes for each outcome. The outcome with the most votes is then chosen as the outcome for the event. If two results are in conflict, the data are again voted on, and the result with the highest number of votes is chosen. The trained algorithm predicts using the pseudocode (2) below: 1. Testing attributes were utilized for the prediction of outcome, and it was saved. 2. Vote from every decision tree of individual estimation was computed. 3. Then the most fruitful predicted result is considered and taken as the model’s prediction. The random forest classifier’s predicted crop was translated to generate the estimated sapling. The crop production was determined by dividing the zone typed manually by the evaluation. Figure 4 shows the yield for the following dataset:
Crop Yield Forecasting with Precise Machine Learning
451
Fig. 4 Yield calculation for different crops
5 Result and Analysis There are around 2 lakh records in the collection. We have tested the supplied dataset using various machine learning methods in order to choose the preferred algorithm for the study. First, we examined the dataset using the random forest and linear regression algorithms. Since the random forest algorithm’s R2 score was higher than the linear regression, which determined the coefficient of determination at − 66.59 < 0.95, it is obvious that linear regression is not suitable for the dataset. This is so because in linear regression, homoscedasticity, multivariate normality, and the absence of multicollinearity are all presumptions that the dataset should meet. We can observe the distinction in the plotted graph in Fig. 5, which illustrates the prediction of yield using the model and both algorithms. After comparing linear regression and random forest regression, we performed an analysis on decision trees, which revealed that the decision tree’s R2 value was 0.93 as shown in Fig. 6, It is significantly lower than the R2 score of the random forest, which indicates that the random forest was the most effective technique for the dataset in question, with an accuracy of 95.3%, and a standard deviation of 4.72%, as shown.
Fig. 5 Crop yield prediction with linear regression and random forest model
452
S. Verma et al.
Fig. 6 Prediction of yield using decision tree regression
6 Benefits of Using Random Forest Model • Reduced overfitting: Decision trees are prone to overfitting because they tightly map all instances in train set. In the presence of several decision trees in random forest, the model will not only overfit but also because computing the mean of associated trees minimizes the predictive bug. • Flexibility provision: Random forest is a famous analytics technique since it may deal with both regression and classification with enhanced preciseness. Because attribute bagging maintains accuracy when a section of samples is not present, the random forest is an efficient means to estimate missed out instances. • Easy to determine feature importance: Random forest simplifies the analysis of feature significance in the system. There are several methods to determine the significance of such a variable. The mean decrease in impurity (MDI) and GINI importance are usually utilized to determine the model’s preciseness that reduces when a specific feature is removed. The other vital parameter is permutation importance, also called as mean decrease accuracy (MDA). By increasing the values of variables at random, MDA calculates the mean reduction in precision.
7 Conclusion and Future Work This study outlined a variety of predictive methods to estimate agricultural output based on area, season, temperature, and rainfall. Studies using datasets from the Indian government have shown that the random forest regression has the best accuracy for predicting yield. This will enable the farmers in India to determine the yield
Crop Yield Forecasting with Precise Machine Learning
453
they may anticipate in a particular climate and adjust the timing of crop planting accordingly. In the following years, we can try to create a data-independent system as currently, the system is heavily data-dependent. Our system must perform accurately regardless of format. Since crop selection also takes soil knowledge into account, it is advantageous to incorporate soil information into the system. Effective irrigation is also necessary for crop cultivation. Rainfall may show whether more water availability is needed or not.
References 1. Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:101093 3404324. 2. Muthaiah U (2018) Predicting yield of the crop using machine learning algorithm. Int J Eng Sci Res Technol (IJESRT) (5.164, UGC approved) 3. Mishra S, Mishra D, Santra G (2016) Applications of machine learning techniques in agricultural crop production: a review paper. Indian J Sci Technol 9. https://doi.org/10.17485/ijst/ 2016/v9i38/95032 4. Breiman L (2001) Random forests. Mach Learn 45:5–32 5. Mahore P, Bardekar D (2021) Crop yield prediction. Int J Sci Res Comput Sci Eng Inform Technol 561–569. https://doi.org/10.32628/CSEIT2173168 6. Champaneri M, Chachpara D, Chandvidkar C, Rathod M (2020) Crop yield prediction using machine learning. Int J Sci Res (IJSR) 9:2 7. Khaki S, Wang L, Archontoulis SV (2020) A CNN-RNN framework for crop yield prediction. Front Plant Sci 24(10):1750. https://doi.org/10.3389/fpls.2019.01750.PMID:32038699; PMCID:PMC6993602 8. van Klompenburg T, Kassahun A, Catal C (2020) Crop yield prediction using machine learning: a systematic literature review. Comput Electron Agric 177:105709. https://doi.org/10.1016/j. compag.2020.105709 9. Anakha V, Aparna S, Jinsu M, Rima M, Vinu W (2021) Crop yield prediction using machine learning algorithms. Int J Eng Res Technol (IJERT) NCREIS 09(13) 10. Vinson Joshua S, Selwin Mich Priyadharson A, Kannadasan R, Ahmad Khan A, Lawanont W et al (2022) Crop yield prediction using machine learning approaches on a wide spectrum. Comput Mater Continua 72(3):5663–5679 11. Lontsi Saadio C, Adoni WYH, Aworka R, Zoueu JT, Kalala MF, Kimpolo CLM (2022) Crops yield prediction based on machine learning models: case of west African countries. Available at SSRN: https://ssrn.com/abstract=4003105 or https://doi.org/10.2139/ssrn.40 12. Mishra S, Jena L, Tripathy HK, Gaber T (2022) Prioritized and predictive intelligence of things enabled waste management model in smart and sustainable environment. PLoS ONE 17(8):e0272383 13. Tripathy HK, Mishra S, Suman S, Nayyar A, Sahoo KS (2022) Smart COVID-shield: an IoT driven reliable and automated prototype model for COVID-19 symptoms tracking. Computing 1–22 14. Suman S, Mishra S, Sahoo KS, Nayyar A (2022) Vision navigator: a smart and intelligent obstacle recognition model for visually impaired users. Mobile Inform Syst 15. Sivani T, Mishra S (2022) Wearable devices: evolution and usage in remote patient monitoring system. In: Connected e-health. Springer, Cham, pp. 311–332 16. Mohapatra SK, Mishra S, Tripathy HK, Alkhayyat A (2022) A sustainable data-driven energy consumption assessment model for building infrastructures in resource constraint environment. Sustain Energy Technol Assess 53:102697
Recommendation Mechanism to Forge Connections Between Users with Similar Interests Indrakant Dana, Udit Agarwal, Akshat Ajay, Saurabh Rastogi, and Ahmed Alkhayyat
Abstract This paper addresses the problem individuals face when searching for people with similar interests. It is common for us to look for people who have similar interests when traveling to an unfamiliar city. A person’s interests can range from films or music to hobbies, certain personality traits, lifestyle, and more. These traits and interests will be input by the user during the time of registration and will be a part of their profile. The aim of this research is to develop a matchmaking system based on user profiles to recommend like-minded individuals. Users’ profiles contain the text that serves as their fingerprints which can be embedded in a vector on a dimensional space using deep learning. With this project, several widely used and newly developed clustering algorithms, such as K-means, hierarchical agglomerative clustering, and DBSCAN, are scrutinized in order to categorize large-scale behavioral data into groups of individuals with comparable interests. The effectiveness of these algorithms is measured and assessed using various performance evaluation metrics on the basis of simplicity, efficiency, and accuracy. Keywords Recommendations · Clustering · DBSCAN · Agglomerative hierarchical · K-means · User-matching systems
I. Dana (B) · U. Agarwal · A. Ajay · S. Rastogi Maharaja Agrasen Institute of Technology, New Delhi, India e-mail: [email protected] U. Agarwal e-mail: [email protected] A. Ajay e-mail: [email protected] S. Rastogi e-mail: [email protected] A. Alkhayyat College of Technical Engineering, The Islamic University, An Najaf, Iraq © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_39
455
456
I. Dana et al.
1 Introduction The World Wide Web is undergoing a revolution that encourages users to engage through platforms like blogs, forums, wikis, and YouTube. This has resulted in stronger and more frequent online interactions between individuals. The number of online social networks is increasing as web usage increases. Users are increasingly using online social networks to connect with new friends or similar users due to improved web technology and rising popularity. Social networking sites, like Twitter, Instagram, and Facebook, have gained millions of active users in the past decade. As of January 2022, Twitter has 436 million monthly active users, Instagram has 1.47 billion, and Facebook has 2.91 billion. These sites have changed how we communicate and stay connected with friends [1]. Social matching is a new way to connect people through computation. Without an automatic selection method, it becomes difficult and ineffective to choose the best candidate from a large group of applicants. Many social networks are adopting social matching as a way to recommend potential matches and make it easier to find connections. This is often used for dating and forming friendships, with popular platforms like Tinder and OkCupid focusing on dating and others like Happn focusing on opportunistic connections with strangers. In September 2020, 64% of respondents in a Rakuten Insight survey used Tinder and 24% used OkCupid. The online dating market in India is expected to reach US$70.64 million in 2022 and 28.4 million users by 2027, with a user penetration of 1.8% in 2022 and 1.9% in 2027. The USA has the highest user penetration in the online dating industry at 17.2% and will generate the most revenue at $1290.00 m in 2022. The increase in online social networking has broadened the consumer base for online dating and made it more socially acceptable [2]. Social Networks rely on connections among users, which can be formed through sub-networks of individuals with shared interests, opinions, and lifestyles. The main focus of this paper is on the social matching of individuals or groups for professional collaboration and value co-creation such as mentoring, networking, recruiting, and community building, as well as matching buyers and sellers on ecommerce platforms [3]. User-matching systems use techniques like collaborative filtering and machine learning to calculate the likelihood of a successful match based on factors like age, interests, location, and other personal information provided by users and offer personalized recommendations based on their behavior, interests, and preferences. This results in a more engaging user experience and allows users to specify their preferences for potential partners. Additionally, businesses can improve customer relationships and engagement by providing personalized recommendations. This can lead to increased customer satisfaction and retention. Overall, user-matching systems are a valuable technology that can help businesses to improve the user experience and increase conversions and have become an integral part of the online matching experience, providing individuals with a more efficient and effective way to find potential partners. Various studies have been conducted in the past focusing on methods [4] used in the formation of connections between users. Despite this, research on social matching systems often includes several limitations
Recommendation Mechanism to Forge Connections Between Users …
457
in one or more areas, including limited personalization, lack of transparency and fairness in the matching algorithm, limited scope and functionality, difficulty in maintaining user engagement, and privacy concerns. By providing personalized recommendations and targeting marketing efforts more effectively, individuals are able to save time, gain control over their matching experience, and reduce the likelihood of rejection. These systems can be highly effective in helping users find other users with whom they are compatible, but they can also have potential drawbacks, such as bias or discrimination. It is important for designers of user-matching systems to carefully consider these issues and take steps to mitigate them.
2 Related Works There have been numerous studies and research projects focused on developing effective user matchmaking algorithms in various contexts. In particular, research presented in [5] introduces a collaborative filtering recommendation algorithm that utilizes user interest clustering to address the limitations of traditional algorithms. This approach has been shown to have higher recommendation efficiency and precision through experimental results. Mendonça [6] proposes an ant colony-based algorithm to solve the optimization problem of clustering/matching individuals in a social network, with numerical results indicating that the algorithm can successfully perform clustering with a variable number of individuals. The authors in [7] focus on the application of online social games (OSGs) in the field of user-matching systems and the use of social and complex network analysis tools to extract and model relationships between users. A formalism is proposed for extracting graphs from large datasets of OSG users and considering aspects such as game participation, adversarial relationships, and match outcomes. The influence of different threshold values on the resulting OSG graph properties is analyzed using two novel large-scale datasets. The research suggests that an analysis of multiple graphs could be used to improve matchmaking for players and could contribute to the development of more effective user-matching systems in the future. Bin et al. [8] present a tag-based common interest discovery approach in online social networks, suggesting that user-generated tags are effective for representing user interests due to their ability to more accurately reflect understanding. This approach can effectively discover common interest topics in online social networks, such as Douban, without any information on the online connections among users. Li et al. [9] introduce the project Match-MORE, which aims to address the issues surrounding proximitybased mobile social networks by utilizing the concept of friends-of-friends to find common connections among friends and design a private matching scheme. This is achieved through the use of a novel similarity function that takes into account both the social strength between users and the similarity of their profiles. The use of Bloom filters to estimate common attributes helps to reduce system overhead, and the security and performance of this project have been analyzed and evaluated through
458
I. Dana et al.
simulations. The authors in [10] present a method for matching user accounts based on user generated content (UGC). They propose a UGC-based user identification model using a supervised machine learning solution that has three steps: measuring the spatial, temporal, and content similarities of two UGCs, extracting the corresponding features, and using a machine learning method to match user accounts. The proposed method is tested on three datasets and demonstrates excellent performance with F1 values reaching 89.79%, 86.78%, and 86.24%, respectively. This work presents a solution for user identification when user profile attributes are unavailable or unreliable. In addition to studies on user recommendation mechanisms, research on group recommendation systems has also provided valuable insights. Sangeetha et al. [11] investigate online dating networks, modeling the network as a graph and performing an analysis using social network analysis (SNA) methods. The proposed system utilizes a recommender system that combines both the attributes of the vertices (individual users) and the interactions within the network, leading to a 12% improvement in accuracy for identifying compatible and potentially successful matches within the network. Qin et al. [12] present a method for recommending social groups in online services by extracting multiple interests and adaptively selecting similar users and items to create a compact rating matrix for efficient collaborative filtering. This approach was evaluated through extensive experiments. Cheng et al. [13] introduces a collaborative filtering recommendation method based on users’ interests and sequences (IS), defining the concept of ’interest sequences’ to depict the dynamic evolution patterns of users’ interests in online recommendation systems. The method for calculating users’ similarities based on IS and predicting users’ ratings for unrated items is presented, and the effectiveness of the proposed recommendation method is verified through comprehensive experiments on three datasets. The algorithms employed by current matchmaking applications based on interests are not open source and are kept confidential. Our objective is to examine various clustering algorithms [14] and determine the most effective approach for providing more accurate matches. Our analysis and investigation suggest that we develop a user recommendation mechanism by integrating multiple clustering algorithms [8, 14], such as DBSCAN clustering and hierarchical clustering using various linkages.
3 Methodology 3.1 Preliminaries The data needed by matchmaking applications to suggest potential partners can be broken down into the following categories: • Demographics: age, gender, location, education level, occupation • Interests: hobbies, activities, sports, music, movies, books • Personal values: political views, religious beliefs, relationship preferences
Recommendation Mechanism to Forge Connections Between Users …
459
• Personality traits: introversion/extroversion, openness, conscientiousness, agreeableness, neuroticism • Social connections: friends, family, colleagues, shared connections with other users • Behavioral data: online activity, preferences, interactions with other users on the platform. We would need these user data which would contain mentioned data about users such as bio and interests in different topics with interest being measured in the range 0–9. Additionally, we would need to collect data on the compatibility between individuals, such as their common interests and compatibility score. This information would be used to match individuals with each other based on their preferences and compatibility.
3.2 Data Gathering and Generation The machine learning algorithms we have devised require a significant quantity of data to train our model. Because such a dataset is not publicly available on the internet, we generate fake user profile data. Various websites can assist us in generating a large quantity of fake data. By using ’BeautifulSoup’ for web scraping, we can turn the generated data into a data frame. For each user, we need data that show their interests in different categories like movies, sports, and politics. This can be done by randomly assigning numbers from 0 to 9 to each category in the data frame. User bios are potentially one of the better ways to determine the similarity between two users and determine if they are a good match for each other or not. We can preprocess the user biodata by using NLP and find out the important words used majorly by users in their bios, across the platform. Using the NTLK library, we can perform tokenization and lemmatization of user biodata, i.e., splitting up full sentences into individual words and converting words into their basic form. For example, converting ‘Joking’ into ‘Joke’. We can take an extra measure to exclude words like ‘a’, ‘the’, ‘of’, etc. Our next step would be making a set of unique words with their corresponding usage frequency by users in their bio, across the platform. Many of these words might be adjectives to a noun, like ‘c++ enthusiast’. We need to pair such words. We make a list of such words and find their frequency scores and build up our data frame for the next step (Fig. 1). Scaling our categories of interests, which generalize data points, is crucial for better algorithm performance in later phases because it reduces the distance between the data points. To improve the performance of our clustering algorithm, we will proceed to scale the categories (such as movies, TV, and religion). This will reduce the time required to fit and transform the algorithm to the dataset. The next phase entails structuring the data into a series of numerical vectors, where each vector stands for a particular data point. This is known as the vectorization of the biodata of users. We will employ the count vectorization and TFIDF vectorization techniques. To
460
I. Dana et al.
Fig. 1 Data frame containing bios and different categories for each user profile
determine the best vectorization technique, we shall test both ideas. The two distinct data frames created by the two methods mentioned above will be combined and scaled into a new dataset. Because there are too many dimensions in the dataset, we will use principal component analysis (PCA). It is a statistical method for decreasing a dataset’s dimensionality. It accomplishes this by translating the data into a new coordinate system with principal components as the new axes. The techniques discussed above can be used to locate the ideal number of clusters based on evaluation metrics like the silhouette coefficient, Davies-Bouldin score, and Calinski-Harabasz score once our data is ready. These metrics will assess how effectively the clustering algorithms work. After an in-depth analysis of the algorithms using the scoring metrics, we concluded that the optimal number of clusters for a better match probability is 2. We will then train a classification model using the optimal cluster value and try to train a classified model having better accuracy. When a user wants to look for similar users, their data will be considered new. Entering new data into existing clusters will require running clustering algorithms all over again. And before that, the new data would go through NLP processing, tokenization, scaling, vectorization, and PCA. After this, clustering algorithms would run and provide the top 10 matches to the user.
3.3 Algorithms Used K-Means Clustering K-means clustering [15–17] is an unsupervised machine learning algorithm for clustering data into k groups or clusters. The goal of the algorithm is to partition the data into clusters such that the data points within a cluster are more similar to each other than they are to data points in other clusters. The algorithm works by first initializing k centroids or the center points of the clusters. Then, the data points are assigned to the cluster whose centroid is closest to
Recommendation Mechanism to Forge Connections Between Users …
461
the data point. The centroids are then updated based on the mean of the data points assigned to the cluster. This process is repeated until the centroids no longer move or the assignments of data points to clusters stop changing. One of the main advantages of K-means clustering is that it is fast and efficient, especially for large datasets. However, it can be sensitive to the initial placement of the centroids and may not always produce the best possible clusters. DBSCAN Clustering Density-based spatial clustering of applications with noise (DBSCAN) is a densitybased clustering algorithm [10, 18]. It works by identifying clusters of high density, defined as groups of points that are closely packed together, and marking points that are not part of any cluster as noise. The algorithm starts by identifying a point at random and then searching its surrounding area to find other points that are close by. If it finds enough points in the surrounding area to form a cluster, it will mark all of those points as part of the cluster. If it does not find enough points, it will mark the starting point as noise. One of the advantages of DBSCAN is that it does not require the user to specify the number of clusters in advance. This is useful because the number of clusters in a dataset is often not known beforehand. It also can identify and label points that are not part of any cluster, which is useful for outlier detection (Fig. 2). HDBSCAN HDBSCAN is an implementation [19] of the DBSAN clustering algorithm that can handle data with varying densities. It is an extension of DBSCAN that is able to automatically determine the appropriate value for the density threshold parameter, which controls how tightly packed the points in a cluster need to be. HDBSCAN uses a hierarchical approach to build clusters, starting with the largest clusters and then adding smaller clusters until all points have been assigned to a cluster. This allows
Fig. 2 DBSCAN algorithm was utilized to generate two clusters, which consist of three types of points: key points (green) that meet the criteria for clustering, border points (blue) that do not meet the clustering criteria but are within the reach of a key point, and noise points (black) that do not fit into either
462
I. Dana et al.
it to find clusters of different densities and shapes and to identify points that are not part of any cluster. One of the advantages of HDBSCAN is that it can handle datasets with a large number of dimensions, which can be challenging for other clustering algorithms. It is also able to handle noisy or outlier data well, making it a good choice for data that may not be well-behaved. However, like DBSCAN, it is sensitive to the choice of parameters and can be affected by the presence of noise or outliers in the dataset. Agglomerative Hierarchical Clustering Agglomerative hierarchical clustering is a type of clustering algorithm that is used to group data into clusters [20]. It is a bottom-up approach, meaning that it starts by treating each data point as a separate cluster and then iteratively merges clusters until all points are part of a single cluster or a predetermined number of clusters have been formed. The algorithm first calculates the distance between each pair of data points. Then, it iteratively merges the two closest clusters, based on the distance between their points. This process continues until all points are part of a single cluster or the desired number of clusters has been formed. One of the advantages of agglomerative hierarchical clustering is that it allows the user to specify the number of clusters they want to find, which is not possible with other clustering algorithms such as K-means.
4 Performance Evaluation Metrics Used 4.1 Silhouette Coefficient Based on the difference between the average distance to points in the closest cluster and to points in the same cluster, this metric measures the cohesiveness and separation of clusters. Each sample’s silhouette coefficient is defined, and it consists of two scores: a: sample’s average distance from every other point in its class. b: The average separation between a sample and every other point in the next cluster. The silhouette coefficient s for a single sample is then given as: s=
b−a max(a, b)
(1)
It is a useful tool for evaluating the performance of a clustering algorithm and for selecting the appropriate number of clusters for a dataset. The score is limited to a range of − 1 for poor clustering and + 1 for dense clustering. Scores close to zero indicate that the clusters overlap.
Recommendation Mechanism to Forge Connections Between Users …
463
4.2 Calinski-Harabasz Index The variance ratio criterion, widely known as the Calinski-Harabasz index, is determined by dividing the total inter-cluster and intra-cluster dispersion across all clusters (where the dispersion is the sum of squared distances). The Calinski-Harabasz index is calculated as CH =
BGSS K −1 WGSS N −K
=
N−K BGSS × WGSS K −1
(2)
where N is the total number of observations; K is the total number of clusters; BGSS is the between-group sum of squares (between-group dispersion); and WGSS is the within-group sum of squares (within-group dispersion). Given that the observations within each cluster are more closely spaced apart (denser), a high CH results in better clustering (well-separated).
4.3 Davies-Bouldin Score The Davies-Bouldin index is a commonly used validation statistic to determine the optimal number of clusters in a dataset. The index measures the average similarity of the clusters, where similarity is based on cluster size and distance between clusters. A lower Davies-Bouldin index indicates better separation of clusters. The score is defined as a ratio of cluster scatter to cluster separation, with a minimum possible score of zero, and scores closer to zero denote better partitioning. The computation of the Davies-Bouldin index is less complicated compared to silhouette scores, and it relies solely on point-wise distances within the dataset.
5 Results In this study, we aimed to evaluate the performance of different clustering algorithms on our datasets. In K-means clustering, the optimal value of clusters for the model came out to be the average value of all the optimal k values (from silhouette score, distortion score, Calinski-Harabasz score, and Davies-Bouldin score), i.e., k = 11. On the training classification model using this optimal value, we got an F1 -core of 94% (Figs. 3, 4, 5, and 6). In DBSCAN clustering, the minimal sample of clusters for the model came out to be the average value of all the optimal (from silhouette score, Calinski-Harabasz score, and Davies-Bouldin score), i.e., min_sample = 12. On the training classification model using this optimal value, we got an F1-score of 93% (Figs. 7, 8, and 9).
464
I. Dana et al.
Fig. 3 Silhouette score for different values of k
Fig. 4 C-index minimum for k = 19.5
In agglomerative clustering, the optimal number of clusters for the model came out to be the average value of all the optimal (from silhouette score, Calinski-Harabasz score, and Davies-Bouldin score), i.e., k = 4. On the training classification model using this optimal value, we got an F1-score of 92% (Figs. 10, 11, and 12). In HDBSCAN clustering, the min_samples in the case of metric ‘leaf’ is 4 and the value of n_clusters is 2. On training with classification models, we get an F1-score of 95% (Fig. 13 and Table 1). In summary, the study aimed to evaluate the performance of different clustering algorithms on the datasets. The optimal value of clusters for K-means clustering was
Recommendation Mechanism to Forge Connections Between Users …
465
Fig. 5 Davies-Bouldin score for K-means
Fig. 6 Gap-statistic lower on the left side
11, which resulted in an F1-score of 94%. For DBSCAN clustering, the optimal value of min_samples was 12, which resulted in an F1-score of 93%. In agglomerative clustering, the optimal number of clusters was 4, which resulted in an F1-score of 92%. The best results were obtained from HDBSCAN clustering, with a min_ samples value of 4 and n_clusters of 2, resulting in an F1-score of 95%. Although the results of the study showed the performance of different clustering algorithms on the datasets, it is essential to keep in mind the scope of future work that can be done. For example, the study only used a limited number of evaluation metrics, and
466
I. Dana et al.
Fig. 7 Silhouette score for different values of k
Fig. 8 C-index minimum for k = 19.5
there might be other metrics that could give better results. Additionally, the study could be extended to evaluate the performance of different clustering algorithms on different types of datasets, including high-dimensional data, imbalanced data, and time series data. The results of the study can also be used as a starting point for developing more advanced and robust clustering algorithms that can perform better on real-world data.
Recommendation Mechanism to Forge Connections Between Users …
467
Fig. 9 Davies-Bouldin score for DBSCAN
Fig. 10 Silhouette score for AG clustering
6 Conclusions In this paper, three clustering methods (K-means, hierarchical agglomerative clustering, DBSCAN, and spectral clustering) have been considered where we evaluated the performance of these algorithms using three metrics: silhouette score, CalinskiHarabasz index, and Davies-Bouldin score. K-means and hierarchical clustering algorithms tend to perform well in terms of efficiency and simplicity, but may not
468
I. Dana et al.
Fig. 11 C-Index score for different values of k
Fig. 12 Davies-Bouldin score for AG
always produce the most accurate results. On the other hand, density-based algorithms such as DBSCAN can produce more accurate clusters, but may be more computationally complex and require more fine-tuning of parameters. In conclusion, the performance of clustering algorithms for matchmaking systems can vary greatly depending on the specific characteristics of the data and the desired outcomes of the system. Overall, it is important to carefully evaluate the strengths and limitations of each clustering algorithm and consider the specific needs of the matchmaking system
Recommendation Mechanism to Forge Connections Between Users …
469
Fig. 13 Number of points versus λ-value
Table 1 Table containing scores of different metrics Algorithms
Silhouette score
K-means clustering
0.0294
Agglomerative clustering
0.034494
DBSCAN clustering
0.1110
Calinski-Harabasz 91.5824 103.061426 15.3085
Davies-Bouldin 4.9644 4.352123 2.1883
before deciding on the most appropriate approach. By carefully selecting and implementing the right clustering algorithm, matchmaking systems can effectively group individuals or items together in a way that maximizes compatibility and satisfaction. Further research should be conducted to explore the potential of other clustering algorithms and to optimize their performance in different scenarios.
References 1. Statista. Global social media ranking 2022. https://www.statista.com/statistics/272014/globalsocial-networks-ranked-by-number-of-users/ 2. Statista. Online dating India—revenue highlights 2022. https://www.statista.com/outlook/dmo/ eservices/dating-services/online-dating/india 3. Olsson T, Huhtamäki J, Kärkkäinen H (2020) Directions for professional social matching systems. Commun ACM 63(2):60–69 4. Ren J et al (2021) Matching algorithms: fundamentals, applications and challenges. In: IEEE transactions on emerging topics in computational intelligence, vol 5, no 3 5. Yu Y., Zhou Y (2017) Research on recommendation system based on interest clustering. AIP Conf Proc 1820:080021 6. Mendonça L (2014) An approach for personalized social matching systems by using ant colony. Social Netw 03:102–107 7. van de Bovenkamp R, Shen S, Iosup A, Kuipers F (2013) Understanding and recommending play relationships in online social gaming. In: Fifth international conference on communication systems and networks (COMSNETS), Bangalore, India 8. Bin S, Sun G, Zhang P, Zhou Y (2016) Tag-based interest-matching users discovery approach in online social network. Int J Hybrid Inform Technol 9:61–70
470
I. Dana et al.
9. Li F, He Y, Niu B, Li H, Wang H (2016) Match-MORE: an efficient private matching scheme using friends-of-friends’ recommendation. In: International conference on computing, networking and communications (ICNC), pp 1–6 10. Li Y, Zhen Z, You P, Hongzhi Y, Quanqing X (2018) Matching user accounts based on user generated content across social networks. Future Gen Comput Syst 83:104 11. Sangeetha K, Nayak R, Chen L (2014) A people-to-people matching system using graph mining techniques. World Wide Web 17(3):311–349 12. Qin D, Zhou X, Chen L, Huang G, Zhang Y (2020) Dynamic connection-based social group recommendation. IEEE Trans Knowl Data Eng 32(3):453–467 13. Cheng W, Yin G, Dong Y, Dong H, Zhang W (2016) Collaborative filtering recommendation on users’ interest seq. PLOS ONE 11:e0155739 14. Bindra K, Mishra A (2017) A detailed study of clustering algorithms. In: 6th International conference on reliability, infocom technologies and optimization (ICRITO) 15. Hartigan JA, Wong MA (1979) Algorithm AS 136: a K-means clustering algorithm. J R Stat Soc Ser C 28:100 16. Na S, Xumin L, Yong G (2010) Research on k-means clustering algorithm: an improved k-means clustering algorithm. In: Third international symposium on intelligent information technology and security informatics, pp 63–67 17. Sinaga KP, Yang M-S (2020) Unsupervised k-means clustering algorithm. IEEE Access 8:80716–80727 18. Tran T, Drab K, Daszykowski M (2013) Revised DBSCAN algorithm to cluster data with dense adjacent clusters. Chemomet Intell Lab Syst 120:92–96 19. Ackermann MR, Blömer J, Kuntze D et al (2014) Analysis of agglomerative clustering. Algorithmica 69:184–215 20. Stewart G, Al-Khassaweneh M (2022) An implementation of the HDBSCAN* clustering algorithm. Appl Sci 12:2405
Identification of Device Type Using Transformers in Heterogeneous Internet of Things Traffic Himanshu Sharma, Prabhat Kumar, and Kavita Sharma
Abstract Internet of Things (IoT), which connects a large number of intelligent devices and smart sensors, has made it easier for individuals to maintain and improve their lives. Modelling the communication patterns of Internet of Things (IoT) devices to support attack protection is difficult due to the heterogeneity of connected devices and the variety of communication protocols in IoT. Identification is one of the difficulties in the many methods for safeguarding the IoT environment due to the complicated association between the categories of IoT devices and the patterns of their communication behaviours. These devices may be divided into many kinds based on communication patterns and functioning traits. IoT traffic does, however, contain a sizeable amount of aberrant data, for example, attack activity originating from hacked devices that cannot accurately predict the original device’s behaviour. To solve the aforementioned issues, commendable IoT device type detection approach is provided in this study. The proposed strategy is composed mostly of three parts. First, the generated data from IoT devices are categorised into normal and abnormal using a transformer-based traffic diagnostic model. Later, transformer model is applied to the typical traffic in order to determine the category of Internet of Things device being used. The usefulness of the strategy, which offers a substantial improvement in accuracy and F1-score compared to existing methods, has been demonstrated by experimental findings. Keywords IoT · Device identification · Machine learning · Deep learning H. Sharma (B) · P. Kumar Computer Science and Engineering Department, National Institute of Technology Patna, Patna, India e-mail: [email protected] P. Kumar e-mail: [email protected] K. Sharma Computer Science and Engineering Department, Galgotias College of Engineering & Technology, Greater Noida, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_40
471
472
H. Sharma et al.
1 Introduction Despite the fact that the IoT’s fast growth has improved industrial productivity and made life easier for consumers, it has also introduced a slew of new challenges. As more and more types of Internet of Things data are transmitted via networks, network service providers are confronted with newer challenges in traffic management, service quality assurance, and the provision of individualised services [1]. Defending networks now require much fewer resources owing to the prevalence of unsecured IoT devices. Malicious traffic-based attacks, like distributed denial of service (DDoS) attacks, have grown in volume and sophistication in recent years. Better and more in-depth knowledge of communication characteristics for IoT devices is required to find effective solutions to these issues. Therefore, it has been of broad importance in the academic community to predict IoT devices’ communication behaviours depending on traffic characteristics. Due to the vast category of IoT devices as well as network protocols, a wide range of how these devices interact with one another. While environmental monitoring equipment like cameras demand high bandwidth and continuous connection, smart water metres and energy metres often communicate on a periodic basis with low frequency and low bandwidth. That is why it is not only impractical, but also unscientific, to create a single behavioural model to describe all IoT devices. Analysing communication traffic according to the particular types of IoT devices has been proven to be an efficient technique to minimise the complexity of behavioural modelling [2, 3]. In order to improve quality of service (QoS) and attack detection in software, it is essential to categorise IoT devices before modelling their communication patterns. Unfortunately, the three key obstacles listed below are encountered by the current methods of machine learning-based categorisation techniques for IoT devices. First, a sizeable fraction of today’s IoT traffic is harmful. After an IoT device has been infiltrated, it will often send malicious communications in accordance with the commands of the attacker, which is not typical of the device type. Because it acts as noise, it drastically limits the precision of the device classification models. Second, feature engineering is sometimes used to pick the input features by hand, and those features are subsequently used as inputs to the model. However, the classification accuracy of the models is typically constrained by the imprecise feature selection that occurs as a result. In addition, various feature screening techniques are required for different contexts when using classic ML-based approaches like naive Bayes (NB), support vector machine (SVM), K-nearest neighbours (KNN), and convolutional neural network (CNN). Third, there is substantial unpredictability and variation in the amount of IoT traffic at certain measurement points due to changes in business requirements. Additionally altering the make-up of IoT traffic is the portability of the accessed devices. As a result, the typical habits of IoT devices cannot be well described by looking at only their short-term traffic patterns. Few current ML techniques, like long short-term memory (LSTM), can capture temporal information, but their capacity for
Identification of Device Type Using Transformers in Heterogeneous …
473
learning long-term temporal relationships is restricted. To fix the issue completely, feeding more variables to the model as input is not the remedy. So, to obtain high accuracy of classification for IoT device types, it is necessary to design algorithms that can capture lengthy behavioural data. This research develops a unique dual-stage, deep learning-based IoT device classification framework and applies the transformer model to the device classification job to overcome these obstacles. To begin, a learning model is developed to identify aberrant traffic based on self-attention method of the transformer-based model for feature representation. Our second step is to build a new classifier that can accurately categorise devices based on normal traffic, leveraging the inherent ability of transformer model to capture the overall temporal properties automatically. Further, a refined method that works on the outcomes of multi-round classifications to define the sustained communication behaviours of the IoT devices, which improves accuracy and resilience, is proposed. There are two key benefits of using this framework. To prevent the loss of feature information while modelling behaviour, the learning model may extract the features automatically, without the need for human feature engineering. Second, the classification findings are more reliable since abnormal behaviour information is removed from malicious traffic data at the data pre-processing step. Proposed work is comparable to that published in [4], where the authors investigated the impact of identifying device type and anomalous traffic data on IoT security. The authors handle IoT identifying device type identification, but they only care about the accuracy of identification during regular traffic and ignore the effect during anomalous traffic. Proposed approach is different from the others because of transformer-based model. In this part, a brief outline of the key findings is mentioned. Here, a strategy for improving device type recognition accuracy in heterogeneous IoT communications by filtering out noise introduced by anomalous activity is presented. Using the transformer model as inspiration, a 2-stage deep learning model for determining the type of Internet of Things (IoT) device is developed. This system first properly collects the time-based feature information for classification, and then it extracts regular traffic by pre-classification. This framework makes high-precision IoT device categorisation possible, which sidesteps the time-consuming feature selection technique required by conventional machine learning approaches. Using actual IoT traffic data from nine different types of devices, thorough device categorisation studies are conducted. By comparing the proposed approach against more conventional machine learning-based methods across several metrics, its efficacy and robustness are confirmed. This paper’s structure will continue as follows: Section 2 provides an outline of the cited references. Section 3 shows proposed procedure for classifying IoT devices in detail. In Sect. 4, experiments and analysis of the outcomes are presented. In the final section, we conclude our findings.
474
H. Sharma et al.
2 Literature Review In recent years, several approaches to IoT device identification have been presented, most of which are based on either machine learning (ML) or deep learning (DL). In order to classify IoT devices, Desai et al. [2] presented a method for ranking feature so as to screen them based on hypothesis testing. To categorise IoT devices, Gunes et al. [5] utilised a genetic algorithm to identify as well as filter important characteristics, and then they employed a number of machine learning (ML) methods, including decision trees (DT). Using potent feature engineering, Hsu et al. [6] identified device types using five well-known ML techniques: AdaBoost, decision trees, KNN, logistic regression (LR), and random forest. It turned out that for their purposes, RF was the most effective model. Taking into account the expense involved in feature engineering, Chakraborty et al. [7] suggested a new cross-entropy-based stochastic optimisation method for choosing optimal features. The authors then use Gaussian NB, RF, and SVM to categorise IoT devices. To categorise IoT devices into distinct categories, Sivanathan et al. [8] used an unsupervised clustering technique, as opposed to the aforementioned supervised approaches. The K-means clustering algorithm was used for classification, and the Elbow approach was used to determine how many clusters would be most effective. While ML-based approaches can be lightweight and suitable in certain cases, their success relies heavily on well-designed features. In order to avoid complex feature engineering, numerous researchers have recently formulated deep learning-based algorithms to categorise IoT devices. Known device type may be identified using supervised deep neural networks, whereas unknown device types can be classified using unsupervised clustering, as described by Bao et al. [9]. Liu et al. [3] offered a DL-based approach for classifying IoT devices. They also added the zero-bias layer to a densely linked CNN to make it more stable and easier to understand. ResNet was first developed by Luo et al. [10] for use in identifying the types of IoT devices, and they were able to get decent results without resorting to complex feature engineering. IoT device type detection was accomplished in a distributed setting by He et al. [11], who integrated DL-based approaches with federated learning. Liu et al. [12] recently conducted in-depth study on existing methods for recognising Internet of Things devices using machine learning and deep learning and offered a thorough assessment that analysed the benefits and drawbacks of various ML- and DL-based approaches. However, most of these approaches only account for typical IoT device traffic while disregarding atypical traffic, which leads to diminished efficiency in real-world network settings. The diagnosis of traffic has long been a focus of cybersecurity study, under several names including anomaly detection and the detection of hostile traffic. As ML and DL algorithms have progressed quickly in recent years, numerous ML- and DLbased approaches are presented to address the issue. The cutting-edge CorrAUC [13] method integrates TOPSIS and Shannon entropy to pick features. Four different ML techniques were used after feature selection to identify malicious Bot-IoT traffic. In order to identify DoS and DDoS assaults in the IoT, Hussain et al. [14] first
Identification of Device Type Using Transformers in Heterogeneous …
475
transformed traffic data into an image-like format and then used ResNet18, a popular CNN architecture from the past few years. Recently, Lin et al. [15] used a transformer, a popular NLP tool, to categorise encrypted communication and found promising results. Although the two studies tackle separate problems, they both sparked the idea for a transformer-based model that is used to boost speed when classifying IoT devices. Moreover, they offered some recommendations for picking suitable ML and DL models for certain jobs. Unsupervised traffic diagnosis has been the subject of several investigations, such as detecting IoT device types. To identify fraudulent data, IoT-Keeper [16] uses a fuzzy C-means (FCM) clustering algorithm on IoT data characteristics pre-screened using a feature selection approach based on correlation. A unique technique for detecting network intrusion is presented by Diallo et al. [17], and it relies on an adaptive clustering algorithm called ACID. The researchers also developed a real-world traffic dataset called N-aBIoT, which includes nine distinct IoT devices and primarily features ten assaults drawn from the Mirai and BASHLITE botnets. Sharafaldin et al. used well-known ML-based algorithms (including RF, NB, and LR) to detect DDoS assaults, and they also created a new DDoS dataset called CICDDoS2019. These traffic diagnostic techniques can reliably identify malicious traffic (often above 99), and they allow us to increase the accuracy of IoT device categorisation by mitigating the effect of abnormal traffic.
3 Proposed Work This section provides a comprehensive overview of the technique for determining the kind of Internet of Things device. Figure 1 is a flowchart depicting our suggested procedure. Pre-processing of incoming IoT traffic packets by extracting fundamental statistical data like mean and standard deviation of packet size is done. The traffic diagnosis component is then used to categorise the incoming data. Finally, the type of IoT devices is determined by the usual traffic, and the anomalous traffic is disregarded.
Fig. 1 Proposed method workflow
476
H. Sharma et al.
In transformer-based feature processing module, pre-processing of incoming IoT traffic packets by extracting fundamental statistical data like mean and standard deviation of packet size is done. The traffic diagnosis component is then used to categorise the incoming data. Finally, the sorts of IoT devices are determined by the usual traffic, and the aberrant traffic is disregarded. The traffic data diagnosis model aims to provide a system for labelling Internet of Things (IoT) traffic as either normal or abnormal. The high-level characteristics must be transformed into a two-dimensional vector space in order to be used. Traffic diagnosis model initially uses flatten and linear projection operations to modify the value produced by a feature processing module based on transformer. A transformer-based module is applied in the device type identification model. Proposed device type identification model is trained using only normal traffic data, as opposed to anomalous traffic data, on the theory that only normal traffic data will include the features of various IoT devices. It is important to note that the transformer-based feature processing parameters used by each model are unique and that the two models are trained independently. Pre-processed information (x) about traffic generated by Internet of Things devices is the input. Positional encoding, linear projection, layer normalisation, multi-head attention, and feed-forward network position-wise are the major components of this module. Insights gained from this data can be utilised to determine traffic patterns and classify electronic gadgets.
4 Experiment and Result Analysis The entire process of the proposed algorithm for identifying the IoT device types is assessed and compared with other methods like naive Bayes (NB), support vector machine (SVM) [2, 7], convolutional neural network (CNN) [3, 4, 10, 18, 19], and Knearest neighbours (kNN) [9]. All of these strategies were tested on the first iteration of the test dataset during training. In Table 1, diagonal of the confusion matrix holds the majority of the elements. That is to say, the vast majority of traffic statistics are separated accurately. At the same time, it can be observed that in the worst situation, almost 11% of the Type 5 device were mistaken for the Type 4 device. Because both categories of IoT devices are cameras made by the same companies, some degree of confusion is to be expected. Popular metrics for measuring the success of a classification model include the F1-score and accuracy. To completely evaluate model performance, both precision and recall values are calculated in Table 1 using the formulas mentioned below. Precision: Total number of true positives out of all the positive predictions that are made. It can be calculated as: Precission =
True Positive True Positive + False Positive
Identification of Device Type Using Transformers in Heterogeneous …
477
Table 1 Confusion matrix of our proposed workflow 0 0
1
2
92
0
2
3
4
0
5 1
6
4
0
7
8 0
1
1
0
93
0
0.2
0
0.8
6
0
0
2
0
0
98
0
1
1
0
0
0
3
1
1
0
96
0
0
2
0
0
4
0
1
1
0
94
3
0
1
0
5
0
0
2
1
11
85
0
1
0
6
0
1
0
0
1
0
98
0
0
7
0
1
2
0
1
0
0
95
1
8
0
1
0
0
0
0
0
2
97
Recall: Total number of correctly predicted positives out of all the actual positives. It can be calculated as: Recall =
True Positive True Positive + False Negative
Here, true positive (TP): a true positive occurs when a model accurately predicts the positive class. False positive (FP): a false positive occurs when a model wrongly predicts the positive class. False negative (FN): a false negative when a model predicts the negative class inaccurately. If the formula for precision and recall is compared, both seem to be similar. The only difference is the second term of the denominator, where it is false positive for precision but false negative for recall. F1-score is the harmonic mean of precision and recall for a more balanced summarisation of model performance. It is calculated in Table 1 using the formula: F1 - score = 2 ∗
Precision × Recall Precision + Recall
In the case of multi-class classification, macro-F1-score calculation is adopted. The macro-F1-score is computed in Table 2 using the arithmetic mean of all the per-class F1-scores. Proposed approach excels in terms of the macro-F1-score measure. When compared to CNN, the macro-F1-score is improved by 32.97%, as shown in Fig. 2. Accuracy can be calculated using the formula: Accuracy =
TP + TN TP + TN + FP + FN
In multi-class setting when test data may only belong to one class and not have multilevel, TP = TN because there is no TN in multi-class classification.
478
H. Sharma et al.
Table 2 Calculation of macro-F1-score Device type
TP
FP
FN
Precision
Recall
F1-score
0
92
8
1
0.92
0.989247312
0.953367876
1
93
7
5
0.93
0.948979592
0.939393939
2
98
2
7
0.98
0.933333333
0.956097561
3
96
4
1.2
0.96
0.987654321
0.973630832
4
94
6
15
0.94
0.862385321
0.899521531
5
85
15
8.8
0.85
0.906183369
0.877192982
6
98
2
8
0.98
0.924528302
0.951456311
7
95
5
4
0.95
0.95959596
0.954773869
8
97
3
2
0.97
0.97979798
0.974874372
Macro F1-score
0.942256586
ML Model
Proposed
94.2
CNN
61.23
KNN
60.12
SVM
24.9
NB
17.5 0
20
40
60
80
100
Macro F1-Score (%)
Fig. 2 Comparison of the macro-F1-score of proposed and other approaches
So, ModifiedAccuracy =
TP TP +
1 2 (FP
+ FN)
The average accuracy is computed in Table 3 using the arithmetic mean of all the per-class accuracy. As shown in Fig. 3, the accuracy of proposed approach is far higher than that of any competing approach. In particular, compared to CNN, which is considered to be the best existing ML and DL methods, proposed approach improves accuracy by 30.96 percentage points.
Identification of Device Type Using Transformers in Heterogeneous …
479
Table 3 Calculation of average accuracy Device type
TP
FP
FN
Accuracy
0
92
8
1
0.953368
1
93
7
5
0.939394
2
98
2
7
0.956098
3
96
4
1.2
0.973631
4
94
6
5
85
15
6
98
2
15
0.899522
8.8
0.877193
8
0.951456
7
95
5
4
0.954774
8
97
3
2
0.974874
Average accuracy
0.942257
ML Model
Proposed
94.2
CNN
63.24
KNN
62.57
SVM
26.77
NB
20.45 0
20
40
60
80
100
Accuracy(%)
Fig. 3 Comparison of the accuracy of proposed and other approaches
Macro-F1-score and accuracy can be same when the dataset is balanced, but it is important to remember that the numbers relate to different aspects of performance and are not the same. For instance, macro-F1-score and accuracy produce a score of 0.94. For accuracy, this means that 94% of the predictions were correct, but for macroF1-score, the harmonic mean of precision and recall is 0.94. It indicates that model performance is good but not the same. Even proposed technique is a bit more sluggish than NB, SVM, CNN, and KNN, its accuracy and F1-score are significantly higher. As a result, this technique is financially viable for widespread implementation and outperforms SVM and KNN in the form of F1-score and accuracy while also being more efficient.
480
H. Sharma et al.
5 Conclusion This research provides a unique transformer-based method for determining the kind of Internet of Things device. It is presumed that only normal traffic data embrace the features of various IoT devices compared to abnormal traffic data, even though there is both normal and abnormal traffic involving IoT devices in real-world settings. Therefore, a transformer-based traffic diagnosis procedure is carried out before identifying the type of device. To further ensure the viability of our suggested strategy and the accuracy of our assumption, many experiments were performed. In context with accuracy as well as macro-F1-score, our suggested method outperforms previous ML and DL approaches. However, there are limitations of proposed approach in this work. Collecting training samples labelled as normal and abnormal with the device type is challenging. This means that many popular datasets are unusable for research purposes. Secondly, the approach is very difficult to be directly used by time lag sensitive company due to the fact that achieving high accuracy necessitates the long-term accumulation of behavioural data.
References 1. Liu C, Feng W, Tao X, Ge N (2022) MEC-empowered non-terrestrial network for 6G wide-area time-sensitive internet of things. Engineering 8:96–107 2. Desai BA, Divakaran DM, Nevat I, Peter GW, Gurusamy M (2019) A feature-ranking framework for IoT device classification. In: 2019 11th COMSNETS. IEEE, pp. 64–71 3. Liu Y, Wang J, Li J, Song H, Yang T, Niu S, Ming Z (2020) Zerobias deep learning for accurate identification of Internet-of-Things (IoT) devices. IEEE Internet Things J 8(4):2627–2634 4. Salman O, Elhajj IH, Chehab A, Kayssi A (2022) A machine learning based framework for IoT device identification and abnormal traffic detection. Trans Emerg Telecommun 33(3):e3743 5. Aksoy A, Gunes MH (2019) Automated IoT device identification using network traffic. In: ICC 2019. IEEE, pp. 1–7 6. Hsu A, Tront J, Raymond D, Wang G, Butt A (2019) Automatic IoT device classification using traffic behavioral characteristics. In: SoutheastCon, pp. 1–7 7. Chakraborty B, Divakaran DM, Nevat I, Peters GW, Gurusamy M (2021) Cost-aware feature selection for IoT device classification. IEEE Internet Things J 8:11052 8. Sivanathan A, Gharakheili HH, Sivaraman V (2019) Inferring IoT device types from network behavior using unsupervised clustering. In: 2019 IEEE 44th conference on LCN. IEEE, pp. 230– 233 9. Bao J, Hamdaoui B, Wong W-K (2020) IoT device type identification using hybrid deep learning approach for increased IoT security. In: 2020 IWCMC. IEEE, pp. 565–570 10. Luo Y, Chen X, Ge N, Lu J (2021) Deep learning based device classification method for safeguarding internet of things. In: 2021 GLOBECOM. IEEE, pp. 1–6 11. He Z, Yin J, Wang Y, Gui G, Adebisi B, Ohtsuki T, Gacanin H, Sari H (2021) Edge device identification based on federated learning and network traffic feature engineering. IEEE Trans Cogn Commun Netw 8(4):1898–1909 12. Liu Y, Wang J, Li J, Niu S, Song H (2022) Machine learning for the detection and identification of internet of things devices: a survey. IEEE Internet Things J 9(1):298–320 13. Shafiq M, Tian Z, Bashir AK, Du X, Guizani M (2020) Corrauc: a malicious bot-iot traffic detection method in iot network using machine learning techniques. IEEE Internet Things J 8:3242
Identification of Device Type Using Transformers in Heterogeneous …
481
14. Hussain F, Abbas SG, Husnain M, Fayyaz UU, Shahzad F, Shah GA (2020) IoT DoS and DDoS attack detection using ResNet. In: 2020 IEEE 23rd INMIC. IEEE, pp. 1–6 15. Lin X, Xiong G, Gou G, Li Z, Shi J, Yu J (2022) Et-bert: a contextualized datagram representation with pre-training transformers for encrypted traffic classification. Proc ACM Web Conf 2022:633–642 16. Hafeez I, Antikainen M, Ding AY, Tarkoma S (2020) IoT-KEEPER: detecting malicious IoT network activity using online traffic analysis at the edge. IEEE Trans Netw Serv Manag 17(1):45–59 17. Diallo AF, Patras P (2021) Adaptive clustering-based malicious traffic classification at the network edge. In: INFOCOM 2021. IEEE, pp. 1–10 18. Meidan Y, Bohadana M, Mathov Y, Mirsky Y, Shabtai A, Breitenbacher D, Elovici Y (2018) N-BaIoT: network-based detection of iot botnet attacks using deep autoencoders. IEEE Pervas Comput 17(3):12–22 19. Sinha A, Gulshan S, Prabhat K, Deepak G (2022) A community-based hierarchical user authentication scheme for Industry 4.0. Softw Pract Exp 52.3:729–743
A Novel Approach for Effective Classification of Brain Tumors Using Hybrid Deep Learning Ananapareddy V. N. Reddy, A. Kavya, B. Rohith, B. Narasimha Rao, and L. Harshada
Abstract Brain tumor is an abnormal growth of unwanted cells in human brain. They can develop in any part of brain and can ultimately affect the human lifespan. Detection of brain tumors as early as possible is essential. Out of all available techniques, MRI has become very useful medical diagnostic tool in works related to brain tumor detection and classification. Data acquisition, preprocessing, segmentation, feature extraction, and classification is the work flow which we followed. BraTS2015 dataset is used, and firstly, an improved median filter is utilized to enhance the input MRI image. U-net architecture is used for image segmentation. Then, some major features which are based on loop, median binary pattern (MBP), and modified local Gabor directional pattern (LGDip) are extracted. Ultimately, we present a hybrid deep learning approach consisting of deep belief networks (DBN) and Bi-LSTM to categorize tumors accordingly. We subsequently used a hybrid optimization technique, that is, blue monkey extended bald eagle optimization (BMEBEO). Our proposed model outperformed previously used techniques with maximal results. Keywords Brain tumor · MRI images · Deep learning · DBN · Bi-LSTM · Blue monkey · Bald eagle
1 Introduction Human brain plays the most important role in controlling the memory, emotion, touch, vision, and decision making [1]. Nervous systems are very essential to protect from any harm and illness in the brain. Also, it is a well-known fact that brain is the most complex structure in the human body [2] and made up of billions of cells. A. V. N. Reddy (B) Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India e-mail: [email protected] A. Kavya · B. Rohith · B. N. Rao · L. Harshada Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_41
483
484
A. V. N. Reddy et al.
A brain tumor is something that arise due to mass or rapid growth of abnormal cells or tissue in the brain, where it should not be there [3]. Brain tumors are one of the most dangerous diseases that can directly affect the human lives [4]. There are different types of tumors. Mostly the brain tumors are categorized into various types based on their origin. Some tumors are benign (non-cancerous), and some are malignant (cancerous). Those tumors which origins are in brain are called as primary tumors. Otherwise it is called as the secondary tumor [metastatic tumors] those can originate anywhere in the body and later migrate to brain [5]. The brain tumor can be in any shape, size and located in any location of the brain tissue and appear in various picture intensities. Considerably, Gliomas are the most common or frequently occurring type, and it has the highest mortality rate. The magnetic resonance imaging (MRI) is the powerful tool for identification and diagnosis of the tumor in different imaging modalities like SPECT, X-ray, EEG, ultrasonography, PET, CT, and MEG. MRI can actually give way more clear images than other techniques like ultrasound, CT scan, and X-ray [5]. So, magnetic resonance imaging (MRI) has become the mostly used tool by the radiologists to analyze brain tumors. Generally, brain tumor detection means identifying the effected part in the brain with shape, size, and the pattern of the tumor in the tissues of the brain [6]. In the overall brain tumor detection process, we followed five stages: data acquisition, preprocessing, segmentation, feature extraction, and classification. Data acquisition is the initial step where we collect the BraTS2015 dataset from the Kaggle and separate the datasets as normal brain images and abnormal (tumor) images. Preprocessing and segmentation are the important steps in the image analysis. In preprocessing, we use the median filters, from that we can get more clear images with less noise MRI. Segmentation deals with separating the tumor portion from the normal tissues like white matter, gray matter, and cerebrospinal fluid, and this process will be done by using proper software aid. Here we use the U-net architecture to perform the classification on every pixel so that input and output share the same size [7]. Tumor size, loop, modified local Gabor directional pattern (LGDiP), and median binary pattern (MBP) are the majorly considered features. Here we extract all the corresponding values of features for classification of tumor type. Ultimately, we classify the tumors with the proposed hybrid deep learning model which involves DBN and BI-LSTM algorithms. These algorithms are helpful to get the better accurate results while compared to other algorithms like SVM, KNN, Naive Bayes, decision tree, etc. And to optimize the obtained results, we make use of a hybrid optimization algorithm, that is, (BMEBEO) blue monkey extended bald eagle optimization.
A Novel Approach for Effective Classification of Brain Tumors Using …
485
2 Related Work In 2021, Sabitha et al. [8] the author applied a system for classifying MRI brain tumors from the images of the human brain into normal, benign, and malignant tumors. Preprocessing and segmentation, feature extraction and feature reduction, and classification are the major steps involved in this work. Preprocessing and segmentation were developed in the initial step utilizing the threshold function. The features were obtained in the second stage utilizing a discrete wavelet transformation (DWT) specific to MR images. Principal component analysis (KPCA) was performed in the third stage to compress the magnetic resonance imaging features to their most crucial components. The final stage was the classification stage, where a classifier called KSVM was used to categorize the brain tumor’s infecting region. The findings experimentally achieved good accuracy and distinguished between normal and abnormal tissues in the brain MR images. Comparing the proposed method to many existing frameworks, it was efficient at finding tumors. In 2022, Singh et al. [9] hybrid deep neural network (H-DNN) is used to classify the brain tumors efficiently. Initially, deep neural network (DNN-1) is used for the spatial texture information of the cranial magnetic resonance (MR) images, whereas in the second method deep neural network (DNN-2) is being the frequency domain information of the MRI scans. Prediction scores of the both neural networks are fused together to get the proper classification result. Comparing many other previously done works, it achieved better results in terms of accuracy, sensitivity, and specificity. The proposed system is actually much more efficient when compared to other works; it also dealt with real-time MRI dataset. But dealing with hybrid deep neural networks is itself a complex task. In 2021, Khairandish et al. [10] suggested hybrid model integrated threshold-based segmentation and threshold-based classification (CNN and SVM) for classification and SVM for detection. All brain MRI scans will be divided into benign and malignant tumors based on CNN’s superior quality on a publicly accessible dataset. The hybrid CNN-accuracy SVM’s score is 98.49%. In 2021, Irmak et al. [11] developed a model for multi-classification of brain tumors. In this, various CNN models have been used, where almost all parameters are tuned using grid search. Three CNN models have been utilized to classify three brain tumors, where the first model deals with normal classification by means of publicly medical image data. Classifying into glioma, meningioma, pituitary, metastatic, and normal brain is the second model, and the ultimate model deals with classifying into grade II, grade III, and grade IV. The results were proved to be better in every model. In 2020, Gokila Brindha et al. [12] used a self-defined artificial neural network (ANN) along with convolutional neural network (CNN) to detect the brain tumors. The used model is integrated with an optimization technique, namely Adam and a binary cross-entropy loss function. Different steps were followed according to the used technique. Initially, ANN was applied and same dataset was given to CNN technique. CNN actually made it easier to predict the tumor by reducing size of image without losing required information, and ANN helped to achieve good accuracy. Overall, it is a neat procedure, but accuracy depends on input image data. Also, it is a trial and error method. In 2020, Naser et al.
486
A. V. N. Reddy et al.
[13] used a deep learning approach that involves U-net-based CNN for segmentation and also a transfer learning based on pre-trained convolution base of VGG-16, and a fully connected classifier was also used for tumor grading. This work illustrates the potential of using deep learning in MRI images. Obtained accuracy, sensitivity, and specificity are 0.89, 0.87, and 0.92, respectively. In 2016, Zhang et al. [14] used a new technique called as Glioblastoma Multiforme Prognosis prediction as multiple kernel machine and a minimum redundancy feature selection method. This proposed method was efficient to be learnt and extracting all the features, but accuracy was not promised. In 2020, Chaudary et al. [15] developed a method using K-means clustering and DWT. Initially, a clustering-based method was used to segment the image and then to detect the tumors SVM is applied. It achieved a robust result with 94.6% accuracy.
3 Methods and Material This particular section describes the overall work flow of our study. Below flowchart depicts the stages involved (Fig. 1). A. Data Acquisition This is the first step which involves collection of MRI data where we used BRATS2015 dataset. B. Data Preprocessing Our input data is in the form of MRI images. The main objective of this stage is to eliminate the unwanted noises in MRI image and enhancing the available input image. For this purpose, we used improved median filter. Improved Median Filter Basically, median filter [16] is a nonlinear filter and it is widely used in image filtering processes. It has good edge keeping characteristics. For enhancing our MRI images, we combined median filter with average filter which is termed as improved median filter. Let us consider that our input MRI image is Inim and preprocessed image as Ppim (Fig. 2). Step 1: Take an image and a n × n square or cross-mask where the center of the mask overlaps with a pixel on image and it goes over and reaches central element (x,y) Step 2: The values of mask’s associated pixels can be obtained by: 1. Calculate weighted geometric mean GMw of mask. 2. Then combine obtained GMw with every pixel and check whether e (x,y) > GMw or not,
A Novel Approach for Effective Classification of Brain Tumors Using … Data Accqusition
Pre-Processing
Segmenation
FeatureExtraction
BILSTM+ DBN
Classification
Results Fig. 1 Proposed workflow
Fig. 2 Input MRI image and enhanced image
MRI Dataset.
Improved Median Filter
U-NET Architecture
Tumor size, Loop, Modified Local Gabor Directional
Optimization via BMEBEO Hybrid optimizer
487
488
A. V. N. Reddy et al.
If yes, then e (x,y) = median If not, keep the pixels original value as it is. Step 3: Repeat the above steps until, x = y = n.ssxsz C. Segmentation Segmentation in general means to isolate the required things or features. In brain, tumor is located along with other constituents like white matter, gray matter, and cerebrospinal fluid for instance. So, segmentation helps to identify the tumor’s shape, location, and position much more accurately. U-net architecture [17] is being used in our study for this purpose. Semantic segmentation is something where we not only predict the object but also can create a mask that shows where on the image that specific object is located along with dimensions. It has two main parts that are encoder and decoder and is also termed as expanded path and contracting path. Contracting path represents the standard CNN design. Every level in encoder consists of two 3 × 3 convolutional layers each followed by a radioactivation unit. The transition between those layers is handled by a 2 × 2 max pooling unit with a stride of 2 for down sampling. Every level that we go to bottom of u-shape, the size of input halves and the channels doubles. Decoder part in U-net responsible to determine the location of object (tumor) in the image. It is same as encoder but with a slight variation. 2 × 2 transposed convolution takes place to up sample the condensed image. Cropping is required as there is a loss of boundary pixels in each convolution. After this step, we have our segmented tumor image as follows (Fig. 3).
Fig. 3 Input MRI, enhanced MRI, segmented image
A Novel Approach for Effective Classification of Brain Tumors Using …
489
D. Feature Extraction To classify the tumor, we need to rely on some factors/features. So, extracting the required feature set from the tumor image is crucial. It involves tumor size, modified LGDiP, and MBP from segmented image. Determining required features is as follows: 1. Tumor size: Brain tumors all are not of similar type so is the size. They can be in any size. Pathology and radiologist’s experience and image analysis consider intensity and shape as major factors. Border of tumor is referred as area feature and denoted by S given in Eq. (1). S=
π (Hx w) 4
(1)
where H = length, w = tumor width. 2. LOOP: Loop is a grouping of LDP and LBP that addresses the problems while securing LDP and LBP strengths. Scale independence can be achieved by an adaption of LOOP; it is denoted by L and represented in Eq. (2). L = LOOP(ro , so ) =
7 z h j − h o · 2wj
(2)
j=0
where (r 0 , s0 ) = pixel location, h0 = center pixel. 3. Median Binary Pattern: MBP is same as LBP but instead of using central pixel, it utilizes median value within image for more robustness and microstructure sensitivity. MBP is defined as follows in Eq. (3). MBP P,R (i, j ) =
2k E(xk − ω(i, j ))
(3)
k∈N p(i, j )
where ω(i, j ) = median which implies that final binary pattern will contain at least (|M P |) bits, and therefore, there are only 2|M P |−1 viable binary patterns. 4. Modified LGDiP [18]: While configuring image with Gabor filters, segmented image’s first Gabor form is created. Mixture of segmented image s (x, y) with Gabor filter is defined in Eq. (4). Hτ e (x, y, ∂, o) = e(x, y) ∗ t,l (𝚿)
(4)
where O = scales, o ∈ {0, 1, . . . , 4}, ∂ = orientation, ∂ ∈ {0, 1…, 7}. LDP is calculated after applying Kirsch masks m0 … m7 . mi =
1 1 0=−1 k=−1
F(x + 0, y + k) × Ji (o, k)
(5)
490
A. V. N. Reddy et al.
LDP is defined as LDP X,Y (m o , . . . , m 7 ) = S I (x, y) =
7
S I (m i − m k ) ∗ 2i
(6)
i=0
1 if x > 0 0 otherwise
(7)
According to proposed method, Harmonic value is calculated as H M + mid 2 n = n 1 , n = 8
mk = HM
(8)
i=i m i
mid = median The mean and median of overall mask values are helpful to determine most prominent response mk .
4 Description on Proposed Hybrid Classifiers After segmentation and proper extraction of features were done, we need to classify the tumor. It is based on the extracted feature set F Im . We are proposing a hybrid model that includes both DBN and Bi-LSTM algorithms. It works as follows. Extracted feature set is given to both the classifier algorithms simultaneously. Then the obtained results from both classifiers will be tuned to arrive at a conclusion with the help of newly proposed optimization technique BMEBEO. A. Bi-LSTM Model Bidirectional long short-term memory [19] is a sequence processing model that involves 2 LSTMs: Here the input is given from both the directions, that means, one taking the input in forward direction and another in backward. Instead of conventional unidirectional LSTM, this bidirectional LSTM takes the whole available information into consideration from both sides. Calculation of the output using Bi-LSTM proceeds as follows: Hidden output of the forward layer is represented as Bt. To store the records, conventional LSTM uses three special gates. Present input and hidden layer outputs influence the cell’s output state. An input gate examines whether the current input transformed to state ut and the forget gate Gt determines. Whether Ot −1 was kept, also, the output gate Dt determines whether Bt −1 was sent to subsequent cell or not. It is a probable option to update memory cell.
A Novel Approach for Effective Classification of Brain Tumors Using …
491
Present hidden state and LSTM cell Ot can be calculated as follows: Ct = l(Q C u 1 ) + JC Bt + RC Ot=1 + sc
(9)
Dt = l(Q D u t ) + J D Bt + R D Ot=1 + s D
(10)
G t = l(Q G u t ) + J D Bt + R D Ot=1 + sG
(11)
It = l(Q I u t ) + JI Bt + R I Ot=1 + s I
(12)
Ot = G t ∗ Ot + C i ∗ I t
(13)
Bt = Dt ∗ tan B(Ot )
(14)
The complete output for Bi-LSTM can be calculated as follows: Bt = l(Z B [Bt , Bt ] + sk )
(15)
where t = time, ut = current input, Bt −1 = old hidden state, and Ot −1 = old output state. The output of Bi-LSTM is represented as Bi-LSTMout . B. DBN Model Deep belief network (DBN) [20] is a well-known unsupervised deep learning technique, which is composed of stochastic latent variables of multi-layers. It is a unique hybrid graphical representation. These networks mostly address the problems associated with classical neural networks. They take any value within a specific range with some probability. Actual workflow A greedy approach has been utilized to train DBN. These networks get benefit from the greedy learning as they build weights layer by layer. Initially, we train the property layer that can directly gain pixel input signals. Then these networks learn the features of given input. First, in the top two hidden layers, we run numerous steps of Gibbs sampling. By the completion of this step, it effectively extracts a sample from it. Afterward, a sample will be generated from the visible units using a pass of previous sampling throughout the model. Ultimately, a bottom-up pass is utilized to infer values of latent variables in every layer. Generated data vector is used as top input in greedy training application. Generating weights are then added by fine-tuning in opposite direction. Final output is represented as DBNout . Also, output of both classifiers can be represented as (Fig. 4)
492
A. V. N. Reddy et al.
DBN Hidden Layer
Visible Layer
Fig. 4 Proposed classification model
Fout = Bi − LSTMout ⊕ DBNout .
5 Proposed BMEBEO Algorithm As we already discussed, our proposed BMEBEO [21] algorithm optimizes the obtained output of both Bi-LSTM and DBN. Our proposed optimization algorithm is a blend of both blue monkey and bald eagle optimization algorithm. That means, the working of blue monkey is integrated with bald eagle. Overall workflow can be categorized as follows: 1. Choosing the search space 2. Looking inside the selected swooping 3. Search space. Each step can be illustrated as follows:
A Novel Approach for Effective Classification of Brain Tumors Using …
493
Select stage: Initially, we encounter the bald eagle algorithm. Based on the amount of the food available, a bald eagle identifies and chooses the best hunting area within the selected search space. This is an important step for further proceedings. It is represented quantitatively as follows: Ti,new = Tbest + φ ∗ r (Tmean − Ti )
(16)
φ = parameter for managing positional shifts, Tbest = search space that bald eagles chose based on an optimal location, and r = random number. The bald eagles mostly rely on the early phases and locations. That means, they choose a location based on data available from preceding phase, and this continues. Current motion of bald eagles is calculated by dividing the previously searched data by (theta). Search Stage: This is the second and significant stage. In this search stage, bald eagles fly in a spiral shape inside the previously selected search zone. Ti,new = Ti + y(i) ∗ (Ti − Ti+1 ) + x(i) ∗ (Ti − Tmean ) yr (i ) xr (i) x(i ) = max(|xr |) and y(i ) = max(|yr |) xr (i ) = r (i ) ∗ sin(θ (i )) and yr (i ) = r (i ) ∗ cos(θ (i )) q(i) = a ∗ p ∗ rand r (i) = a ∗ R ∗ rand
(17)
A B C D
R = random number of cycles. θ = parameter that determines the corner among point searches. Best position for swoop in BES is as follows in Eq. (18), Ui+1 = Ui + Ratei+1 ∗ rand
(18)
Ratei+1 = (0.7 ∗ Ratei ) + (Vleader − Vi ) ∗ rand ∗ (Ubest − Ui )
(19)
Swooping stage: Bald eagles swoop out from best location in search area to their prey during this stage. Moreover, all points converge on an ideal point. The behavior in swooping stage is illustrated as follows: Ti,new = round ∗ Tbest + xl(i ) ∗ (Tt − Al ∗ TMean ) + yl(i ) ∗ (Ti − A2 ∗ Tbest ) (20) Here, x(i ) =
yr (i) xr (i) and y(i ) = max(|xr |) max(|yr |)
xr (i ) = r (i) ∗ sin h(θ (i )) and yr (i ) = r (i ) ∗ cos h(θ (i ))
494
A. V. N. Reddy et al.
θ (i ) = a ∗ p ∗ rand r (i ) = a ∗ R ∗ rand A1, A2 ∈ [1, 2] Swoop equation’s point movements will get directed toward the best spot when the parameters were altered. Population’s mean can help to achieve the best solution. Ti,new = rand ∗ Tbest + xl(i ) ∗ (Ti − Al ∗ TMean ) + yl(i ) ∗ (Ti − Al ∗ Tbest ) + BM (21) Additionally, the cycle crossover is predetermined to update the obtained solution according to proposed idea. In order to create an offspring out of its parents in which every slot is occupied by an element from a distinct parent, this process is termed as cycle crossover (CX). Pseudocode of proposed BMEBEO algorithm is as follows: Step1: Initiate point i in population. Step2: Calculate initial point fitness value: e (Ti) Step3: while Termination requirements are not satisfied. Select space for each point i in population, T i,new = T best + φ * r(T Mean − T i ) if e(T new ) < e(T i ) T i = T new if e(T new ) < e(T best ) T best = T new end if end if End for Improved search space update is done as per (19) Swoop space Improved swooping stage update is done as per (21) End while.
6 Results and Analysis A. Simulation Procedure Our proposed optimization model is BMEBEO, which is used for classification of tumors using MRI more efficiently. BMEBEO method was compared with other
A Novel Approach for Effective Classification of Brain Tumors Using …
495
methods like arithmetic optimization algorithm (AOA), sparrow search algorithm (SSA), bald eagle search (BES), blue monkey optimization (BMO), and others, respectively. To calculate the classification performance, we will take some evaluations like statistical indices: accuracy, specificity, precision, FNR, NPV, FPR, F-measure, sensitivity, and MCC. B. Performance evaluation according to positive measure The results of the suggested BMEBEO approach and traditional approach are displayed in Fig. 5. C. Performance evaluation according to other measures The results of the suggested BMEBEO approach and traditional approach are displayed in Fig. 6. D. Analysis on classifiers See Table 1. E. Convergence analysis See Fig. 7.
Fig. 5 Evaluation of the proposed BMEBEO model performance in comparison with the conventional approaches. a Accuracy, b precision, c sensitivity, and d specificity
496
A. V. N. Reddy et al.
Fig. 6 Evaluation of the proposed BMEBEO model performance in comparison with the conventional approaches. a F-measure, b MCC, and c NP Table 1 Analysis on classifier WHHO
R-CNN
SVM
Bi-GRU
CNN
RNN
NN
BMEBEO + Hybrid
Accuracy
0.891356 0.90878
FNR
0.079221 0.060617 0.08825
0.236135 0.162122 0.170842 0.133874 0.043496
Sensitivity 0.920779 0.939383 0.91175
0.763865 0.837878 0.829158 0.866126 0.966467
0.895937 0.756422 0.78486
MCC
0.761758 0.799493 0.771775 0.48749
FPR
0.162187 0.146812 0.1341
0.775202 0.820365 0.937161
0.538384 0.518476 0.611901 0.745604
0.258036 0.303308 0.314069 0.258036 0.14168
F-measure 0.916242 0.929988 0.919868 0.805471 0.829474 0.821362 0.858939 0.958677 Precision
0.91175
NPV
0.853188 0.885696 0.837813 0.617946 0.72099
0.920779 0.928131 0.851871 0.821236 0.813712 0.851871 0.95276
Specificity 0.837813 0.853188 0.8659
0.708173 0.763865 0.877512
0.741964 0.696692 0.685931 0.741964 0.85732
A Novel Approach for Effective Classification of Brain Tumors Using …
497
Fig. 7 Comparing BMEBEO’s performance w.r.t existing techniques
7 Conclusion and Future Scope Our proposed methodology aims to provide an efficient classification system for brain tumors. Our entire work is categorized into five stages. Initially, for the data acquisition, MRI dataset is collected from Internet. To enhance the MRI images, an improved median filter has been used to preserve edges for easy workflow. U-net architecture has been utilized for segmentation step. Also, tumor’s size, loop, MBP, and modified LGDiP-based features are extracted. Lastly, to classify the tumors we used a hybrid model that includes Bi-LSTM and DBN. Followed by a hybrid optimization algorithm, BMEBEO has been used to tune the results of both classifiers. Our proposed technique achieved comparatively efficient results in terms of accuracy, precision, sensitivity, F-measure, etc. In the future, we can design a user-friendly computerized software integrating our proposed algorithm so that doctors can work easily and efficiently.
References 1. Hashemzehi R, Mahdavi SJS, Kheirabadi M, Kamel SR (2020) Detection of brain tumors from MRIimages base on deep learning using hybrid model CNN and NADE. Sci Direct 40:1225–1232 2. Khairandish MO, Sharma M, Jain V, Chatterjee JM, Jhanjhi NZ (2021) A hybrid CNN-SVM threshold segmentation approach for tumor detection and classification of MRI brain images. IRBM 43:290 3. Ghahfarrokhi SS, Khodadadi H (2020) Human brain tumor diagnosis using the combination of the complexity measures and texture features through magnetic resonance image. Biomed Signal Process Control 61:102025 4. Ari A, Hanbay D (2018) Deep learning based brain tumor classification and detection system. Turkish J Electr EngComput Sci 26(5):2275–2286 5. https://www.aans.org/en/Patients/Neurosurgical-Conditions-and-Treatments/Brain-Tumors
498
A. V. N. Reddy et al.
6. Islam MK, Ali MS, Miah MS, Rahman MM, Alam MS, Hossain MA (2021) Brain tumor detection in MR image using superpixels, principal component analysis and template based K-means clustering algorithm. Mach Learn Appl 5:100044 7. Kesav N, Jibukumar MG (2021) Efficient and low complex architecture for detection and classification of brain tumor using RCNN with two channel CNN. Comput Inform Sci 34:6229 8. Sabitha V, Nayak J, Ramana Reddy P (2021) MRI brain tumor detection and classification using KPCA and KSVM. Mater Today Proc 9. Singh M, Shrimali V (2022) Classification of brain tumor using hybrid deep learning approach. Broad Res Artif Intell Neurosci 13(2):308–327. https://doi.org/10.18662/brain/13.2/345 10. Khairandish MO, Sharma M, Jain V, Chatterjee JM, Jhanjhi NZ (2021) A hybrid CNN-SVM threshold segmentation approach for tumor detection and classification of MRI brain images. IRBM, 2021 11. Irmak E (2021) Multi-classification of brain tumor MRI images using deep convolutional neural network with fully optimized framework. Iran J Sci Technol Trans Electr Eng 45:1015–1036. https://doi.org/10.1007/s40998-021-00426-9 12. Gokila Brindha P et al (2021) Brain tumor detection from MRI images using deep learning techniques. IOP Conf Ser Mater Sci Eng 1055:012115: https://doi.org/10.1088/1757-899X/ 1055/1/012115 13. Naser MA, Jamal Deen M (2020) Brain tumor segmentation and grading of lower-grade glioma using deep learning in MRI images. Comput Biol Med 121:103758. ISSN 0010-4825. https:// doi.org/10.1016/j.compbiomed.2020.103758 14. Zhang Y, Li A, Peng C, Wang M (2016) Improve glioblastoma multiforme prognosis prediction by using feature selection and multiple kernel learning. In: IEEE/ACM transactions on computational biology and bioinformatics, vol 13, no 5, pp 825–835. https://doi.org/10.1109/ TCBB.2016.2551745 15. Chaudhary A, Bhattacharjee V (2020) An efficient method for brain tumor detection and categorization using MRI images by K-means clustering & DWT. Int J Inf Technol 12:141–148. https://doi.org/10.1007/s41870-018-0255-4 16. Zhu Y, Huang C (2012) An improved median filtering algorithm for image noise reduction. Phys Proc 25:609–616. https://doi.org/10.1016/j.phpro.2012.03.133 17. https://paperswithcode.com/paper/u-net-convolutional-networks-for-biomedical 18. Zahid Ishraque SM, Hasanul Banna AKM, Chae O (2012) Local gabor directional pattern for facial expression recognition 19. Zheng X, Chen W (2021) An attention-based Bi-LSTM method for visual object classification via EEG. Biomed Signal Process Control 63:102174 20. https://medium.datadriveninvestor.com/deep-learning-deep-belief-network-dbn-ab715b 5b8afc 21. Alsattar HA, Zaidan AA, Zaidan BB (2019) Novel meta-heuristic bald eagle search optimisation algorithm
An Application of Multilayer Perceptron for the Prediction of River Water Quality Rozaida Ghazali, Norfar Ain Mohd Fuzi, Salama A. Mostafa, Umar Farooq Khattak, and Rabei Raad Ali
Abstract This study examines the use of an intelligent system to predict the quality of water flowing through the Johore Rivers. A neural network model of multilayer perceptron has been developed to predict the quality of water in three rivers, namely the Bekok River, Sayong River, and Johor River. The model has been trained and tested with the values of the pH parameters as input variables. The neural network model’s performance has been assessed using three performance metrics: mean squared error, CPU time, and prediction accuracy. By analyzing the relationship between the residuals and the model values, it will be possible to assess the model’s suitability. Based on the results from the implementation of the multilayer perceptron model, a method is developed for forecasting and monitoring the parameters associated with water quality obtaining an average accuracy of prediction score of 99.93%. Keywords Water quality · Multilayer perceptron · Backpropagation · Supervised learning · Neural network
R. Ghazali (B) · N. A. M. Fuzi · S. A. Mostafa Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Batu Pahat, Johor, Malaysia e-mail: [email protected] N. A. M. Fuzi e-mail: [email protected] S. A. Mostafa e-mail: [email protected] U. F. Khattak · R. R. Ali School of Information Technology, UNITAR International University, 47301 Petaling Jaya, Malaysia e-mail: [email protected] R. R. Ali Department of Computer Engineering Technology, Northern Technical University, 41000 Mosul, Iraq © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_42
499
500
R. Ghazali et al.
1 Introduction Increasing levels of various pollutants in the water have deteriorated the quality of rivers because of human activities and urbanization within the region [1]. Moreover, the river continues to be clogged and sludged by garbage and waste due to lack of storage by the local governments. The contaminants ultimately flow into river systems that provide fish and poultry with spawning and feeding grounds [2]. It is possible that underwater animals and plants which have adapted to living in an environment with a limited pH range will eventually be damaged by even a minor change. Humans and ecosystems depend on water for survival and health, so managing water quality is essential. A water quality parameter assessment is essential to improve the evaluation process’s efficiency and develop a better water management strategy [1]. A water quality model involves the analysis of numerical simulations to predict pollution levels in the water. Even though traditional processes-based models provide reasonably accurate predictions of river water quality parameters, the models rely on lengthy data and typically require unknown inputs [3]. The majority of these calculations are based on approximate descriptions of processes, and some of these approximate descriptions may ignore critical aspects that affect the processes in water. In the meantime, data-driven techniques serve as a valuable substitute for conventional process-based modeling. Developing models based on data-driven approaches requires fewer input parameters than process-oriented models and is computationally faster. Artificial neural networks (NNs) are at the forefront of this field [4]. The NN is a machine-based computational technique [5]. In water engineering, NN models have demonstrated their capabilities and suitability for modeling and simulating different physical aspects [6]. The technique is especially suitable for problems that require the manipulation of various parameters and nonlinear interpolation. This type of problem cannot be adequately addressed through conventional theoretical or mathematical methods [7]. A further advantage of the NN is that its architecture and nonlinear nature capture the issue’s embedded spatial and unsteady practices, as opposed to traditional modeling approaches. The NN is able to perform several remarkable tasks, such as learning, adapting, and achieving generalization [3, 5, 8]. Motivated by the ability of the NN, which has been successfully applied in many real-world problems [9], this research aims to investigate and apply the capability of multilayer perceptron (MLP), the widely known NN model, to the prediction of water quality parameter at Johore River.
2 Neural Networks Adapted from the way the human brain processes information, NNs are a paradigm for processing information. Despite the tremendous power of the human brain, researchers and even computer experts are excited about it. A neural network is modeled on a model of the brain neuron that consists of three parts [10]: a dendrite
An Application of Multilayer Perceptron for the Prediction of River …
501
that acts as a receptive zone and collects inputs from other neurons; a neuronal excitation core; and a neuronal dendrite that acts as a motor unit; a soma (cell body) besides the neuron, which is a crucial component of the nonlinear processing chain; additionally, there is an axon, an extension of the neuron, which transmits the output to other neurons in the processing chain. In neuroscience, a synapse is an area of connection between two neurons [4]. The synapses mediate the communication between neurons in the brain through the medium of elementary structural and functional units. Generally, neurons transmit their signals through chemical signals composed of spikes and short bursts of electrical activity [9]. These spikes may be considered as temporally averaged pulses in NNs, where Xj represents a continuous variable. Similar mechanisms and functions exist within NN. Many very simple processors exist, each of which may have local memory [7]. In order to achieve success and effectiveness, NN combines a large number of simple computational units that perform a multitude of operations, often called neural cells [9, 11]. As with the brain, NN systems achieve their power using these network components, similar to the thousands of axons (wires) that make up each cubic centimeter of the brain. NN layers are connected by weighted links and work together to generate outputs. Some NNs consist of many layers of neuron processing elements and involve sophisticated software or high-performance hardware. In most engineering and scientific fields, NNs are an effective method of solving nonlinear problems in the real world, such as the forecast of time series [12], image processing [13], pattern recognition [14], medical image analysis [15], and system optimization [16]. Their numerous application domains fall into different categories: for example, regression and generalization, classifications, association, clustering, pattern completion, and optimization.
2.1 Multilayer Perceptron Multilayer perceptron (MLP) consists of a collection of summing units connected by corresponding weights and is referred to as a feed-forward network [17]. The network is composed of several perceptions and can overcome the weaknesses of single-layer networks. This network’s input and output layers are separated by one or more hidden layers which transmit the data from it. Hidden nodes perform a function by meaningfully intervening between external inputs and network outputs. A simplified representation of the MLP structure is shown in Fig. 1 with a single hidden layer. Full connectivity is defined as the fact that every node in each layer is associated with every other node in the neighboring layer. The MLP calculates the network output in accordance with the following equation: ⎛ ⎞ N J Y = σ⎝ W jk σ Wi j X i + Woj + Wok ⎠ j=1
i=1
(1,)
502
R. Ghazali et al.
Fig. 1 MLP architecture
where x i represents the input value, W ij is the weight of the input layer toward the hidden layer, and W jk is the weight of the hidden layer toward the output layer. W oj represents bias for hidden node, σ is a sigmoid transfer function, and Y refers to the network’s output [18]. The MLP employs a highly interconnected topology in which each input is linked to each node in the initial hidden layer, the nodes in each further hidden layer are linked to each other, and so on [17]. The input nodes give values to the nodes of the first hidden layer. On this layer, forward propagation progresses until the network’s output is produced at the output layer. By utilizing a single hidden layer, MLP can estimate valid functions to any level of accuracy desired, assuming that there are satisfactory numbers of hidden nodes and that the nonlinear activation function follows a sigmoid shape [18].
2.2 Backpropagation Learning Algorithm An NN to produce meaningful forecasts will have to be trained using a certain algorithm, an underlying process that facilitates the learning process. When a learning algorithm is applied, the error is determined each time a training vector is handed
An Application of Multilayer Perceptron for the Prediction of River …
503
over to the network, and gradient descent is applied to the error. In the context of the overseen training of MLPs, backpropagation (BP) has established itself as the most famous learning algorithm. Haykin [19] asserts that the algorithm has two individual structures: it is easy to calculate locally and uses stochastic gradient descent (updating weights by learning patterns). In supervised learning, the BP algorithm builds on a relevant cost function or error, whose values are influenced by the desired and actual outputs of the network. The gradient descent approach must be used to minimize these values. The influence of each weight within the network is computed by a method based on a repeated training process. The goal is to minimize the error as much as possible using simple gradient descent, where the weight is allocated toward the steepest descent (negative of gradient). Consequently, the error rapidly decreased in this direction. Weights are coordinated according to the delta rule which implies that, in the example, the actual output from the network is deducted from the desired output. As a result of adjusting the weights, the network output becomes significantly closer to what was desired. The error function to be minimized is E=
1 (tk − yk )2 2 k
(2)
In this case, t k represents the desired output, and yk represents the network output. The slope of the error function for each component of the gradient depends on the weight of that component. ∂E = ∂W
∂E ∂E ∂E , ,..., ∂ W0 ∂ W1 ∂ Wn
(3)
The partial derivative of the error function is determined by the following formula regarding the weights and biases in the BP algorithm: ∂E ∂ E ∂ Si ∂neti = . ∂ Wi j ∂ Si ∂neti ∂ Wi j
(4)
W ij is the weight applied by neuron j to neuron i, S i is the neuron’s output, and neti is the weighted sum of the neuron i’s inputs. The weights are adjusted by subtracting the gradient from each other in order to reduce errors. After determining the derivative, the error function is minimized by performing a gradient descent as described below: Wi j (t + 1) = Wi j (t) − ε
∂E (t). ∂Wij
(5)
To control the learning step, a rate, ε, is used, which has a significant influence on convergence time. It is possible that a very high learning rate can cause oscillations in the weight space and result in only local minima as opposed to global optimums.
504
R. Ghazali et al.
At the same time, setting a small learning rate will result in a slow training process because of the number of weight steps that must be performed. Momentum terms are often added to prevent the above issues by sizing the effect of the early step/ derivative to the additional step and ensuring more stability in the learning process. ∆Wi j (t) = −ε
∂E (t) + μ∆w(t − 1) ∂Wij
(6)
2.3 Water Quality Prediction Water quality is affected by the biological, physical, microbiological, and chemical conditions in watercourses and subsurface aquifers. There are many uses of water, and the quality of the water directly influences all of them, including industrial water supplies, fish survival, diversification, abundance, recreational and governmental, and livestock watering [20]. Therefore, a river management system should be introduced more systematically in order to maintain and protect rivers from the pollution that has worsened due to human activities. To determine the level of water quality under the Water Quality Index (WQI), the evaluation is usually performed by considering a number of chemical parameters such as dissolved oxygen, biochemical oxygen demand, chemical oxygen demand, suspended solids, ammonia nitrogen, pH, temperature, nitrate, and phosphate. Therefore, in real practice, measuring water quality is not an easy task [21]. A traditional process-based model may provide good estimates of water quality parameters, but it typically can only be applied directly to environmental data with a lengthy calibration procedure [22]. Recent research has shown that NN is applicable to a number of areas, including water engineering, ecology, and environmental science. NN has been extensively used for forecasting and determining various water-related domains, such as water resource management. During the past decade, data-driven techniques have been used to model both freshwater and seawater qualities. A significant obstacle to use process-based modeling is the absence of adequate data on water quality and the high cost of monitoring aquaria. It is especially beneficial to use the MLP model since it is computationally very fast and requires fewer input parameters compared to deterministic models. However, the model remains a relatively unpopular tool in the field of predicting or forecasting water quality with an adequate choice of input data and a suitably designed MLP model. The proposed model can contribute to a great potential to simulate water quality in several rivers in Johore and achieve reasonably good accuracy prediction results accordingly. Scientists and environmentalists will gain from developing such forecast models, as they will be able to predict the levels of water pollution and take significant mitigation measures in advance.
An Application of Multilayer Perceptron for the Prediction of River …
505
3 Experimental Design In order to design a valid and reliable water quality assessment and prediction tool, in this work, we have selected three stations, namely the Bekok River, Sayong River, and Johor River. All the data were obtained from the Hydrology and Water Resources Division of the Department of Irrigation and Drainage (DID), Johore, Malaysia. The input data to the MLP model are organized as a temporal sequence of the previous parameter values so as to enable the model to learn the pattern of parameter values in the preceding period and make a prediction accordingly for the future event. For this work, the pH time series values are used as univariate data for the input–output mapping of the MLP model. A pH value measures a solution’s acidity or alkalinity, which lies between 0 and 14. Neutral water has a pH of 7. Acidic water has a pH value below 7, with 0 being the acidic [1]. Training the MLP is carried out using the BP learning method [21]. Datasets are separated based on their time distribution. Data analysis is being done in order to identify any relationship that might occur between the past, present, and future. Specifically, the training and out-of-sample data for the three rivers were divided into two sections, with respective distributions of 75 and 25%. In MLPs, the estimate of input units’ ranges from three to eight, with a single hidden layer and a single output unit. The training is stopped when the network reaches the maximum epoch of 3000 or a minimum error of 0.0001. An experimental selection of the learning rate and the momentum term is made between 0 and 1. The initialization of the weights is random within the range from − 0.5 to 0.5. To prevent computation obstacles and to meet the algorithm requirement, outputs and inputs will be scaled by means of the upper and lower bounds of the network’s transfer function. According to the selected sigmoid transfer function, the input–output variables are normalized between 0 and 1. The prediction performance of the MLP simulation model is evaluated using the mean squared error (MSE), CPU time, and accuracy rate. The MLP model, the training algorithm, and the prediction system have been developed using MATLAB.
4 Implementation and Results This prototype is built to assist in predicting and determining river water quality systematically and accurately. The interface module reads the pH data from the selected river stations, as shown in Fig. 2. The data are passed to the MLP model, and the network learns the data through the BP algorithm and predicts the next-day pH. Before the training process of the MP model starts, a few learning parameters need to be set, in particular, the number of input nodes, the learning rate, and momentum term. Based on the inputs provided, as shown in Fig. 2, three graphs will be generated:
506
R. Ghazali et al.
(a) Main interface
(b) Results interface
Fig. 2 Prediction of river water quality system
the training plot, the testing plot, and the error plot. For the purpose of demonstration, the simulation presented in this section shows and discusses the prediction made by the model on the Sungai Bekok River only. The training plot depicts the signal of the actual pH data versus the predicted pH for the selected river (please refer to Fig. 3). Meanwhile, Fig. 3 shows the prediction made by the MLP model using the testing data (out-of-sample data). Observe that the testing data comprise unseen data that have not been utilized during the training of the network and are only applied for testing purposes. The plot shows that the MLP can implement the nonlinear input– output mapping of the pH time series data, as the predicted data capture the pattern of the actual data.
(a) Training of the BP algorithm Fig. 3 Training and testing results of the BP
(b) Testing of the BP algorithm
An Application of Multilayer Perceptron for the Prediction of River …
507
Fig. 4 MSE training of the BP
The MSE is a way to quantify the difference between values implied by the simulator/predictor and the true values of the quantity being estimated. It measures the square of the error between the actual and prediction signals and becomes the most commonly used accuracy measure in the previous study. In this work, we observed the MSE obtained by the MLP model, as depicted in Fig. 4. Based on the visualization graph generated, it shows that the MLP model can produce remarkably stable learning curves, and the model is able to learn the pH data very well as the MSE was approaching zero and decreasing steadily along the training process (refer to Fig. 4). The amount of CPU time utilized for the BP algorithm to train the MLP is calculated and given in Fig. 5. CPU time can be defined as the amount of time a program takes to operate on a computer’s processor, usually calculated in ticks of the clock. As a measure of program performance, it compares CPU usage among programs. The CPU time was determined based on a machine that runs Windows Operating System, with the specifications of Intel processor (Core 2 Duo), CPU of 1.83, and 1.5 GB of RAM. Apart from the CPU runtime, the result in Fig. 5 also shows the average accuracy of the prediction score of 99.93%. It can be noticed that the MLP model is highly capable of producing high prediction accuracy of river water quality based on evaluating the values of pH signals.
508
R. Ghazali et al.
Fig. 5 Overall results of the river water quality prediction
5 Conclusion In this paper, we have designed, implemented, and simulated a fast convergence assessment and prediction tool using the MLP with water quality prediction using a BP determining algorithm, namely the pH parameters for three rivers in Johore. Results from the developed model confirmed that the MLP could predict the pH data for the rivers with good accuracy and small prediction error. This result indicates that the networks can track the signal well. Apart from generating an acceptably good forecast, the model also converges very fast with a small number of epochs. This performance led to the utilization of minimum CPU time to complete the learning. The nonlinearity of the underlying functions has enabled the networks to perform well in predicting the river pH using the MLP. This architecture enables the networks to accurately model input–output mapping interactions for short-term forecasting. The MLP model is capable of making an accurate decision, as it shows an average accuracy of 99.93%. Therefore, it can be suggested that the model has the potential to overcome the limitation of the existing water quality determination methods. Acknowledgements This paper is supported by Universiti Tun Hussein Onn Malaysia (UTHM).
References 1. Mesner N, Geiger J (2010) Understanding Your Watershed: pH. Utah State University, Water Quality Extension 2. Nafi SNMM, Mustapha A, Mostafa SA, Khaleefah SH, Razali MN (2020) Experimenting two machine learning methods in classifying river water quality. In: Applied computing to support industry: innovation and technology: first international conference, ACRIT 2019. Ramadi,
An Application of Multilayer Perceptron for the Prediction of River …
3.
4. 5. 6. 7.
8. 9.
10.
11. 12. 13. 14. 15. 16. 17. 18.
19.
20. 21.
22.
509
Iraq, September 15–16, 2019, Revised Selected Papers 1, Springer International Publishing, pp 213–222 Alqahtani A, Shah MI, Aldrees A, Javed MF (2022) Comparative assessment of individual and ensemble machine learning models for efficient analysis of river water quality. Sustainability 14(3):1183 Fallah-Ghalhary GA, Mousavi-Baygi M, Habibi-Nokhandan M (2009) Seasonal rainfall forecasting using artificial neural network. J Appl Sci 9(6):1098–1105 Ali RR, Mohamad KM (2021) RX_myKarve carving framework for reassembling complex fragmentations of JPEG images. J King Saud Univ Comput Inf Sci 33(1):21–32 Bahi JM, Contassot-Vivier S, Sauget M (2009) An incremental learning algorithm for function approximation. Adv Eng Softw 40(8):725–730 Ali RR, Al-Dayyeni WS, Gunasekaran SS, Mostafa SA, Abdulkader AH, Rachmawanto EH (2022, March) Content-based feature extraction and extreme learning machine for optimizing file cluster types identification. In: Advances in information and communication: proceedings of the 2022 future of information and communication conference (FICC), vol 2. Cham: Springer International Publishing, pp 314–325 Chang FJ, Chang LC, Kao HS, Wu GR (2010) Assessing the effort of meteorological variables for evaporation estimation by self-organizing map neural network. J Hydrol 384(1–2):118–129 Ghazali R, Hussain A, El-Deredy W (2006, July) Application of ridge polynomial neural networks to financial time series prediction. In: The 2006 IEEE international joint conference on neural network proceedings. IEEE, pp 913–920 Al-Jabri KS, Al-Alawi SM (2010) An advanced ANN model for predicting the rotational behaviour of semi-rigid composite joints in fire using the back-propagation paradigm. Int J Steel Struct 10:337–347 Chen AS, Leung MT (2005) Performance evaluation of neural network architectures: the case of predicting foreign exchange correlations. J Forecast 24(6):403–420 Ghazali R, Hussain AJ, Liatsis P (2011) Dynamic ridge polynomial neural network: forecasting the univariate non-stationary and stationary trading signals. Expert Syst Appl 38(4):3765–3776 Hussain AJ, Liatsis P (2003) Recurrent pi-sigma networks for DPCM image coding. Neurocomputing 55(1–2):363–382 Kaita T, Tomita S, Yamanaka J (2002) On a higher-order neural network for distortion invariant pattern recognition. Pattern Recogn Lett 23(8):977–984 Shieh JS, Chou CF, Huang SJ, Kao MC (2004) Intracranial pressure model in intensive care unit using a simple recurrent neural network through time. Neurocomputing 57:239–256 Yu W, Morales A (2005) Neural networks for the optimization of crude oil blending. Int J Neural Syst 15(05):377–389 Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2(4):303–314 Ali RR, Mohamad KM, Jamel S, Khalid SKA (2018) Classification of JPEG files by using extreme learning machine. In: Recent advances on soft computing and data mining: proceedings of the third international conference on soft computing and data mining (SCDM 2018). Johor, Malaysia, February 06–07, Springer International Publishing, pp 33–42 Ab Aziz MF, Mostafa SA, Foozy CFM, Mohammed MA, Elhoseny M, Abualkishik AZ (2021) Integrating Elman recurrent neural network with particle swarm optimization algorithms for an improved hybrid training of multidisciplinary datasets. Expert Syst Appl 183:115441 Smith BA, Hoogenboom G, McClendon RW (2009) Artificial neural networks for automated year-round temperature prediction. Comput Electron Agric 68(1):52–61 MMohammed MA, Abdulhasan MJ, Kumar NM, Abdulkareem KH, Mostafa SA, Maashi MS, Chopra SS (2022) Automated waste-sorting and recycling classification using artificial neural network and features fusion: a digital-enabled circular economy vision for smart cities. Multimedia Tools Appl 1–16 Ahmed AN, Othman FB, Afan HA, Ibrahim RK, Fai CM, Hossain MS, Elshafie A (2019) Machine learning methods for better water quality prediction. J Hydrol 578:124084
ELM-MFO: A New Nature-Inspired Predictive Model for Financial Contagion Modeling of Indian Currency Market Swaty Dash, Pradip Kumar Sahu, and Debahuti Mishra
Abstract This paper proposes an enhanced hybridized machine learning approach to forecast future price exchange rates such as US Dollar (USD), Great Britain Pound (GBP) and Australian Dollar (AUD) to INR. This hybridized machine learning approach mainly consists of popularly used Extreme Learning Machine (ELM) and different optimization techniques used to optimize ELM parameters using Particle Swarm Optimization (PSO), Gray Wolf Optimization (GWO) and Moth Flame Optimization (MFO). Out of three optimization techniques, it has been observed that ELM with MFO (ELM-MFO) provides best accuracy in the process of forecasting as compared to rest two optimization techniques. The datasets used for experiments have been collected from public platform having different time delay formats such as one day, seven days, fifteen days and thirty days. Also, several technical indicators as well as statistical measures have been used to augment the original currency pair dataset to get a deeper insight of the datasets. From the experimentation, comparison and validation, it has been proved that the proposed ELM-MFO outperforms all the networks used for the comparison and achieves higher accuracy of 97% and 95% considering the overall and average accuracy, respectively, and additionally, the statistical validation through Kappa statistics shows the strong-level agreement with 73.25% of this ELM-MFO for the augmented currency pair datasets with combination of original attribute, TIs and SMs. Keywords Exchange rate prediction · Extreme learning machine (ELM) · Particle swarm optimization (PSO) · Gray Wolf Optimization (GWO) · Moth Flame Optimization (MFO).
S. Dash · P. K. Sahu Department of Information Technology, Veer Surendra Sai University of Technology, Burla, Sambalpur, Odisha, India e-mail: [email protected] D. Mishra (B) Department of Computer Science & Engineering, Siksha ‘O’ Anusandhan (Deemed to Be) University, Bhubaneswar, Odisha, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_43
511
512
S. Dash et al.
1 Overview In present era forecasting the currency exchange price is one of the challenging tasks for the economics and computer science researchers. The Forex plays an important role in International markets [1] to judge the overall health of the economy. This financial market prediction is really tough due to certain factors like the complexity of financial time series data, seasonal deviations and unreliable movements. It is also affected by investor’s changing activity caused by economy, political, social and psychological factors. In spite of all these, the aim of the investor is to achieve more profit from financial market. Predicting the exchange rate in Forex market on the basis of past market data is not an efficient approach always. Hence, several prediction methods have been applied by the researchers to gain profit from financial market. The literature suggests basically two phases of prediction models such as parametric and non-parametric methods. The non-parametric models are basically designed using artificial intelligence (AI)-based strategies such as: machine learning and soft computing. A literature survey indicates that artificial neural network (ANN) has been used for financial prediction by most of the authors. In order to obtain better prediction result during the last few years, many nonlinear models have designed [2– 4]. To address the drawbacks encountered in traditional artificial neural networks (ANNs), such as the need to fine-tune input weights in each iteration, which can lead to more complex algorithms, as well as the challenge of handling inappropriate parameter selection that may result in local minima, a logical solution has been devised [4–6]. This solution involves the utilization of random input weights and biases to generate the output weight matrix, accomplished through the implementation of an advanced form of ANN called extreme learning machine (ELM). It has been studied that while the ELM is advantageous over traditional ANNs, but there may be a chance of getting non-optimal solutions leading to under performance of testing data [7, 8]. Therefore, to get better and optimum solution, many nature or swarm-based optimization strategies have been embedded with the basic machine learning tools to get better performance. In this work, an attempt has been made to explore the properties of few swarmbased optimizers such as: Particle Swarm Optimization (PSO) [9, 10], Gray Wolf Optimization (GWO) [11, 12] and Moth Flame Optimization (MFO) [13–18] to get optimized parameters of ELM to attain better result in the field of prediction of future price exchange rate. The key contributions of this study are as follows: To propose a currency exchange price prediction strategy for forecasting of exchange price of Indian currency (INR) with respect to US Dollar (USD), Great Britain Pound (GBP) and Australian Dollar (AUD). An attempt has been made in this study to arrive at a predictive model by exploring the identification and prediction capability of widely used machine learning strategy ELM as a regressor optimized with few swarm-based techniques such as: PSO, GWO and MFO. As the time series currency datasets are having less features, first those features are expanded using
ELM-MFO: A New Nature-Inspired Predictive Model for Financial …
513
technical indicators (TIs) [19, 20] such as: simple moving average (SMA), triangular moving average (TMA), relative strength index (RSI), stochastic oscillator (%k), William %R and momentum along with the statistical measures (SMs) [21] such as: standard deviation (StdDev), variation, kurtosis for better insight into the features for training the models. The training has been done with respect to three types of datasets such as: dataset with original features, dataset with original features and TIs and third one is original feature and SMs. A detail comparison has been made to understand the effect of feature expansion and dataset augmentation through TIs and SMs. The proposed models are compared with basic ELM [7, 8, 22–27] to observe the recognition performance, and finally, the recognition performance of proposed swarm-based ELM model has been recorded for one day, seven days, fifteen days and thirty days in advance and found to be predicting interesting result. The rest of the paper has been organized as follows: the detail literature survey on ELM as a regressor for prediction and optimized models for currency exchange price prediction models are discussed in Sect. 2. Section 3 discusses the datasets, and the methodologies adopted for experimentation. Section 4 discusses the parameters used and proposed model for experimentation. The result analysis is discussed in Sect. 5, and finally, Sect. 6 concludes the paper with future scope.
2 Literature Review on ELM-based Predictive Models In this study, an attempt has been made to experiment the predictive model based on ELM and its variants. The various studies on ELM-based predictive model are explored here for getting better understanding of ELM-based models. Many studies on nature or swarm-inspired optimized ELMs have been proposed for financial market prediction such as: Das et al. [22–24] have made discussions on ELM and its variants and hybrid forms for Forex and stock market predictions. In [22] authors have tried to propose a hybrid form of ELM with Jaya optimization method to predict the USD to INR and EURO for 1 day–1 month predictive days. The key observation in this work is to augment the currency pair datasets with few TIs and SMs, and it has been seen that the better predictive performance of ELMJaya has been recorded for the combination of TIs with original datasets. In [23] and [24] authors explored the online sequential ELM (OSELM) for stock market as well as Forex market predictions. The performance of four stock market indices is used for forecasting in [23] by proposing a feature optimization based on firefly optimization algorithm for feature reduction. In this work also authors explored the data augmentation based on TIs and SMs. The work proposed in [24] was for the prediction of Forex market by hybridizing the OSELM with krill herd optimization algorithm by utilizing the TIs and SMs with original datasets for various window sizes. The authors Nayak et al. [25] proposed a cryptocurrency prediction approach for Litecoin, Ripple, Bitcoin and Ethereum, and a straight-forward comparison has been
514
S. Dash et al.
made on few statistical and machine learning-based methodologies such as: autoregressive integrated moving average, support vector machine, multi-layer perceptron. A hybridize currency exchange predictive model based on empirical mode decomposition (EMD) and fast reduced kernel ELM strategy has been proposed by Das et al. in [28]. Authors attempted a nonlinear data decomposition method to obtain important components from noisy environment. They tried to convert nonlinear data to stationary time series data using intrinsic mod functions. Similarly, Bisoi et al. [29] also applied the concept of EMD and variational mode decomposition (VMD) to observe the daily stock price movement. Authors presented an integrated strategy of robust kernel-based ELM and VMD optimized with differential evolution (DE) and named as DE-VMD-RKELM.
3 Preliminaries This section discusses the preliminary information related to the datasets, TIs and statistical measures used for construction of dataset for experimentation. Also, the ELM as a regressor along with the optimization strategies such as: PSO, GWO and MFO is discussed which are extensively used in this experiment for optimization of ELM parameters and used for comparison.
3.1 Description and Analysis of Datasets, TIs and SMs Machine learning has proven to be immensely beneficial in accurately forecasting financial market prices. Numerous predictive models have been developed, and one significant area of research has been the prediction of currency exchange prices in the Forex market. This particular field has garnered considerable attention from researchers due to its substantial market size and potential for valuable insights. In this study of currency exchange market, three categories of currency datasets of three countries such as: USD, GBP and AUD are taken to map the exchange price with respect to the INR [30]. The datasets are re-constructed/augmented using TIs and SMs [19, 20] and the representation of augmented datasets/currency pairs is shown in Table 1. The SMA, TMA, RSI, stochastic oscillator (%k), William %R and momentum are the TIs [19, 20] used and the used SMs are: standard deviation (StdDev), variation and kurtosis [21].
ELM-MFO: A New Nature-Inspired Predictive Model for Financial … Table 1 Detailed representation of price exchange dataset
515
Prediction horizon
USD/INR
GBP/INR
AUD/INR
1 day
4988
4988
4988
7 days
4988
4988
4988
15 days
4985
4985
4985
30 days
4970
4970
4970
3.2 Methodologies Adopted The main technique used for modeling of data is ELM which is a purely machine learning algorithm that is similar to ANN but only with one hidden layer. Also, there are three optimization techniques which have been tested to tune the parameters of ELM algorithm so that it will behave more efficiently to handle exchange price prediction data. The ELM proposed by Huang et al. [27] is a unique procedure for faster learning speed, better generalization performance and minimal human interference. The actual feedforward neural network is built with its irrelevant variables with single hidden layer. ELM takes very less time for the computation of new classifier in the purpose of training desired model. The benefits offered by ELMs have been extensively investigated for the design of predictive models. These advantages include accelerated learning, improved generalization performance, straightforward implementation, reduced computational complexity, the ability to set hidden layer parameters once randomly generated without requiring tuning, and a unified solution for various practical applications. Moreover, ELMs exhibit higher scalability and handle complex computations, making them highly suitable for developing efficient predictive models. Thus, ELM has a better scalability over regression and multi-class classification due to a widespread type of feature mapping with a unified learning platform. The remarkable speed of learning exhibited by the network, surpassing traditional methods by up to a thousand fold, coupled with its independence from control variables (manually tuned parameters) [7, 8, 22–25], has led to widespread adoption of this network in the realm of financial forecasting. In this study, PSO, GWO and MFO are used for optimizing the ELM network for currency price prediction for the augmented currency pair datasets. The principle of PSO algorithm is used to find a place for a swarm of flying birds to land where availability of food is maximized as well as the risk of existence of predators is minimized and is a population-based distributed learning scheme [9, 10]. The PSO possesses several advantages that make it a suitable choice for optimizing the parameters of an ELM network. Despite its simplicity as an optimization algorithm, it demonstrates satisfactory accuracy. While it may have a slower learning speed compared to other methods, PSO excels in solving complex optimization problems involving multiple objectives, variables, and constraints. Leveraging these capabilities, PSO effectively aids in optimizing the parameters of an ELM network to achieve optimal performance. The GWO is proposed by Mirjalili based on the
516
S. Dash et al.
inspiration of gray wolves. It mimics the hunting mechanism and leadership hierarchy of gray wolves. The leadership hierarchy is simulated by alpha (α), beta (β), delta (δ) and omega (ω) wolves. The fittest solution is α, and the second and third best solutions are β and δ. The rest of each member of the population is represented by ω. The GWO involves three main steps of hunting such as: prey searching; prey encircling and prey attacking [11, 12]. The GWO involves a smaller number of search parameters but provides competitive performance compared to other metaheuristic methods. However, the GWO has slow convergence and poor local searching ability. The MFO algorithm is based on navigation of moths [13–18]. It is assumed that the moths represent candidate solutions. The positions of moths in the space represent the variables of the problem. By changing the associated position vectors, the moths can fly in hyperdimensional space. The moths act like search agents which move around search space. But, the flames are best position of moths achieved so far. The flames may be thought of as flags or pine dropped by moths during searching. Hence, each moth searches around a flame and updates it if a better solution is achieved. The key advantage of MFO allows the moth to move around flames and not between the moths, leading to better exploration and exploitation and the working process of MFO is shown in Fig. 1.
Parameter Setting
Optimal Selection
Population Intialization
Iteration Process
Fitness Function Fig. 1 Working process of MFO
ELM-MFO: A New Nature-Inspired Predictive Model for Financial …
517
Table 2 Parameter values GWO
PSO
MFO
#dim
30
#Max_iter
1000
#Max_iter
1000
#iters
200
#lb
− 100
#lb
− 100
#PopSize
50
#ub
100
#ub
100
# Vmax
6
#dim
30
#dim
30
#population size
50
#SearchAgents_no
5
#SearchAgents_no
5
#wMax
0.9
#wMin
0.2
#c1
2
#c2
2
#lb
10
4 Experimentation This section discusses the various parameters used and their associated values and the proposed forecasting strategy experimented.
5 Parameter Setup The experimentation and empirical comparison of proposed ELM-MFO optimizer has been performed, and performances are recorded with respect to ELM-PSO and ELM-GWO. All those models are used for prediction with different time delay formats such as: one day, seven days, fifteen days and thirty days. The other parameters of PSO, GWO and MFO are detailed in Table 2.
6 Proposed Model The prediction process of ELM-MFO predictor starts from three categories of datasets such as: USD/INR, GBP/INR and AUD/INR for one-day, seven-day, fifteenday and thirty-day average price exchange details. Figure 2 depicts the workflow of the proposed predictor.
518
S. Dash et al.
Fig. 2 Workflow of ELM-PSO predictor
7 Proposed Model The prediction process of ELM-MFO predictor starts from three-categories of datasets such as: USD/INR, GBP/INR and AUD/INR for one-day, seven-day, fifteenday and thirty-day average price exchange details. Figure 2 depicts the workflow of the proposed predictor. The whole process has been categorized into three different phases. In the first phase of experimentation, the main focus is on data level or preprocessing tasks, such as: data collection, cleaning, normalization, standardization, and also the various important and emblematic features have been extracted from the datasets, such as: different TIs as well as different SMs. The second phase, focused on modeling of data by ELM, here, basically ELM is used for regression process by dividing the whole dataset into two parts like training and testing sets. All training phases of model carried out by using training set and all testing phase have been carried out using testing set. Finally, in the last phase of experimentation, the optimization of parameters of ELM has been addressed to design an accurate predictor using MFO and the performance of the proposed ELM-MFO predictor has been evaluated and validated by making necessary comparisons with PSO and GWO. Furthermore, to validate the proposed predictor’s efficacy in modeling the financial currency market, accuracy measures and statistical methods have been employed. Through these rigorous evaluations, the predictor has been established as a strong contender, particularly when considering the Indian scenario. These analyses confirm that the proposed predictor is a reliable choice for effectively modeling the dynamics of the financial currency market.
ELM-MFO: A New Nature-Inspired Predictive Model for Financial …
519
8 Result Analysis This experimentation has been performed in two phases; in the first phase, we tried to make a comparison between the prediction accuracy observed between the basic ELM and ELM-MFO predictor for forecasting the exchange price of the three currency pair datasets for four predictive days such as: one day, seven days, fifteen days and thirty days. The comparison has been made with respect to original attributes of currency pair datasets, original attributes of currency pairs combined with TIs and original attributes of currency pairs combined with SMs. Figure 3 depicts the accuracy curve based on the currency pair USD/INR where the Figs. 3a, b and c show the observed accuracy comparison for original attributes of currency pair datasets, original attributes of currency pairs combined with TIs and original attributes of currency pairs combined with SMs, respectively, and from those graphs, the stability of the ELM-MFO can be better observed. Likewise, Figs. 4a, b and c, and Figs. 5a, b and c illustrate the observed accuracy comparison for three scenarios: original attributes of currency pair datasets, original attributes combined with TIs, and original attributes combined with SMs. These graphs provide insights into the stability of the ELM-MFO model for augmented currency pairs, specifically GBP/INR and AUD/ INR. The second phase of experimentation discusses the comparison for USD/INR, GBP/INR and AUD/INR price exchange for one-day, seven-day, fifteen-day and thirty-day predictive days based on original attributes of currency pairs combined
Fig. 3 Error graphs showing the comparison between ELM and ELM-MFO for USD/INR price exchange for one-day, seven-day, fifteen-day and thirty-day predictive days based on a original attributes of currency pair datasets; b original attributes of currency pairs combined with TIs and; c original attributes of currency pairs combined with SMs
520
S. Dash et al.
Fig. 4 Error graphs showing the comparison between ELM and ELM-MFO for GBP/INR price exchange for one-day, seven-day, fifteen-day and thirty-day predictive days based on a original attributes of currency pair datasets; b original attributes of currency pairs combined with TIs and; c original attributes of currency pairs combined with SMs
with TIs and SMs (augmented currency pair datasets) for ELM with ELM-PSO, ELM-GWO and ELM-MFO. Figure 6a shows the comparative error graph of ELM versus ELM-PSO for the augmented currency pair dataset of USD/INR for oneday, seven-day, fifteen-day and thirty-day predictive days, and similarly, Figs. 6b and c depict the predictive error comparison based on GBP/INR and AUD/IND currency pair datasets, respectively, for the above-mentioned four predictive days of forecasting. Figure 7a shows the comparative error graph of ELM versus ELM-GWO for the augmented currency pair dataset of USD/INR for one-day, seven-day, fifteenday and thirty-day predictive days, and similarly, Figs. 7b and c depict the predictive error comparison based on GBP/INR and AUD/IND currency pair datasets, respectively, for the above-mentioned four predictive days of forecasting. Similarly, Figs. 8a, b and c depict the comparison graphs for the ELM versus ELM-MFO for USD/INR, GBP/INR and AUD/IND, respectively, for the four predictive horizons with the same currency pair datasets configuration. From those error comparisons, it can be well observed that the ELM-MFO is stable predictor in comparison to ELM, ELM-PSO and ELM-GWO. A straight-forward comparison has been made with respect to overall accuracy and average accuracy [31] observed among the basic ELM, ELM-PSO, ELM-GWO and ELM-MFO as shown in Table 3. From this table, the overall accuracy and average accuracy of the ELM-MFO with 97 and 95%
ELM-MFO: A New Nature-Inspired Predictive Model for Financial …
521
Fig. 5 Error graphs showing the comparison between ELM and ELM-MFO for AUD/INR price exchange for one-day, seven-day, fifteen-day and thirty-day predictive days based on a original attributes of currency pair datasets; b original attributes of currency pairs combined with TIs and; c original attributes of currency pairs combined with SMs
show the better performance over the rest of the comparative predictors. This ELMMFO achieves 6% improvement with respect to the basic ELM considering overall accuracy and average accuracy. Similarly, the 4 and 2% performance improvement of ELM-MFO over ELM-PSO and ELM-GWO establishes the performance of the proposed ELM-MFO in this experimentation for the Forex market predictions under consideration. Here, the Kappa statistics of Cohen’s Kappa [32] is measured to test the inter-rater reliability to observe the performance of the predictors with respect to the data collected in this study. From this statistical validation, it can be inferred that as the value of ELM-MFO is 73.25%, it leads to the strong-level agreement and ELM-GWO, ELM-PSO and ELM with 62.32%, 57.32% and 48.11%, respectively, depict the moderate-level agreement with respect to the augmented currency pairs comprising of combination of original attributes, TIs and SMs. To sum up, after conducting various comparisons in this study, it is evident that the navigation method inspired by moths, particularly their transverse orientation behavior of moving in a straight line when the light source is distant, has significantly contributed to the design of an optimized predictor called ELM-MFO. This approach effectively circumvents the issue of local minima, resulting in an enhanced predictive accuracy. The evidence supporting these findings is derived from comprehensive comparisons and rigorous statistical validation performed throughout the research.
522
S. Dash et al.
Fig. 6 Error comparison for USD/INR price exchange for one-day, seven-day, fifteen-day and thirty-day predictive days based on original attributes of currency pairs combined with TIs and SMs for a ELM versus ELM-PSO; b ELM versus ELM-GWO and c ELM versus ELM-MFO
Fig. 7 Error comparison for GBP/INR price exchange for one-day, seven-day, fifteen-day and thirty-day predictive days based on original attributes of currency pairs combined with TIs and SMs for a ELM versus ELM-PSO; b ELM versus ELM-GWO and c ELM versus ELM-MFO
ELM-MFO: A New Nature-Inspired Predictive Model for Financial …
523
Fig. 8 Error comparison for AUD/INR price exchange for one-day, seven-day, fifteen-day and thirty-day predictive days based on original attributes of currency pairs combined with TIs and SMs for a ELM versus ELM-PSO; b ELM versus ELM-GWO and c ELM versus ELM-MFO
Table 3 Performance recognition and statistical validation
Predictors
Overall accuracy (%)
Average accuracy (%)
Kappa statistics
ELM
91
89
0.4811
ELM-PSO
95
93
0.5732
ELM-GWO
93
91
0.6232
ELM-MFO
97
95
0.7325
9 Conclusion and Future Scope In this paper, an attempt has been made to explore the ELM network as a predictor for forecasting the exchange price of three currency pair datasets with respect to Indian currency for one-day, seven-day, fifteen-day and thirty-day predictive days. The better scalability, feature mapping ability, faster learning speed and control over the parameters of ELM have shown their advantages as predictor. The free parameters of this ELM network have been optimized with PSO, GWO and MFO nature-inspired swarm-based optimizers. From the experimentation, comparison and validation, it has been observed that the MFO has shown its better combination with the ELM due to its ability to deal with local optima and maintaining a right balance between exploration and exploitation has shown a very accurate approximation of global optimum leading to a good ELM-MFO predictor for currency exchange price.
524
S. Dash et al.
The future of ELM-MFO combined network may be utilized for any classification applications.
References 1. Cheung YW, Chinn MD, Pascual AG, Zhang Y (2019) Exchange rate prediction redux: new models, new data, new currencies. J Int Money Financ 95:332–362 2. Cavusoglu N, Goldberg MD, Stillwagon J (2021) Currency returns and downside risk: debt, volatility, and the gap from benchmark values. J Macroecon 68:103304 3. Ito H, McCauley RN (2020) Currency composition of foreign exchange reserves. J Int Money Financ 102:102104 4. Fang X, Liu Y (2021) Volatility, intermediaries, and exchange rates. J Financ Econ, Elissa 5. Pradeep Kumar D, Ravi V (2018) Soft computing hybrids for FOREX rate prediction: a comprehensive review. Comput Oper Res 99:262–284 6. Samitas A, Kampouris E, Kenourgios D (2020) Machine learning as an early warning system to predict financial crisis. Int Rev Financ Anal 71:101507 7. Ding X, Liu J, Yang F, Cao J (2021) Random compact Gaussian kernel: application to ELM classification and regression. Knowl-Based Syst 217:106848 8. Mohanty DK, Parida AK, Khuntia SS (2021) Financial market prediction under deep learning framework using auto encoder and kernel extreme learning machine. Appl Soft Comput 99:106898 9. Pradeep Kumar D, Ravi V (2017) Forecasting financial time series volatility using particle swarm optimization trained quantile regression neural network. Appl Soft Comput 58:35–52 10. Bagheri A, Peyhani HM, Akbari M (2014) Financial forecasting using ANFIS networks with quantum-behaved particle swarm optimization. Expert Syst Appl 41(14):6235–6250 11. Rajakumar R, Sekaran K, Hsu CH, Kadry S (2021) Accelerated grey wolf optimization for global optimization problems. Technol Forecast Soc Change 169:120824 12. Liu M, Luo K, Zhang J, Chen S (2021) A stock selection algorithm hybridizing grey wolf optimizer and support vector regression. Expert Syst Appl 179:115078 13. Mirjalili S (2015) Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm. Knowl-Based Syst 89:228–249 14. Shehab M, Abualigah L, Al Hamad H (2020) Moth–flame optimization algorithm: variants and applications. Neural Comput Appl 32:9859–9884 15. Li Y, Zhu X, Liu J (2020) An improved moth-flame optimization algorithm for engineering problems. Symmetry 12(8):1234 16. Hussien AG, Amin M, Abd El Aziz M (2020) A comprehensive review of moth-flame optimisation: variants, hybrids, and applications. J Exper Theor Artif Intell 32(4):705–725 17. Xu Y, Chen H, Heidari AA, Luo J, Zhang Q, Zhao X, Li C (2019) An efficient chaotic mutative moth-flame-inspired optimizer for global optimization tasks. Expert Syst Appl 129:135–155 18. Muduli D, Dash R, Majhi B (2020) Automated breast cancer detection in digital mammograms: a moth flame optimization based ELM approach. Biomed Signal Process Control 59:101912 19. Panopoulou E, Souropanis I (2019) The role of technical indicators in exchange rate forecasting. J Empir Financ 53:197–221 20. Dai Z, Zhu H, Kang J (2021) New technical indicators and stock returns predictability. Int Rev Econ Financ 71:127–142 21. Bao L, Cheng L (2013) On statistical measure theory. J Math Anal Appl 407(2):413–424 22. Das SR, Mishra D, Rout M (2020) A hybridized ELM-Jaya forecasting model for currency exchange prediction. J King Saud Univ Comput Inf Sci 32(3):345–366 23. Das SR, Mishra D, Rout M (2019) Stock market prediction using firefly algorithm with evolutionary framework optimized feature reduction for OSELM method. Expert Syst Appl X 4:100016
ELM-MFO: A New Nature-Inspired Predictive Model for Financial …
525
24. Das SR, Kuhoo, Mishra D, Rout M (2019) An optimized feature reduction based currency forecasting model exploring the online sequential extreme learning machine and krill herd strategies. Phys A: Statist Mech Appl 513:339–370 25. Nayak SC, Satyanarayana B, Kar BP, Karthik J (2021), An extreme learning machine-based model for cryptocurrencies prediction, smart computing techniques and applications. In: Proceedings of the fourth international conference on smart computing and informatics, pp 127–136 26. Huang G-B, Zhu Q-Y, Siew C-K (2006) Extreme learning machine: theory and applications. Neurocomputing 70:489–501 27. Zhu QY, Qin AK, Suganthan PN, Huang GB (2005) Rapid and brief communication Evolutionary extreme learning machine. Patt Recog 38:1759–1763 28. Das PP, Bisoi R, Dash PK (2018) Data decomposition based fast reduced kernel extreme learning machine for currency exchange rate forecasting and trend analysis. Expert Syst Appl 96:427–449 29. Bisoi R, Dash PK, Parida AK (2019) Hybrid variational mode decomposition and evolutionary robust kernel extreme learning machine for stock price and movement prediction on daily basis. Appl Soft Comput 74:652–678 30. https://in.investing.com/currencies/. Last accessed on 10 July 2022 31. Raghav A (2022) Know the best evaluation metrics for your regression model! Available online: https://www.analyticsvidhya.com/blog/2021/05/know-the-best-evaluationmetrics-for-your-regression-model/. Last accessed 16 Aug 2022 32. McHugh ML (2012) Interrater reliability: the kappa statistic. Biochem Med (Zagreb) 22(3):276–282
Plant Disease Detection Using Fine-Tuned ResNet Architecture Jalluri Geetha Renuka, Goberu Likhitha, Vamsi Krishna Modala, and Duggi Manikanta Reddy
Abstract Agriculture is vital for human survival as it provides the food, fiber and other essential resources needed for a growing population. It holds a crucial significance in advancing rural advancement, fostering economic growth, and promoting environmental sustainability. Lack of knowledge about plant diseases and proper use of pesticides can lead to crop loss, decreased yield and financial loss for farmers. It also can lead to overuse of pesticides, which can harm the environment and potentially harm human health. A modern solution using deep learning algorithms to detect plant diseases and suggest appropriate pesticides can greatly benefit farmers by improving crop yields, reducing pesticide usage and increasing profits. In this research, we present a fine-tuned Residual Network (ResNet) architecture that demonstrates a significant improvement in accuracy for plant disease detection from images of plant leaves. We leveraged the “Plant Village Dataset” dataset available on Kaggle to develop and evaluate the performance of our model for plant disease detection from images of plant leaves. We employed the augmentation technique to enhance the dataset and trained the model using the augmented images. Our model achieved accuracy of 99.68%. Keywords Augmentation · Deep learning · Fine-tuning · Residual network
1 Introduction Agriculture is a critical component of our global food system and is responsible for producing the food, fiber, and other essential products that we all depend on. The success of agriculture depends on the knowledge, skills, and expertise of farmers, J. G. Renuka (B) Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India e-mail: [email protected] G. Likhitha · V. K. Modala · D. M. Reddy Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_44
527
528
J. G. Renuka et al.
who must constantly adapt to new challenges, such as changing weather patterns and disease outbreaks, in order to maintain the health and productivity of their crops. In this regard, the ability of farmers to accurately identify and respond to diseases affecting their crops is a critical component of sustainable agriculture and essential to maintaining a stable food supply. By investing in the knowledge and capacity of farmers, we can not only ensure the success of individual farms, but also support the long-term stability and resilience of our food system. The agriculture sector is a major contributor to global trade, with agricultural exports accounting for a significant portion of many countries’ GDP. Despite its importance, the sector also faces several challenges, including the misapplication of pesticides and a lack of knowledge about the correct identification of plant diseases. Every year, an estimated INR 50,000 crore worth of crops are lost due to pest and disease outbreaks, as reported by The Economic Times. This can lead to the overuse of pesticides, which can have negative impacts on human health and the environment and can also result in significant crop waste. According to estimates, up to 40% of the world’s food production is lost or wasted due to various factors, including pest damage and disease outbreaks. Addressing these challenges and improving the knowledge and capacity of farmers to correctly identify and respond to plant diseases is crucial to reducing crop waste, improving food security, and supporting the longterm sustainability of the agriculture sector. Accurately identifying plant diseases is a crucial step in preventing their spread to neighboring crops, selecting effective control measures, and enhancing crop productivity. This study addresses several research gaps in the area of plant disease recognition using deep learning, including the dataset and real-world agriculture conditions. Deep learning is a specialized area within machine learning that focuses on using algorithms modeled after the brain’s structure and function, known as artificial neural networks. In our model, we utilized Residual Network (ResNet) [1] architecture to achieve improved accuracy. There are a number of different architecture options within the realm of CNN, such as AlexNet, MobileNet, DenseNet and Visual Geometric Group (VGG) Networks but the academic papers that we reviewed provided a valuable insight into the use of ResNet for achieving higher accuracy [3] . They suggested that fine-tuning ResNet is a highly effective way to improve its performance. In particular, the papers highlighted the key aspects of ResNet that can be adjusted to enhance accuracy and provided guidance on how to best fine-tune the network. As a result of our research, we were able to make informed decisions on how to optimize ResNet for our specific task, leading to improved accuracy in our model. In our evaluation of the model, we utilized images from the Plant Village dataset, which encompasses 37 different plant disease classes. The baseline accuracy of the ResNet architecture was impressive at 97% [1], but our proposed pipeline was able to further improve it to reach 99.68%. For our study, we utilized a refined ResNet18 architecture to enhance the accuracy of our model. By doing so, we aimed to incrementally adapt the pre-trained features to the new data, with the potential to achieve substantial improvements. The fine-tuning process is a commonly used technique in deep learning and has been shown to be an effective way to increase accuracy,
Plant Disease Detection Using Fine-Tuned ResNet Architecture
529
especially when working with limited training data. In our study, we found that the fine-tuned ResNet18 architecture was able to produce meaningful improvements and enhance the overall performance of the model. The overarching goal of this research was to develop a highly accurate model. While various other architectures can achieve a decent level of accuracy, our aim was to create a model with even higher accuracy by using fine-tuning in combination with the ResNet architecture. To achieve this, we trained the model using the Plant Village dataset. Additionally, we employed data augmentation techniques to generate augmented images, which helped to enhance the efficiency of the training process. The use of these techniques allowed us to create a model that was not only highly accurate, but also robust and capable of handling diverse and challenging data.
2 Literature Review In 2019, Chen et al. [10] A recent proposal has put forth a novel deep learning technique that leverages cutting-edge convolutional neural networks (CNNs) to perform real-time identification of apple leaf diseases. The design features the GoogLeNet Inception architecture and the Rainbow concatenation approach to enhance accuracy. A deep learning approach has been proposed that combines the Inception module and Rainbow concatenation within the Single Shot MultiBox Detector (SSD) framework to identify five prevalent apple leaf diseases in real-time. The resulting model, known as the INAR-SSD, showcases an improved level of accuracy. In the study performed by Zeng et al. in [2], the INAR-SSD model was found to have exceptional detection performance with a mean average precision of 78.80% on the ALDD dataset and a fast detection rate of 23.13 FPS. In 2020, Zeng et al. [2] An investigation was conducted to assess the effectiveness of multiple models in determining the extent of Citrus HLB. Six different models were trained under consistent conditions to meet this objective. The findings showed that the Inception_v3 model, trained for 60 epochs, had the highest accuracy of 74.38%, surpassing the other models thanks to its high computational efficiency and low number of parameters. To examine the impact of data augmentation using Generative Adversarial Networks (GANs) on model performance, the Deep Convolutional Generative Adversarial Networks (DCGANs) approach was used to double the size of the original training dataset. As a result, the accuracy improved to 92.60%, a 20% increase compared to the Inception_v3 model’s accuracy obtained from the original dataset. In 2021, Barburiceanu et al. [3] this study investigates the use of pre-trained deep-learning models (AlexNet, VggNet, ResNet) on ImageNet object categories for texture classification tasks, particularly in the context of plant disease detection. A deep learning-based feature extraction technique is introduced to identify plant species and classify plant leaf diseases. The proposed method is compared with traditional handcrafted feature extraction descriptors and deep-learning based approaches on the same set. The results show that the newly proposed system is
530
J. G. Renuka et al.
more efficient in terms of speed and accuracy, surpassing both conventional and endto-end CNN-based techniques. It provides a solution to the limited data availability challenge in precision agriculture. In 2021, Yuan et al. [4] a novel approach, the SPOED-CCNN has been introduced in this paper for segmenting crop disease leaves. The network consists of two components: a network for detecting the presence of disease in a region and another for segmenting the affected area. The region-based disease recognition network incorporates a cascade convolution neural network and a spatial pyramid and is made up of three levels of convolution neural network models that escalate in complexity. Each level of the network extracts various features of crop disease leaves and screens the images for their presence. The integration of a spatial pyramid pooling layer at each level of the network enhances its ability to handle inputs of diverse dimensions. The regional disease segmentation network, which is structured as an Encoder-Decoder, employs multi-scale convolution kernels. The average results of the proposed method surpass 90%. In 2020, Kumar et al. [5] the purpose of this article is to develop an expert system that can forecast various fungal diseases affecting plants. The system utilizes a feedforward neural network for disease classification, which boasts a high degree of accuracy in disease detection. The proposed approach involves three essential steps: pre-processing of the data, exploratory data analysis, and the detection module. The prediction accuracy for each disease is consistently above 98% on average. In 2019, Nie et al. [6]. This research introduces a fresh method for detection of strawberry verticillium wilt by using a faster R-CNN-based disease detection network and multi-task learning. Unlike previous approaches that analyze the overall appearance of the entire plant, SVWDN can automatically categorize petioles and young leaves to identify the presence of verticillium wilt. The results demonstrate that SVWDN outperforms other methods, achieving an mAP of 77.54% in object detection across four categories. In 2022, Patil and Kumar [7] the study presents a new Rice-Fusion framework for diagnosing rice diseases that integrates a convolutional neural network (CNN) architecture with multimodal data fusion. The framework first extracts numerical features from sensor-collected agro-meteorological data and visual features from images of rice plants. These features are combined through concatenation and dense layers to produce a single output for disease diagnosis. The proposed Rice-Fusion framework exhibits high accuracy, with a testing accuracy of 95.31%. In 2020, Yang et al. [11]. A novel approach named LFC-Net is introduced in this study, which comprises three interconnected connections: a position network, feedback control system, and a classifier. It also includes a self-supervision mechanism. The location network is in charge of identifying significant parts in tomato images, which are then refined through a series of iterations guided by the feedback network. The classification network then utilizes the information-rich regions detected by the location network, along with the complete tomato image, to make the classification. The LFC-Net model represents a joint effort of multiple networks, enabling collective advancement.
Plant Disease Detection Using Fine-Tuned ResNet Architecture
531
In 2019, Khan et al. [12] the study introduces a three-step pipeline for diagnosing apple leaf diseases, which consists of preprocessing, spot identification, and feature analysis and categorization. During the preprocessing stage, the appearance of apple leaf spots is improved by utilizing a hybrid technique that integrates 3D box filtering, de-correlation, 3D Gaussian filtering, and 3D median filtering. The lesion spots are then precisely separated through the application of a correlation-based method optimized by expectation maximization segmentation. Subsequently, color, local binary pattern and color histogram features are combined and optimized using a genetic algorithm before being classified through the One-versus-All multi-class support vector machine. The findings of the suggested approach indicate a heightened accuracy in diagnosing specific illnesses affecting apples.
3 Methodology The proposed system involves a fine-tuning [13] process on the ResNet [1] model, which is a widely used deep neural network architecture for image classification tasks. During the fine-tuning process, the pre-trained ResNet model is further trained on the Plant Village dataset to improve its performance for the specific task of disease detection. The system proposed in this study employs the widely utilized Adam optimization algorithm in deep learning, with a cyclical learning rate schedule. The learning rate is dynamically adjusted during training, allowing the model to converge more effectively. The activation function used in the proposed system is rectified linear unit (ReLU), a popular choice for deep neural networks due to its ability to improve the training speed and prevent vanishing gradients. Compared to other commonly used architectures such as DenseNet121, DenseNet201, MobileNet, MobileNetV2, ResNet50, ResNet152V2, and VGG19, [3] the proposed system with fine-tuned [13] ResNet architecture and Adam optimizer with cyclical learning rate schedule can potentially achieve higher accuracy and faster convergence for the task of plant leaf disease detection. Step wise procedure of proposed system is mentioned in Fig. 1 (A) Dataset The Plant Village Dataset was utilized in our study for training and evaluating our model. This dataset is recognized as the largest open-source collection of leaf images for disease diagnosis, expertly curated by professionals in the field of plant pathology. It comprises a diverse array of 54,309 images of healthy and diseased leaves of 14 different crops, all labeled by experts in the field. The dataset provides a comprehensive representation of various diseases affecting leaves and includes samples of leaves with varying degrees of infection. This diversity of diseases and infection levels makes the Plant Village Dataset an ideal resource for developing models for leaf disease diagnosis. We used augmentation for increasing the size of the training dataset and creating diverse versions of the images by applying random transformations such as rotation, scaling, flipping, and more. This results in a more robust model that can better handle variations in the images and can generalize better to unseen
532
J. G. Renuka et al.
Fig. 1 Steps of data collecting and processing
data. Therefore, the utilization of image augmentation [2] helps mitigate overfitting and enhances the model’s performance on unseen data, making it a crucial technique in the training process when working with a substantial number of images. Some of images and list of classes are mentioned below in Figs. 2 and 3. (B) Convolutional Neural Network (CNN) Deep learning, a subfield of artificial intelligence, is built on the concept of neural networks and is trained to discover patterns and relationships in vast amounts of data. Unlike traditional machine learning, deep learning models have a deeper structure, composed of multiple hidden layers allowing for a more intricate hierarchy of features to be learned. There are various deep learning architectures including convolutional neural networks (CNNs) [2], recurrent neural networks (RNNs), Autoencoders, and generative adversarial networks (GANs) [2]. CNNs, in particular, are highly effective in image classification tasks due to their capacity to learn spatial hierarchies of features from the input data through multiple layers of convolution and pooling operations. Its general architecture represented in Fig. 4. In the field of computer vision, CNNs have achieved remarkable success in tasks such as object recognition, image classification, and semantic segmentation. This is due to the capability of CNNs to extract high-level features from images and detect patterns, textures, and shapes. As a result, CNNs have become a commonly used approach for various applications and continue to push the boundaries of the state-of-the-art in several domains. The fundamental structure of CNNs consists of neurons with tunable weights and biases. The neurons receive inputs and process them through a dot product operation, followed by a non-linear transformation if necessary. The entire network forms a single differentiable function that transforms
Plant Disease Detection Using Fine-Tuned ResNet Architecture Fig. 2 List of classes in dataset
533
534
J. G. Renuka et al.
Fig. 3 Sample images from dataset
raw image pixels into class scores, and the final layer of a CNN typically uses a loss function, such as SVM or Softmax, for training. The techniques used for training regular neural networks can also be applied to CNNs [2]. (C) Residual Networks (RESNET) ResNet is a type of deep convolutional neural network (CNN) architecture [2]. It was introduced to address the problem of vanishing gradients in deep networks, where the error signals become weaker as they pass through multiple layers. ResNet solves this problem by introducing residual connections, which allow the error signals to bypass one or more layers and reach deeper layers of the network with increased strength. This enables ResNet to train much deeper networks than other architectures, leading to improved performance on various tasks. In our methodology, we have used the ResNet architecture to develop our model for detecting leaf diseases in plants. The ResNet architecture allows our model to
Plant Disease Detection Using Fine-Tuned ResNet Architecture
535
Fig. 4 General CNN architecture
learn complex representations of the leaf images and make accurate predictions. The residual connections in ResNet also make the training process more stable, enabling us to train our model on a large dataset without encountering overfitting or convergence problems. By leveraging the power of ResNet architecture, we were able to achieve improved results compared to other standard CNN architectures such as DenseNet121, DenseNet201, MobileNet, MobileNetV2, NASNet-Mobile, ResNet50 and VGG19 [3]. Our model demonstrates the effectiveness of the ResNet architecture in solving real-world problems such as disease diagnosis in plants. We used ResNet-18, a convolutional neural network for image classification. Its summary is represented in Fig. 5.The implementation is in TensorFlow 2.x and follows the best practices for building a deep neural network. We defined architecture of the network in two main classes: ResnetBlock and ResNet18. The ResnetBlock class implements a single block of the ResNet architecture, which consists of two convolutional layers followed by batch normalization and activation functions. The activation function used in this implementation is the swish activation function. The ResNet18 class is the main model, where the ResNet architecture is defined. The architecture consists of a convolutional layer, batch normalization, a pooling layer, and eight instances of the ResnetBlock. The output of the ResNet18 is then fed into a global average pooling layer, followed by a fully connected layer that outputs the predictions for the given image. Layers and Operations Mentioned in the Architecture 1. Conv2D: Conv2D layer is employed in convolutional neural networks (CNNs) for carrying out image classification tasks. It performs convolution operation on the input image by applying filters/kernels. These filters extract important features from the input image, and the output of the Conv2D layer is a feature
536
J. G. Renuka et al.
Fig. 5 Model summary of proposed Resnet architecture
map. The size and number of filters in a Conv2D layer can be adjusted to increase or decrease the depth of features extracted from the input. 2. Batch Normalization: Batch normalization is a technique used in deep learning to normalize the inputs to a layer. It helps in improving the stability and reducing internal covariate shift of the model. This normalization is performed by normalizing the inputs to have zero mean and unit variance. Batch normalization is
Plant Disease Detection Using Fine-Tuned ResNet Architecture
537
applied to each mini batch of the training data to improve the performance and speed up the convergence of the model. 3. SeparableConv2D: The SeparableConv2D layer in deep learning is a type of 2D convolution operation that separates the spatial processing and the depthwise processing. This layer divides the standard convolution operation into two separate operations, allowing for more efficient computation and reduced number of parameters. This layer is especially useful for large and deep convolutional neural networks, where it can reduce computational complexity and memory usage. It can be implemented using the SeparableConv2D class in popular deep learning frameworks like TensorFlow and Keras. 4. Pooling layers: A pooling layer in a deep learning model is a type of layer that performs a down-sampling operation. The main purpose of pooling layers is to reduce the spatial dimensions of the feature maps, while retaining the most important information. This not only reduces the memory requirement of the network, but also makes the network less prone to overfitting. In the code, we have used two types of pooling layers: Max Pooling and Average Pooling. Max Pooling: The Max Pooling layer is used in convolutional neural networks (CNNs) for down-sampling of the input feature maps. This layer partitions the input into non-overlapping rectangular pooling regions, and for each region, it selects the maximum value of the corresponding elements in the feature maps. The result of this operation is a reduced feature map with smaller spatial dimensions but maintained spatial hierarchies. Max pooling helps to reduce the number of parameters, computation cost, and prevent over-fitting in CNNs. Example for max pooling is mentioned in Fig. 6. Average Pooling: The average pooling layer is a type of pooling layer commonly used in convolutional neural networks (CNNs). It reduces the spatial dimensions of a 3D tensor by taking the average of each feature map and producing a 1D tensor with the same number of channels as the input. This layer helps in reducing overfitting by reducing the number of parameters in the model and also helps in
Fig. 6 Max pooling layer of CNN
538
J. G. Renuka et al.
making the model more computationally efficient. The output of this layer is used as input for the final dense layer for classification or regression. 5. Flattening: The flattening operation is used to convert a multi-dimensional tensor into a one-dimensional tensor. This operation is typically used as a pre-processing step before passing the data to a fully connected layer. The flattening operation simply unrolls the input tensor into a long vector, which is then used as input to the fully connected layer. The Flattening operation is an important component in many deep learning models, as it allows the data to be transformed into a format that is suitable for processing by the fully connected layers. 6. Dense: The dense layer in deep learning is used to fully connect the output of one layer to another. It adds a dense connection between all the neurons in the previous layer to all the neurons in the current layer. The dense layer helps in learning complex relationships between the inputs and outputs. It also helps in reducing the dimensionality of the data before passing it through the final layer for prediction. (D) Swish Activation Function The Swish activation function is a self-gated activation function that was introduced as an alternative to ReLU, which has been the most widely used activation function in deep learning. It was first proposed by Google researchers in the paper “Searching for Activation Functions” [8]. The Swish function is defined as follows. f (x) = x ∗ sigmoid(x) sigmoid(x) =
1 (1 + e−x )
The use of the Swish activation function has been shown to improve the performance of deep learning models compared to ReLU in several studies [9]. In the ResNet architecture, the use of the Swish activation function can help to train deeper networks by allowing the model to learn more complex representations. (E) Fine-Tuning Fine-tuning is a process of adapting a pre-trained deep learning model to a new task by using its learned features as a starting point. In our methodology, we utilized fine-tuning on a pre-trained ResNet model for the task of detecting plant diseases in images. The pre-trained ResNet model was trained on a large dataset, allowing it to learn high-level features that can be applied to new tasks. During fine-tuning, we fine-tuned the model on our plant village dataset by unfreezing some of the layers and training them while keeping others frozen. This approach allowed us to leverage the knowledge learned by the model on the large dataset while still being able to make task-specific adjustments to improve performance. By fine-tuning the model, we were able to achieve better results compared to training a model from scratch.
Plant Disease Detection Using Fine-Tuned ResNet Architecture
539
We used an optimizer with a cyclical learning rate for fine-tuning a deep learning model. The optimizer used is the Adam optimizer with a learning rate that is defined by the “CyclicalLearningRate” function. The learning rate will cycle between of 3e-7 and the 0.001, with each cycle having a step size equal to the lengthof the training set. The scale function is defined as a lambda function that returns 1/ 2(x−1) , where x is the current cycle. The learning rate will be scaled using this function in a cyclical Manne. Finally, the model is compiled with categorical crossentropy loss and the optimizer, and the accuracy metric is also being tracked.
4 Results and Discussion In our research, we conducted a thorough review of various approaches to detect plant diseases from leaves. After careful consideration, we chose to use the ResNet architecture as our base model. This was due to its superior performance compared to other methods. However, we took it a step further by fine-tuning the ResNet architecture to further improve its accuracy. Our aim was to build a more robust and efficient model for plant disease detection. The fine-tuning process involved adjusting certain parameters and layers in the pre-trained ResNet18 architecture to fit our specific problem of plant disease detection. The results of our experiment showed that the fine-tuned [13] ResNet18 architecture performed better than the base model in terms of accuracy, achieving a significant improvement. We evaluated the performance of our model and compared it with other models that have been used in the field of plant disease detection. Our findings suggest that the fine-tuned ResNet18 architecture has great potential for practical applications and can be used as a reliable tool for plant disease detection (Figs. 7 and 8).
Fig. 7 Accuracy versus epoch graph of proposed model
540
J. G. Renuka et al.
Fig. 8 Loss versus epoch graph of proposed model
We utilized a comprehensive dataset to enhance the accuracy of our model. To achieve this, we employed data augmentation techniques. As a result of these efforts, our model achieved a training accuracy of 99.89% and a testing accuracy of 99.68%, with a loss of 0.0134, recall of 0.996, F1 score of 0.9965 and precision of 0.9967. To further develop this model, we plan to integrate a user-friendly interface and add valuable features, such as suggestions for appropriate pesticides and in-depth information about the detected diseases.
5 Conclusion and Future Scope The objective of our study was to develop a highly accurate model for detecting plant diseases from leaves. To achieve this goal, we conducted a comprehensive analysis of existing methods and proposed a solution based on fine-tuning the ResNet architecture. Our model was able to achieve an accuracy of 99.65%, making it a valuable tool for developing practical applications for farmers. However, there is still room for improvement and future work can be done to increase the accuracy even further. This can be achieved by fine-tuning more advanced models or incorporating new techniques. Additionally, efforts can be made to make the model lighter and more versatile, allowing for the prediction of diseases in plants not included in our current dataset. Our study represents a significant step toward developing useful and practical applications for the agriculture sector.
Plant Disease Detection Using Fine-Tuned ResNet Architecture
541
References 1. https://www.mdpi.com/20738994/14/9/1932/htm#:~:text=CNN%2C%20hybrid%20CNN% 20with%20ResNet,processed%20in%20the%20proposed%20work 2. Zeng Q, Ma X, Cheng B, Zhou E, Pang W GANs-based data augmentation for citrus disease severity detection using deep learning. https://ieeexplore.ieee.org/document/9200543/authors# authors 3. Barburiceanu S, Meza S, Orza B, Malutan R, Terebes R 2021 Convolutional neural networks for texture feature extraction. Applications to leaf disease classification in precision agriculture. https://ieeexplore.ieee.org/document/9627678/authors 4. Yuan Y, Xu Z, Lu G (2021) SPEDCCNN: spatial pyramid-oriented encoder-decoder cascade convolution neural network for crop disease leaf segmentation. https://ieeexplore.ieee.org/doc ument/9328499/authors 5. Kumar M, Kumar A, Palaparthy VS (2020) Soil sensors-based prediction system for plant diseases using exploratory data analysis and machine learning. https://ieeexplore.ieee.org/doc ument/9301331/authors 6. Nie X, Wang L, Ding H, Xu M (2019) Strawberry verticillium wilt detection network based on multi-task learning and attention. https://ieeexplore.ieee.org/document/8908746/authors 7. Patil RR, Kumar S (2022) Rice-fusion: a multimodality data fusion framework for rice disease diagnosis. https://ieeexplore.ieee.org/document/9672157/authors 8. Ramachandran P, Zoph B, Le QV (2017) Searching for activation functions. arXiv preprint arXiv:1710.05941 9. Hendrycks D, Gimpel K (2019) A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1902.04615 10. Jiang P, Chen Y, Liu B, Dongjian CL (2019) Real-time detection of apple leaf diseases using deep learning approach based on improved convolutional neural networks. https://ieeexplore. ieee.org/document/8706936/authors#authors 11. Yang G, Chen G, He Y, Yan Z, Guo Y, Ding J (2020) Self-supervised collaborative multinetwork for fine-grained visual categorization of tomato diseases. https://ieeexplore.ieee.org/ document/9264241/authors#authors 12. Khan MA, Lali MIU, Sharif M, Javed K, Aurangzeb K, Haider SI, Altamrah AS, Altamrah AS (2019) An optimized method for segmentation and classification of apple diseases based on strong correlation and genetic algorithm based feature selection. https://ieeexplore.ieee.org/ document/8675916/authors#authors 13. Santhiya S, Sivaprakasam S (2020) Deep learning-based plant disease recognition using Resnet 50 architecture. J Ambient Intell Humaniz Comput 11(1):161–171
Data Mining Approach in Predicting House Price for Automated Property Appraiser Systems Naeem Th. Yousir, Shaymaa Mohammed Abdulameer, and Salama A. Mostafa
Abstract The house price prediction model can be a logistic tool in assisting individuals and companies to determine the cost of a property or a house on sale and the best time to acquire a house. With the ever-increasing in house price rates, data mining approaches have been used as a property appraiser system to predict a house price based on house features before making a deal. Existing research in house prediction involves collecting a number of house price estimation attributes into a dataset of house price features. This study is aimed at building a model for predicting house price based on the house price features dataset. Then dataset is run through machine learning (ML) algorithms to learn the pattern and predict new house price cases. The cross-industry standard process for data mining (CRISP-DM) method is used for this research. Decision tree regression (DTR), linear regression (LR), multiple linear regression (MLR), and random forest regression (RFR) are the ML algorithms applied in this research. The results have shown that RFR is the most suitable algorithm for this dataset, with an average coefficient of determination (R square) of 82.09%, followed by the DTR with an R square score of 73.32% when using different training and learning splitting ratios. An application of the RFR model can be used in real estate companies for estimating the prices of properties. Keywords House price prediction · Machine learning (ML) · Linear regression (LR) · Multiple linear regression (MLR) · Decision tree regression (DTR) · Random forest regression (RFR)
N. Th. Yousir · S. M. Abdulameer College of Information Engineering, Al-Nahrain University, 10072 Baghdad, Iraq S. A. Mostafa (B) Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, 86400 Johor, Malaysia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_45
543
544
N. Th. Yousir et al.
1 Introduction One of the most important goals in a person’s life is buying a house. As we have known for a long time, the dealing price is always determined by the agency or seller. As a result, most people rely on third-party efforts to avoid ambiguous dealing prices [1]. The development of a model for estimating the price of houses can significantly bolster the process of formulating the price of houses and improving the future estimation policy’s accuracy of real estate [2]. House costs rise consistently, requiring the improvement of a strategy to figure out future house estimations of property appraiser applications. Housing price forecasting is critical in finance, investments in real estate, and urban planning and development [1]. The availability of properties or house price datasets is a dataset that informs both sellers and buyers to help in estimating a home’s selling price and can aid a customer in determining the optimum time to buy a house. In budgeting and marketing strategies, people take caution in searching for a new house. Real estate is a huge business, with the involvement of lots of people in regulation, investment, and private corporations [3, 4]. Here, there exists a significant requirement for deep rooted understanding within these stakeholders of the mechanisms in operation and driving causes in the industry. Property cost forecasting can help to determine the sale price of a house and can also help clients determine the optimal period to acquire a house [3]. House concept (design), location, and physical condition are the three main variables that affect a property’s price. Predicting house prices may assist consumers when determining the cost of buying a property in a certain location and the optimal period to purchase a home. Forecasting house prices is crucial for the real estate industry [4]. The literature endeavors to glean relevant information inherent in historical property market data. In many countries, machine learning (ML) methods are utilized to evaluate prior real estate trades in order to generate models that are useful to property buyers and marketers [2, 5]. The wide disparity in housing prices between states and counties’ most and least expensive suburbs has been highlighted. Experiments further show that methods that include ML algorithms like artificial neural networks (ANNs), decision trees (DTs), random forests (RFs), and support vector machines (SVMs) can provide acceptable house prediction results [5–7]. This study has performed the house price prediction by using four different algorithms, which are decision tree regression (DTR), linear regression (LR), multiple linear regression (MLR), and random forest regression (RFR). The following is an organization of the remainder of this paper. Section 2 is composed of reviews of several studies associated with house price prediction. In Sect. 3, the CRISP-DM methodology utilized in carrying out the data mining task on the dataset, including the evaluation parameters or metrics is presented. The results are in Sect. 4, and lastly, the conclusion and some recommendations for future work are presented in Sect. 5.
Data Mining Approach in Predicting House Price for Automated …
545
2 Related Work A lot of work is being done to train models in capturing patterns or relationships in datasets and predict possible future outputs. Many experiments have been conducted to determine the best algorithm for price prediction in this situation [7]. Because there are so many home property adverts on the internet, various researches may be conducted by utilizing them. The tabulation and analysis of this data, which may then be utilized for projecting trends and helping in a beneficial decision-making process, provides a significant benefit for developers. While many articles focus on the benefits and drawbacks of using a specific algorithm or technique, others carry out performance comparison of various algorithms to determine which the best is. In improving prediction, ML integrates computer science and statistical analysis. Within the context of human–computer interaction, high-value predictions can be achieved [3, 4]. It aids in the prediction of uncertain situations using data. There are two types of ML algorithms: supervised and unsupervised. Supervised algorithms employ labeled data in learning and creating results, whereas labeled data is not required in unsupervised in learning. It was simply utilized in the data source to obtain inference. Evaluating time series data can be performed by regression models [5]. Thakur et al. [2] performed regression trees that may be used with both continuous and categorical input variables. Regression trees are a study of several machine algorithms to solve the regression problem, with the decision tree method yielding the lowest loss. The decision tree has an R square value of 0.988, showing that it is a good model. The collected information is placed into ML. We will basically use K-fold cross-validation and the grid search technique to identify the optimum process and metrics for the model. The linear regression model, with scoring of more than 80%, represents the best results for the data, which is not that bad. Ho et al. [6] built an estimation model for house pricing using the random forest and linear regression. In comparison with the benchmark linear regression model, the random forests model captures more hidden nonlinear relationships between house prices and features, thereby, providing a better overall estimation. This discovery is supported by numerical experimentation performed on North Virginia house price data. In general, a linear regression model is a method for statistical analysis that employs regression analysis in mathematical statistics to determine the quantitative association between two or more variables. A least-squares function called a linear regression equation is used to form a model of the relationship between one or more variables (be it dependent or independent). Furthermore, the random forests model is an ensemble method of decision trees that is poor in learning and easily over-fits. The problem of decision tree uncertainty and high variability can be subdued by assembling the decision tree. It creates a model for predicting by solely training weak prediction models, typically decision trees. Using a model averaging method, the predictions of each tree are put together. In terms of R square, random forest generally outperforms the benchmark linear regression model. It is divided into four features testing and training sets. The first one is the year built and lot size features,
546
N. Th. Yousir et al.
which produce 64% for the random forest approach and 34% for the linear regression algorithm approach. Next is zip code features, latitude, longitude, lot size, and year built which have produced the result of 67% for the random forest and 41% for linear regression. The features of latitude, longitude, year built, and lot size have produced% for the random forest and 405 for linear regression. Lastly, 70% of random forest and 54% of linear regression results have been produced in the features of zip code, latitude, longitude, year built, and lot size. In conclusion, it is found that random forests compared to a standard linear regression could obtain the nonlinear hidden links between house location and price and give a better overall estimation. Winki et al. [7] applied the random forest algorithms to assess property values. Random forest is used to examine a sample of data collected over more than 18 years’ period consisting of approximately 40,000 housing trades in Hong Kong. Random forest runs a number of several trees which are combined into a single model to predict more accurately as would a single tree. It also builds several decision trees during training, and the combination of these predictions from all trees are utilized in making the resultant prediction. A random forest method, which uses random sampling with replacement (bagging in ML terminology), assists data scientists in reducing the variability related to high variance algorithms, such as decision trees. They get a sequence of examples for each tree randomly sampled replacements from the training set. Their base model, which employs random forest, has an R square as high as 0.89690. This study demonstrates that random forest performs equally well in explanatory power. Some studies have shown that RF has comparable predictive power and is nearly as useful in forecasting. Furthermore, in another study comparing linear regression to forecasting daily lake water levels, the random forest was found to produce the best results. As a result, they have shown that advanced ML algorithms can predict property prices very accurately, as measured by performance metrics. Their main conclusion is that random forests can produce accurate price estimates with lower prediction errors. Mau et al. [8] have proposed a method using multivariate linear regression integrated with geographic feature for predicting the price of houses. The zip code, in particular, is selected as an added geographic features due to its ease of acquisition. The integrated features are then employed in learning the multivariate linear regression model. Using King County areas as a real-world case, they conducted a huge experiment to compare their technique of linear regressions. The outcomes validated their model’s efficacy and superiority. In this article, they build multiple linear regression-based methods for predicting house prices. Rather than delving deeply into the effects of geographic information or surrounding factors on the prices of house, they seek an agent variable that roughly integrates the information of these external features. The result of the linear regression with all features is 76.6%. For linear regression with a dummy variable ‘zip code’, the result produced is 87.4%. Their method outperforms linear regressions on both the test set and tenfold cross-validation in terms of accuracy, according to the results.
Data Mining Approach in Predicting House Price for Automated …
547
3 Methodology The employed methodology in this study is the cross-industry standard process for data mining (CRISP-DM). It is going to be implemented to build the research project. The CRISP-DM is an industry standard predictive analytics methodology. Despite its limitations, CRISP-DM is among the most extensively utilized approaches in the data mining and business analytics industries [8]. The CRISP-DM technique improves the chances of success in analysis business or data mining projects. As a result, CRISP-DM is an effective technique for this domain [9]. The life cycle model of this methodology is in six stages, with the most essential and common interdependencies are shown using arrows. The stages are not arranged in a particular order. To a large extent, projects alternate among stages as the need arises. The CRISP-DM is a flexible model and can be easily customized. Instead of modeling, work centers on data exploration or probing and visualization in identifying problematic patterns in financial data. CRISP-DM empowers a data mining model development that suits one’s particular needs. • Business Understanding: In this phase, the importance of the dataset to the business is determined by studying the dataset objectives and requirements. • Data Understanding: Understanding the data provided is done in this phase, in addition to verifying its quality and also examining properties and features. • Data Preparation: Data preparation or preprocessing is conducted in this phase, from cleaning and dropping data variables to formatting and checking the correlation between values. • Modeling: Selecting the model techniques and algorithms to be used are determined in this phase, in addition to splitting rations moreover building the model using Python. • Evaluation: After getting the results, the model is evaluated to check if it meets the expected criteria and the correctness of the executed results. • Deployment: In this phase, a final report is produced containing all the results.
3.1 Dataset The dataset is composed of the selling prices of residences in King County, as well as in Seattle for homes sales between the periods of May 2014 to May 2015. It’s a great dataset for trying simple regression models to test. The dataset covers property prices from King County, a region in Washington State, as well as Seattle. The dataset was gathered using Kaggle. The dataset contains 11 variables and 21,613 observations. Based on the facts presented, the purpose was to anticipate property prices. The goal variable was the variable ‘price’, and the remaining variables were the feature variables that are utilized to estimate the target variable. All the columns have numeric data, making linear regression significantly simpler.
548
N. Th. Yousir et al.
Kaggle was used to obtain the dataset. It was released on August 25, 2016. Because all the housing data is public, the source is thought to be trustworthy. It’s a well-known dataset with all the necessary attributes. It includes an ID, date, and 18 house attributes in addition to the house price. A 3-row sample is taken with 21 features represented in the columns; the first column is the ID containing random values. The second column is the date represented in the Y/M/D format, and the price represents how much the house was sold. The bedrooms’ column accounts for the total bedrooms present in the house; that same applies to the bathroom’s column, as shown in Table 1. The internal living space of the apartment is represented in the sqft_living column, while the sqft_lot represents the total area of the property land in square feet. Furthermore, the floor represents the number of floors or stories in the house, and there is the waterfront, and the view of this house represents how good the view is, meaning that the view is awful when 0 and great when rated as 4. The condition of the apartment is rated from 1 to 5 depending on how great the condition is. The structure and design of this house are represented in the grade column. The housing interior area or space above ground level is represented in the (sq ft above), whereas the housing interior space below ground level (sq ft basement) represents the size of the basement, while if the basement sq ft is 0, it indicates that there is no dwelling space below ground level. The next column represents the year the house was originally developed or built (yr_built), while the (ye_renovated) represents the year the house was renovated or updated, while 0 means that the house has never been renovated. The zip code column represents the location the house is located, similarly to the latitude and the longitude. The square footage of the nearest 15 neighbors’ housing interior living area is represented in (sq ft living15), while the square footage of the nearest 15 neighbors’ land lots is represented in (sq ft lot 15). Table 1 House sales features Price
Bedrooms
Bathrooms
sqft_living
sqft_lot
Floors
221,900
3
1
1180
5650
1
538,000
3
2.25
2570
7242
2
2
180,000
1
770
10,000
1
Grade
sqft_above
yr_built
Zipcode
Lat
Long
7
1180
1995
98,178
47.5112
122.257
7
2170
1951
98,125
47.721
122.319
6
770
1933
98,028
47.7379
112.233
Data Mining Approach in Predicting House Price for Automated …
549
3.2 Machine Learning Algorithms The algorithms to be implemented in this research are discussed in this section, which contains four ML algorithms. Regression as a statistical analysis technique is used in identifying the link among variables. The association can be discovered between the dependent and independent variables. This can be expressed using probability distribution functions as in Eq. (1) [3]. Y = f (X, β)
(1)
The regression algorithms to be implemented in this study are decision tree regression (DTR), linear regression (LR), multiple linear regression (MLR), and random forest regression (RFR). • LR: It is a linear way for modeling relationships between a scalar response and one or more explanatory variables. The formula is shown in Eq. (2). In Eq. (2), y is an independent variable that is either continuous or definite, and x is a dependent variable that is consistently continuous. It is evaluated using probability distributions, with a particular emphasis on conditional probability distributions and multivariate analysis. y = xβ + ε
(2)
• MLR: It is an extension of LR, but the prediction process involves more than one independent variable or predictor, as shown in Eq. (3). y = (xβ + ε)_1 + (xβ + ε)_2 + · · · + (xβ + ε)_n
(3)
• DTR: DTR represents a map of the possible outcomes for a series of given choices. The dependent variable y is the target variable that it tries to generalize or categorize. The vector x comprises of the features x 1 , x 2 , x 3, until k that are utilized in the prediction exercise, as shown in Eq. (4) [9]. (x, y) = (x1 , x2 , x3 , . . . , xk , y)
(4)
• RFR: For random forests, the commonly used method of bagging to tree learners or bootstrap aggregating technique is applied by the training algorithm, as shown in Eq. (5). Bootstrapping procedure gives better model performance because it reduces the variance without increasing the bias of the model. This implies that in its training set, while a single tree predictions are extremely sensitive to noise, the average of several trees is not, provided that the trees are not correlated [10, 11].
550
N. Th. Yousir et al.
f =
B 1 ∑ f b (x) B b=1
(5)
3.3 Evaluation Metrics R-squared (R2 ), mean-squared error (MSE), root mean-squared error (RMSE), and median absolute error (MAE) are the evaluation metrics applied in the experiments [10]. • R2 : It is a statistical metric in a regression model that assesses the balance of difference demonstrated by an independent variable or variables for a dependent variable. The formula for calculating R square is shown in Eq. (6). ∑( R =1− ∑ 2
yi − yˆ
)2
(yi − y)2
(6)
• MSE: It calculates the distance between a regression line and a set of points. This formula for the calculation of mean-squared error is in Eq. (7). MSE =
n )2 1 ∑( yi − yˆi n i=1
(7)
• RMSE: When a predicting procedure is carried out on a dataset, RMSE is the standard deviation of the occurring errors. The formula for the calculation of RMSE is shown in Eq. (8). ┌ | n | 1 ∑( )2 yi − yˆi RMSE = √ n i=1
(8)
• MAE: It is specifically intriguing because it is robust to outliers. The loss calculation is the median of all absolute variation between the target and the prediction. The formula for the calculation of root mean-squared error is shown in Eq. (9). MAE =
n | 1 ∑|| yi − yˆi | n i=1
(9)
Data Mining Approach in Predicting House Price for Automated …
551
4 Results Analysis and Discussion The implemented models are developed in Jupyter Notebook using Tensor Flow and Python. The implementation is started with importing a related library for prediction, then reading and checking dataset from missing values, and exploring the variable to be predicted (price). Moreover, dropping extreme outliers that will affect the prediction result. Subsequently, exploring features that correlate with the price. sqft_ living showed a high correlation of 0.70, bedrooms showed a 0.31 correlation, which has a huge effect on the price. There are many different features with a high correlation to the price, such as the grade of the house and sqft_above ground with 0.66 and 0.60 scores, respectively, as shown in Fig. 1. After checking the data, data preprocessing is the next step, correcting the date data type, which was converted into a categorical feature. Redundant data such as (ID, date, and zip code) were dropped. The next step was building the model with four different ML algorithms under regression, LR, MLR, RFR, and DTR. The dataset was divided into the training set and test set to 5 different portions 30:70, 40:60, 50:50, 60:40, and 70:30. Each algorithm was run five times to get the evaluated metrics which are R square, MSE, RMSE, and MAE. The experiments’ aim is to make a comparison of the performance of four ML algorithms. Shown in Table 2 are the results of the performance of the algorithms. The outcomes indicate that the RFR algorithm with the ratio of 30:70 has the highest R square equals to 82.83, and that’s because RFR can discover more complex dependencies than LR and MLR and better than DTR due to the multiple decision trees combined.
Fig. 1 Effect of bedrooms on the house price
552
N. Th. Yousir et al.
Table 2 Experimental results of house price prediction Data split (%)
Algorithm
R2
MSE
RMSE
MAE
30:70
LR
47.8363
0.593007
0.243517
0.12636
MLR
69.2683
0.349363
0.186912
0.83806
RFR
82.8345
0.195140
0.139692
0.46804
DTR
74.0153
0.295399
0.171571
0.56659
LR
47.1457
0.588002
0.242487
0.12523
MLR
69.0125
0.344734
0.185670
0.84963
RFR
82.4014
0.195783
0.139992
0.47464
40:60
50:50
60:40
70:30
DTR
74.9783
0.278365
0.166842
0.56535
LR
47.2130
0.579570
0.240742
0.12560
MLR
69.1132
0.339119
0.184152
0.84447
RFR
82.0964
0.196568
0.140203
0.47447
DTR
73.3021
0.293128
0.171209
0.59409
LR
47.4691
0.569143
0.238567
0.12548
MLR
69.3654
0.331903
0.182183
0.84542
RFR
81.7150
0.198107
0.140750
0.48483
DTR
71.9232
0.304196
0.174412
0.60210
LR
47.4213
0.569173
0.238573
0.12425
MLR
69.3312
0.331994
0.182207
0.84430
RFR
81.4395
0.200920
0.141746
0.48708
DTR
72.4199
0.298559
0.172788
0.59966
Figure 2 visualizes the obtained results of the four algorithms in the five data split tests. It shows that the performance of the algorithms has the same result pattern in which RFR has the highest prediction performance and the LR has the lowest prediction performance. With the RFR algorithm, the predicted price for the first house in our dataset using the R square equals to 82.83 was 212,058, and the real price is 221900 with a 9842 difference, as shown in Fig. 3.
Data Mining Approach in Predicting House Price for Automated …
553
Fig. 2 R square in each split for each algorithm
Fig. 3 Predicted price of the RFR versus real price
5 Conclusion This study applies data mining technology to establish prediction using four different algorithms, which are decision tree regression (DTR), linear regression (LR), multiple linear regression (MLR), and random forest regression (RFR). Several experiments have been conducted to comparing them with different data splitting ratios in order to determine the best prediction algorithm. The results from the experiment show that the RFR algorithm has the highest R square than LR, MLR, and DTR. The RFR predicted results are a bit closer to the actual price with an average MSE of 0.1973 and R square of 82.09%. In the future, this study hopes to delve into new dataset of properties price prediction. In addition, we will look into other machine learning and deep learning algorithms for predicting house price with the aim of getting better results. Acknowledgements This paper is supported by Universiti Tun Hussein Onn Malaysia (UTHM).
554
N. Th. Yousir et al.
References 1. Bangura M, Lee CL (2020) House price diffusion of housing submarkets in Greater Sydney. Hous Stud 35(6):1110–1141 2. Thakur A, Satish M (2021) Bangalore house price prediction. Int Res J Eng Technol (IRJET) 8(9):193–197 3. Rana VS, Mondal J, Sharma A, Kashyap I (2020) House price prediction using optimal regression techniques. In: 2020 2nd international conference on advances in computing, communication control and networking (ICACCCN). IEEE, pp 203–208 4. Coskun Y, Seven U, Ertugrul HM, Alp A (2020) Housing price dynamics and bubble risk: the case of Turkey. Hous Stud 35(1):50–86 5. Xu X, Zhang Y (2021) House price forecasting with neural networks. Intell Syst Appl 12:200052 6. Wang C, Wu H (2018) A new machine learning approach to house price estimation. New Trends Math Sci 6(4):165–171 7. Ho WK, Tang BS, Wong SW (2021) Predicting property prices with machine learning algorithms. J Prop Res 38(1):48–70 8. Mao Y, Yao R (2020) A geographic feature integrated multivariate linear regression method for house price prediction. In: 2020 3rd international conference on humanities education and social sciences (ICHESS 2020), pp 347–351. Atlantis Press 9. Martínez-Plumed F, Contreras-Ochando L, Ferri C, Flach P, Hernández-Orallo J, Kull M, Lachiche N, Ramírez-Quintana MJ (2019) CRISP-DM twenty years later: From data mining processes to data science trajectories. IEEE Trans Knowl Data Eng 33(8):3048–3061 10. Jasni NH, Mustapha A, Tenah SS, Mostafa SA, Razali N (2022) Prediction of player position for talent identification in association netball: a regression-based approach. Int J Adv Intell Inf 8(1):84–96 11. Chen JR, Lin YH, Leu YG (2017) Predictive model based on decision tree combined multiple regressions. In: 2017 13th international conference on natural computation, fuzzy systems and knowledge discovery (ICNC-FSKD). IEEE, pp 1855–1858
IPSO-SMOTE-AdaBoost: An Optimized Class Imbalance Strategy Using Boosting and PSO Techniques Sarvani Anandarao, Polani Veenadhari, Gudivada Sai Priya, and Ginjupalli Raviteja
Abstract The class imbalance is challenging issue in machine learning and data mining especially health care, telecom sector, agriculture sector, and many more (Zhu et al. in Pattern Recogn Lett 133:217–223, 2020; Thabtah et al. in Inf Sci 513:429–441, 2020). Imbalance of data samples across classes can arise as a result of human error, improper/unguided data sample selection, and so on (Tarekegn et al. in Pattern Recogn 118:107965, 2021). However, it is observed that applying imbalanced datasets to the data mining and machine learning approaches, it retains the biased in results which leads to the poor decision-making ( Barella et al. in Inf Sci 553:83– 109, 2021; Zhang et al. in ISA Trans 119:152–171, 2021; Ahmed and Green in Mach Learn Appl 9:100361, 2022). The primary motivation for this research is to explore and develop novel ensemble approaches for dealing with class imbalance and efficient way of retrieving synthetic data. In this paper, an ensemble method called IPSO-SMOTE-AdaBoost is developed to solve the class imbalance problem by combining the synthetic minority oversampling technique (SMOTE) (Gao et al. in Neurocomputing 74:3456–3466, 2011; Prusty et al. in Prog Nucl Energy 100:355– 364, 2017), improved particle swarm optimization (PSO) (Yang et al. in J Electron Inf Technol 38:373–380, 2016), and AdaBoost. AdaBoost combined with SMOTE provides an optimal set of synthetic samples, thereby modifying the updating weights and adjusting for skewed distributions. The typical AdaBoost approach, on the other hand, consumes far too many system resources to avoid redundant or ineffective weak classifiers. With the proposed ensemble framework, IPSO-SMOTE-AdaBoost, parameters can be re-initialized to counter the concept of local optimum as well with the SMOTE that is boosted with AdaBoost method. The proposed method is validated using three datasets on six classifiers: extra tree (ET), naive Bayes (NB), random forest (RF), support vector machine (SVM), decision tree (DT), and Knearest neighbor (KNN). After that, the IPSO-SMOTE-AdaBoost is compared to S. Anandarao (B) Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India e-mail: [email protected] P. Veenadhari · G. S. Priya · G. Raviteja Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_46
555
556
S. Anandarao et al.
the existing SMOTE-PSO. The evaluation of proposed work is done with measures, namely accuracy, precision, recall, sensitivity, and F-score, and result shows that the proposed technique outperformed the usual PSO and SMOTE variations. Keywords Class imbalance · SMOTE · PSO · Naive Bayes · K-nearest neighbor
1 Introduction Many machine learning techniques used to solve classification problems suppose that the classes are properly balanced [1–5]. However, in practice, many applications such as medical diagnosis, credit card fraud, malicious app identification, cancer detection, and many more datasets are imbalanced, which challenges classic classification algorithms’ assumptions. Imbalance happens in uncommon cases [6], for example, in medical diagnostics, the diseased patient will be minimal in comparison with the non-cancerous patient. Because the minority class is small, the classifier will struggle to make predictions. When the data is uneven or the class distributions are skewed, the predictive potential of minority classes is compromised, and it performs poorly for minority classes. There are numerous ways for rebalancing the data distribution when training the model. Reweighting and resampling are common solutions. The reweighting approaches concentrate on the costs of different classes, with a particular emphasis on the costs of minority classes and less on majority classes [7]. Resampling approaches directly alter the training data by repeating minority class instances and eliminating some majority class instances [8]. By altering the number of instances during training, resampling approaches try to obtain a more balanced data distribution. This adjustment is primarily at the class level, and it includes the following two types: oversampling, undersampling, and class-balance sampling methods. By repeating or interpolating [9], oversampling increases the number of examples of minority classes. Undersampling [9] reduces the proportion of majority classes by removing certain instances of majority classes. A few prominent resampling approaches are random oversampling of the minority class, undersampling of the majority class, and some state-of-the-art synthetic sampling approaches that endeavor to rebalance class distribution at the data level (Fig. 1). These balance solutions, however, have some drawbacks. Undersampling, for example, is unavoidably associated with information loss, but oversampling via random repetition of the minority class sample typically results in very specific rules, culminating in overfitting [8]. The synthetic minority oversampling method (SMOTE) [9] produces samples of minority classes in the sample space using linear interpolation; however, the sample space of the majority class is frequently invaded by the newly generated samples. The intrusive sample as well exhibits impact on the subsequent data processing [10], which will in turn influence classification performance.
IPSO-SMOTE-AdaBoost: An Optimized Class Imbalance Strategy …
557
Fig. 1 Resampling methods
AdaBoost [11] is a boosting method that is commonly utilized in processing skewed data. As a poor classifier, it employs a single-layer decision tree. Furthermore, it may yield a good number of weak classifiers and results in performance degradation. Through randomly assigning points on the line between minority class samples and their nearest neighbors, SMOTE-AdaBoost produces minority class samples without redundancy. Finally, the classifiers are trained using the ensemble learning technique. In order to improve method efficiency by decreasing system resources and time overhead, SMOTE-AdaBoost was integrated with the improved PSO approach. The total approach attempts to produce synthetic minority classes, train with strong classifiers, and keep global search active. Applying update operations for the best particle and other particles in the population can assist the algorithm in escaping the local optimum and maintaining population diversity. The following is how this article is structured: Sect. 2 contains related work. Section 3 explains the procedures and materials. Section 4 examines the method proposed by the present study. Section 5 demonstrates the comparative experiments. Section 6 offers the summary of the algorithm proposed by the study as well as future development.
2 Relevant Work To overcome the inaccurate data distribution, researchers offered different approaches for rebalancing data samples within minority and majority classes. This section briefly addresses various oversampling procedures for dealing with data imbalance by boosting the SMOTE with optimization mechanisms. Mani and Zhang
558
S. Anandarao et al.
[12] presented the SMOM oversampling algorithm, a KNN-based SMOTE, in which the extra cases will be added to the minority class to balance the data. Prusty et al. [10] proposed a revised version of SMOTE (weighted, WSMOTE), in which the minority is generated adding weight to the minority classes. The estimation of the weight used in this study is Euclidean distance. Finally, compared WSMOTE with the standard and produced better results in F-measure benchmark. Borderline SMOTE [13] that blends the SMOTE with information about the boundary samples is put forward in this connection. Experiments have shown that leveraging information from samples at the border to produce additional samples can increase model performance. Han et al. [14] recently attempted to bring SMOTE to distributed computing systems under Spark for huge dataset jobs. FSMOTE is a novel proposed approach designed by [15] that produces samples by interpolating between minority samples and their k minority class farthest neighbors. Li et al. [13] presented the Easy-SMT technique that blends SMOTE-based oversampling strategy with EasyEnsemble to classify the imbalance problem into balanced learning subproblems. Ding [16] employs the PSO technique to the SMOTE, to improve the imbalance ratio using KNN. Guo et al. [17] presented the BPSO-AdaBoost-KNN approach for multiclass imbalanced data classification, and this algorithm increases the stability of AdaBoost by extracting essential features effectively. By incorporating the genetic algorithm into the AdaBoost algorithm, Chawla et al. [18] suggested an ensemble evolve approach for unbalanced data categorization. Gene evolution and enhanced fitness functions are used to build better classifiers, and unbalanced data classification is optimized throughout evolution. Motivated [17–23] by the performance of the SMOTE and PSO combination, an effort has been made in this work to develop a strategy for generating the synthetic samples as minority classes with the ability of IPSO as well as AdaBoost.
3 Materials and Methods The section discusses the essential approaches for the proposed work, which include SMOTE, AdaBoost, improved PSO, and EV optimization methods.
3.1 SMOTE At the data preprocessing step, the synthetic minority oversampling technique (SMOTE) is a well-known oversampling technique for unbalanced data classification issues [15]. SMOTE is a powerful approach for coping with unbalanced datasets. In training instances, it tends to equalize the number of majority and minority class instants. Let D = (x j , y j ) be the training data with n samples (i.e., |D| = m) and class labels which are represented as, x j = { f 1 , f 2 , . . . , f n }, an individual instance z i ∈ D the n-dimensional feature space, and y j = {1, 2, . . . , c} is class label of x j . Consider
IPSO-SMOTE-AdaBoost: An Optimized Class Imbalance Strategy …
559
the binary classification, where C = 2, and define subsets Dmin ⊂ D and Dmax ⊂ D, where Dmin is the set of minority class instances in and Dmax is the set of majority class examples in D, and so on. Finally, all sets resulting from sampling techniques on D are designated as F, with disjoint subsets Fmin and Fmax reflecting the minority and majority samples of D. Generally, The SMOTE method is defined as linear method in which both minority and majority classes z R and z are considered as a single input and is defined in Eq. (1). D = z + μ ∗ (z R − z)
(1)
0 ≤ μ ≤ 1 and z R is drawn at random from the minority class’s nearest neighbors z . SMOTE is utilized to obtain z and to increase the minority class shown in Eq. (1).
3.2 AdaBoost AdaBoost [11] operates in T rounds, training a weak learner in each round. Following training, the algorithm increases the weight of samples projected to be misclassified and decreases the weight of samples properly classified. As a result, samples that are correctly classified have a lesser probability of being re-used in the next iteration, but samples that are incorrectly classified have a higher chance. AdaBoost accepts as input a set of training samples S = (x1 , y1 ), . . . , (xn , yn ) of size N , where each sample xi is a vector of values for the domain of space X and yi is the label of each sample that belongs to the label space of Y . Weights are initialized and distributed uniformly over the training set at the start of the algorithm, and the weights of each erroneously identified sample are increased in each iteration. From the results, it is observed that, with the training, all the time the weak learner focused on the difficult samples. The weight vector is applied on training samples in the AdaBoost method, and later weight of each sample is updated in each iteration using the weight function and is defined in Eq. (2). Wt+1,i = Wt,i βt1−bi
(2)
In addition, also there is a possibility of estimating error rate ∈ j with the weight of each sample in training data in the concern classifier using Eq. (3). ∈j=
Wt,i∗bi
(3)
i=1
Now, add weights with the proportion of their accuracy in the training phase to the two weak classifiers h 1 , h 2 by generating the strong classifier, H (x).
560
S. Anandarao et al.
3.3 Improved PSO Algorithm The PSO method [15] is used to address optimization issues and is inspired by the behavioral characteristics of biological populations. The objective function defines specific fitness value corresponding to each particle, and every particle travels in distance based on the speed. To improve their fitness value in the solution space is the desired goal of PSO. In a search space N , total particles areN p , an specific position of particle i is z i = [z 1i , z 2i , . . . , z N i ], and νi = [ν1i , ν2i , . . . , ν N i ] is the vector of velocity, respectively. Meanwhile, the particle’s historical best position i is pbi = [pb1i , pb2i , . . . , pb N i ], and its global best position is gbi = [gb1i , gb2i , . . . , gb N i ]. In each iteration, the particle’s position and velocity are updated as follows: νmi (t + 1) = [νmi (t) + c1 ∗ r1 ∗ pbmi (t) − z mi (t)) + c2 ∗ r2 ∗ gbm (t) − z mi (t)) (4) Z mi (t + 1) = Z mi (t) + νmi (t + 1)
(5)
From (4), c1 and c2 are the elements used in the learning process. If c1 = 0, a particle travels with the knowledge of the historical information related to the community. Similarly, if c2 = 0 travels by historical information and neglects the exchange of information among community. The particle is in original location in case of c1 = c2 = 0, respectively. The standard PSO is suffered with issues like prone to poor searchability, degrade diversity of particle, falls into the optimal local solution, low precision, and accuracy in path planning. The improved particle swarm algorithm (IPSO) improves the performance of the classic PSO with the weight strategy. The method is all time focused on update operations related to best particle and other particles of entire community. The new idea helps to escape local optimum as well as sustain population diversity. The major idea is to elect the head among the individuals z i and used to guide the other particles in population. The decision of best particle is done with estimation of fitness value fit(z k ), and later value is compared with individual value in community, respectively. The fit(z k ) function of z i estimation value is maximum fitness and smaller value. The fitness function is given as follows: fit(z k ) =
1 , 1+ f (z)
if f (z) ≥ 0, 1 + abs( f (Z ), else.
(6)
All particles with fitness values greater than z i are grouped as A. After that, the fitness function is used to calculate the probability value prob(k) of each particle z k ∈ A. Following that, the prob(k) and a random number of r values are compared together. If the value of prob(k) is dominated by the r , then the particle in set A at kth position will be the leader to decide the traveling direction of z i .
IPSO-SMOTE-AdaBoost: An Optimized Class Imbalance Strategy …
561
The following Eq. (7) is used to estimate prob(k) of each particle z k ∈ A: prob(k) =
fit(z k ) z t ∈A fit(z t )
(7)
The above modification related to the PSO allows the best particle selection, in turn which can improve the performance and retain the results from the absence of the local optimum. Now, updated the velocity operation by the best parcel as leader z j ∈ A using the fitness and proablity function fit(z k ), respectively, in addition z j updated with the gbest . The following Eq. (8) is used to update the velocity parameter: νmi (t + 1) = [w ∗ νmi (t) + c1 ∗ r1 ∗ pbmi (t) − z mi (t)) + c2 ∗ r2 ∗ (z m j (t) − z mi (t)) (8)
From Eq. (8), w is choosen as very low value, and it is estimated with the following Eq. (9): w = wmax −
wmax − wmin iter Itmax
(9)
From (9), wmax = maximum wight value, wmin = minimum weight values, iter = current iterations, and Itmax = the maximum iterations. The following formula is used to update the position of Z i based on the above analysis: Z mi (t + 1) = Z mi (t) + νmi (t + 1)
(10)
4 Proposed Method:IPSO-SMOTE-AdaBoost The work in this study designed an ensemble IPSO-SMOTE-AdaBoost, and the basic principle of approach is to derive the best synthetic samples to optimize the set of synthetic samples z smote of minority class that leads to the better classification result. Figure 2 describes the complete idea of proposed method, respectively. The proposed method IPSO-SMOTE-AdaBoost is described in detail with the four different steps, and each one is described as follows. The first step 1–2 related to the generation of the synthetic minority classes using SMOTE in phase I. The phase II focused about the re-assignment of weights to the weak classifiers by the AdaBoost which is covered in step 3–4. At last step, strong classifier is optimized using IPSO and trained the data which is defined in phase III. The proposed method is illustrated in terms of steps detailed in the following section:
562
S. Anandarao et al.
Fig. 2 Process of proposed method: IPSO-SMOTE-AdaBoost
Step 1: The complete idea is described. From the principle, it is observed that first applied training data T , from that symmetric minority classes z smote , are retrieved using the SMOTE method with KNN. Assigned result of z smote as Tsmote_ min = {z smote }. Step 2: Split the training data T into major and minority classes as maj_class = Tmaj and min _class = Tmin , and then new minority classes is estimated as Tnew = (Tsmote_ min, Tmaj ). Step 3: From the Tnew estimate Tnew_select based on the particle pp , for each k features in Tnew_select , assign normalize weights wn . Then train Tnew_select using weak classifier h k (). Step 4: Optimize weak classifier h k () with IPSO method and return results as strong j classifier H (k). Then update lbest of p j and gbest . Also adjust velocity and position. Step 5: Finally, Tfinal select as minority classes from the gbest . Now, train Tfinal using H (k) and predict the test label.
5 Results and Discussion The section describes the datasets used in the complete study, evaluation metrics, and results and discussion.
IPSO-SMOTE-AdaBoost: An Optimized Class Imbalance Strategy …
563
Table 1 Imbalanced datasets used in the study Real dataset
Total observation
Maj_Class
Min_Class
IR
Vehicle
846
647
199
3.25:1
Ionosphere
351
225
126
1.79:1
Statlog
2310
1980
330
6:1
5.1 Test Dataset In this study, three real-time imbalance datasets [16, 23] are considered to test the proposed method. The datasets are, namely Vehicle, Ionosphere, and Statlog which were collected from the repositories UCI. The complete details about the datasets are shown in Table 1. From the table, Maj_class, Min_class, and IR indicate majority class, minority classes, and imbalance ratio. The Stalog dataset had the two class labels with the imbalance ratio (IR) of 6:1 of total 2310 samples. The remaining two datasets had IR 3.25:1 and 1.79:1, respectively.
5.2 Evaluation Metrics To evaluate the performance of the proposed method IPSO-SMOTE-AdaBoost, there is a specific need of desired evaluation measures. The study is considered following measures which are, namely accuracy, recall, precision, specificity, F-score, and so on. The basis for any evaluation measure is the confusion matrix and is composition of the four values which are true positive TP, false negative FN , true negative TN , and false positive FP. The complete evaluation measures are described as follows: TP + TN TP + TN + FP + FN TP Precision (Pr ) = TP + FP TP Recall (Rc ) = TP + FN TP Sensivity (Sn ) = TP + FN TN Specificity (Sp ) = TN + FP 2 ∗ Pr ∗ Rc F - measure (Fm ) = Pr + Rc Accuracy (Acc) =
564
S. Anandarao et al.
5.3 Performance Analysis of SMOTE Algorithm with IPSO The section describes about the performance analysis of SMOTE algorithm with IPSO on three real imbalanced datasets with 6 classifiers, namely ET, NB, RF, SVM, DT, and KNN. The first dataset is Vehicle dataset and is divided into two sets training and test sets with the proportion of 80% and 20%, respectively. The improved PSO (ISO) algorithm with SMOTE is an ensemble classification method, in which training is done with six different classifiers on the Vehicle dataset. The performance analysis is shown in Table 2, and the accuracy achieved by the NB, 0.8691, is dominated by a other methods. The ET and DT accuracy values of 0.882 and 0.8652 are close to the performance of the NB. However, RF and SVM are producing the marginal accuracy than the KNN. The precision value, 0.8102, achieved by the ET is superior to all other methods, and only NB precision value of 0.8057 is closer to the ET. However, the nominal precision values derived from the RF, DT, SVM, and KNN are producing lower precision value of 0.6660 than all other methods. Similar kind of performance is observed in recall, F1, and sensitivity. In case of recall, the RF retained better value of 0.8385 than all other methods. The negligence value 0.7024 is achieved by the KNN. The case in F1 and sensitivity NB is dominated by values of 0.7864 and 0.8364 compared to other methods. However, KNN performed poor in all methods with respect to the F1 and sensitivity measures. The second dataset is Ionosphere dataset and is divided into two sets training and test sets with the proportion of 80% and 20%, respectively. The improved PSO (ISO) algorithm with SMOTE is an ensemble classification method, in which training is done with six different classifiers on the Ionosphere dataset. The performance analysis is shown in Table 3, and the accuracy achieved by the ET0.8081 is dominated by other methods. The RF accuracy value of 0.8077 is close to the performance of the NB. However, DT and SVM produce the same values, and NB is producing marginal value 0.8028 and 0.8049, respectively, than the KNN. The precision value, 0.8102, achieved by the RF is superior to all other methods, and ET and SVM precision values of 0.7656 and 0.7211 are closer together. However, the nominal precision values are derived from the DT and KNN methods with values of 0.6646 and 0.6979. The NB method is producing lower precision value of 0.6043 than all other methods. Similar Table 2 Performance analysis of SMOTE-IPSO in Vehicle dataset Algorithm
Accuracy
Precision
Recall
F1
Sensitivity
ET
0.8682
0.8102
0.7633
0.7832
0.8319
NB
0.8691
0.8057
0.7746
0.7864
0.8364
RF
0.8593
0.7350
0.8385
0.7813
0.8521
SVM
0.8572
0.7911
0.7576
0.7737
0.8202
DT
0.8652
0.7816
0.7971
0.7886
0.8423
KNN
0.8433
0.6660
0.7024
0.7657
0.8444
IPSO-SMOTE-AdaBoost: An Optimized Class Imbalance Strategy …
565
Table 3 Performance analysis of SMOTE-IPSO in Ionosphere dataset Algorithm
Accuracy
Precision
Recall
F1
Sensitivity
ET
0.8081
0.7656
0.7221
0.6683
0.7743
NB
0.8049
0.6043
0.6082
0.6122
0.6956
RF
0.8077
0.8065
0.6691
0.6817
0.7236
SVM
0.8028
0.7211
0.6525
0.6542
0.7133
DT
0.8028
0.6646
0.6707
0.6545
0.7090
KNN
0.8031
0.6979
0.6818
0.6808
0.7165
kind of performance is observed in recall, F1, and sensitivity. In case of recall, the ET retained better value of 0.7221 than all other methods. The negligence value of 0.6082 is achieved by the NB. The case in F1 and sensitivity NB and RF methods is dominating with the values of 0.6817 and 0.7743 compared to other methods. However, KNN and NB performed poor in all methods with respect to most of the measures. The third dataset is Statlog dataset and is divided into two sets training and test sets with the proportion of 80% and 20%, respectively. The improved PSO (ISO) algorithm with SMOTE is an ensemble classification method, in which training is done with six different classifiers on the Statlog dataset. The performance analysis is shown in Table 4, and the accuracy achieved by the ET and DT, 0.8428 and 0.8404, methods are dominated by other methods. The RF and NB accuracy values are close together with values of 0.8137 and 0.8156, respectively. However, SVM produces the least accuracy result than all methods. The precision value, 0.7896, achieved by the RF is superior to all other methods, and ET and NB precision values of 0.7495 and 0.7343 are closer together. However, the nominal precision values are derived from the DT and KNN methods with values of 0.7167 and 0.7128. The SVM method produces a lower precision value of 0.7052 than all other methods. Similar kinds of performance are observed in recall, F1, and sensitivity. In case of recall, the ET retained better value of 0.8225 than all other methods. The negligence value of 0.6325 is achieved by the SVM. The case in F1 and sensitivity NB and RF methods isdominating with the values of 0.7486 and 0.8129 compared to other methods. However, SVM performed poorly in all methods with respect to most of the measures.
5.4 Performance Analysis of the IPSO-SMOTE-AdaBoost Ensemble Algorithm The section describes the performance analysis of IPSO-SMOTE-AdaBoost Ensemble on three real imbalanced datasets with six classifiers, namely ET, NB, RF, SVM, DT, and KNN.
566
S. Anandarao et al.
Table 4 Performance analysis of SMOTE-IPSO in Statlog dataset Algorithm
Accuracy
Precision
Recall
F1
Sensitivity
ET
0.8428
0.7495
0.8225
0.7044
0.8129
NB
0.8137
0.7343
0.7563
0.7486
0.7110
RF
0.8156
0.7896
0.6643
0.6782
0.7102
SVM
0.7992
0.7052
0.6325
0.6236
0.7010
DT
0.8404
0.7167
0.7426
0.7339
0.7023
KNN
0.8326
0.7128
0.7269
0.7417
0.6921
Figure 3 indicates the comparison of various measures, namely accuracy, precision, recall, and F1 values of the IPSO-SMOTE-AdaBoost algorithm on the Vehicle dataset. The improved PSO (ISO) algorithm with SMOTE is an ensemble classification method, in which training is done with six different classifiers on the Vehicle dataset. The performance analysis is shown in table, and the accuracy achieved by the NB, 0.9031, is dominated by another methods. The ET and DT accuracy values of 0.9022 and 0.8992 are close to the performance of the NB. However, RF and SVM produce marginal accuracy than KNN. The precision value, 0.8442, achieved by the ET is superior to all other methods, and only NB precision value of 0.8397 is closer to the ET. However, the nominal precision values derived from the RF, DT, SVM, and KNN produce a lower precision value of 0.7342 than all other methods. Similar kinds of performance are observed in recall, F1, and sensitivity. However, NB performed poor in all methods with respect to the F1 and sensitivity measures. The performance of the proposed method is superior than standard SMOTE-IPSO methods in all bench measures. Figure 4 indicates the comparison of various measures, namely accuracy, precision, recall, and F1 values of the IPSO-SMOTE-AdaBoost algorithm on the Vehicle dataset. The improved PSO (ISO) algorithm with SMOTE is an ensemble classification method, in which training is done with six different classifiers on the Ionosphere dataset. The performance analysis is shown in table, and the accuracy achieved by
Fig. 3 Performance analysis of IPSO-SMOTE-AdaBoost in Vehicle dataset
IPSO-SMOTE-AdaBoost: An Optimized Class Imbalance Strategy …
567
the ET, 0.8861, is dominated by another methods. The RF accuracy value of 0.8857 is close to the performance of the NB. However, DT and SVM produce the closer values, and NB is producing marginal value of 0.8829, respectively, than the KNN. The precision value, 0.8845, achieved by the RF is superior to all other methods, and ET and SVM precision values of 0.8436 and 0.7991 are closer together. However, the nominal precision values are derived from the DT and KNN methods with values of 0.7426 and 0.7759. The NB method produces a lower precision value of 0.6823 than all other methods. Similar kinds of performance are observed in recall, F1, and sensitivity. However, NB performed poor in all methods with respect to the F1 and sensitivity measures. The performance of the proposed method is superior than standard SMOTE-IPSO methods in all bench measures. Figure 5 indicates the comparison of various measures, namely accuracy, precision, recall, and F1 values of the IPSO-SMOTE-AdaBoost algorithm on the Vehicle dataset. The improved PSO (ISO) algorithm with SMOTE is an ensemble classification method, in which training is done with six different classifiers on the Statlog dataset. The performance analysis is shown in table, and the accuracy achieved by the ET and DT, 0.9208 and 0.9184, methods is dominated by other methods. The RF and NB accuracy values close together with values of 0.8917 and 0.8936, respectively. However, SVM produces the least accurate result than all methods. The precision value, 0.8676, achieved by the RF is superior to all other methods, and ET and NB precision values of 0.8275 and 0.8123 are closer together. However, the nominal precision values are derived from the DT and KNN methods with values of 0.7947 and 0.7908. The SVM method produces a lower precision value of 0.7832 than all other methods. Similar kinds of performance are observed in recall, F1, and sensitivity. However, SVM performs poor in all methods with respect to the F1 and sensitivity measures. The performance of the proposed method is superior than standard SMOTE-IPSO methods in all bench measures.
Fig. 4 Performance analysis of IPSO-SMOTE-AdaBoost in Ionosphere dataset
568
S. Anandarao et al.
Fig. 5 Performance analysis of IPSO-SMOTE-AdaBoost in Statlog dataset
6 Conclusions An ensemble method called IPSO-SMOTE-AdaBoost is developed to solve the class imbalance problem by combining the synthetic minority oversampling technique (SMOTE), improved particle swarm optimization (PSO), and AdaBoost. AdaBoost combined with SMOTE provides an optimal set of synthetic samples, thereby modifying the updating weights and adjusting for skewed distributions. In the proposed ensemble framework, IPSO-SMOTE-AdaBoost, parameters can be re-initialized to counter the concept of local optimum as well with the SMOTE that is boosted with AdaBoost method. The proposed method is validated using three datasets on six classifiers: extra tree (ET), naive Bayes (NB), random forest (RF), support vector machine (SVM), decision tree (DT), and K-nearest neighbor (KNN). After that, the IPSO-SMOTE-AdaBoost is compared to the existing SMOTE-PSO. The evaluation of proposed work is done with measures, namely accuracy, precision, recall, sensitivity, and F-score, and result shows that the proposed technique outperformed the usual PSO and SMOTE variations. Furthermore, the SMOTE method successfully exploited the benefits of boosting and PSO to improve the predictive analysis of the class imbalance problem on minority class datasets in particular. Our future research will focus on applying the presented algorithm to the field of gene analysis.
References 1. Lavanya K, Suresh GV (2021) An additive sparse logistic regularization method for cancer classification in microarray data. Int Arab J Inform Technol 18(2). https://doi.org/10.34028/ iajit/18/10. ISSN: 1683-3198E-ISSN: 2309-4524 2. Lavanya K, Harika K, Monica D, Sreshta K (2020) Additive tuning Lasso (AT-Lasso): a proposed smoothing regularization technique for shopping sale price prediction. Int J Adv Sci Technol 29(05):878–886 3. Lavanya K, Reddy L, Reddy BE (2019) Distributed based serial regression multiple imputation for high dimensional multivariate data in multicore environment of cloud. Int J Amb Comput Intell (IJACI) 10(2):63–79. https://doi.org/10.4018/IJACI.2019040105
IPSO-SMOTE-AdaBoost: An Optimized Class Imbalance Strategy …
569
4. Lavanya K, Reddy LSS, Eswara Reddy B (2018) Modelling of missing data imputation using additive LASSO regression model in Microsoft azure. J Eng Appl Sci 13(Special Issue 8):6324– 6334 5. Lavanya K, Reddy LSS, Eswara Reddy B (2019) Multivariate missing data handling with iterative Bayesian Additive Lasso (IBAL) multiple imputation in multicore environment on cloud. Int J Future Revol Comput Sci Commun Eng 5(5) 6. Zhang T, Chen J, Li F, Zhang K, Lv H, He S, Xu E (2021) Intelligent fault diagnosis of machines with small & imbalanced data: a state-of-the-art review and possible extensions. ISA Trans 119:152–171 7. Thabtah F, Hammoud S, Kamalov F, Gonsalves A (2020) Data imbalance in classification: experimental evaluation. Inf Sci 513:429–441 8. Barella VH, Garcia LPF, de Souto MCP, Lorena AC, de Carvalho ACPLF (2021) Assessing the data complexity of imbalanced datasets. Inf Sci 553:83–109 9. Li J, Zhu Q, Wu Q, Fan Z (2021) A novel oversampling technique for class-imbalanced learning based on SMOTE and natural neighbors. Inf Sci 565:438–455 10. Prusty MR, Jayanthi T, Velusamy K (2017) Weighted-SMOTE: a modification to SMOTE for event classification in sodium cooled fast reactors. Prog Nucl Energy 100:355–364 11. Yang X, Ma Z, Yuan S (2016) Multi-class Adaboost algorithm based on the adjusted weak classifier. J Electron Inf Technol 38:373–380 12. Mani I, Zhang I (2003) KNN approach to unbalanced data distributions: a case study involving information extraction. In: Proceedings of workshop on learning from imbalanced datasets, vol 126 13. Han H, Wang WY, Mao BH (2005) Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Huang DS, Zhang XP, Huang GB (eds) Advances in intelligent computing. ICIC 2005. Lecture Notes in Computer Science, vol 3644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11538059_91 14. Juez-Gil M, Arnaiz-González A, Rodríguez J, Nozal C, García-Osorio C (2021) ApproxSMOTE: fast SMOTE for big data on apache spark. Neurocomputing 464. https://doi.org/10. 1016/j.neucom.2021.08.086 15. Wang KJ, Makond B, Chen KH et al (2014) A hybrid classifier combining SMOTE with PSO to estimate 5-year survivability of breast cancer patients. Appl Soft Comput 20:15–24 16. Ding Z (2011) Diversified ensemble classifiers for highly imbalanced data learning and their application in bioinformatics. Dissertation, Georgia State University 17. Guo Q-J, Li L, Li N (2008) Novel modified AdaBoost algorithm for imbalanced data classification. Comput Eng Appl 44:217–221 18. Chawla NV, Lazarevic A, Hall LO (2003) SMOTEBoost: improving prediction of the minority class in boosting. In: Proceedings of the 7th European conference on principles and practice of knowledge discovery in databases. Cavtat-Dubrovnik, Croatia, pp 107–109 19. Molina D, Poyatos J, del Ser J, García S, Hussain A, Herrera F (2020) Comprehensive taxonomies of nature- and bio-inspired optimization: inspiration versus algorithmic behavior, critical analysis recommendations. Cogn Comput 12:897–939 20. Wei J, Huang H, Yao L, Hu Y, Fan Q, Huang D (2020) NI-MWMOTE: an improving noiseimmunity majority weighted minority oversampling technique for imbalanced classification problems. Expert Syst Appl 158:113504 21. Zhu T, Lin Y, Liu Y (2017) Synthetic minority oversampling technique for multiclass imbalance problems. Pattern Recogn 72:327–340 22. Shukla A, Tiwari R, Algorithm EV (2017) Discrete problems in nature inspired algorithms, 1st edn. CRC Press, Boca Raton, FL, USA. ISBN SBN9781351260886 23. Li Y, Guo H, Li Y (2016) A boosting based ensemble learning algorithm in imbalanced data classification. Syst Eng Theory Pract 36:189–199
HMLF_CDD_SSBM: A Hybrid Machine Learning Framework for Cardiovascular Disease Diagnosis Prediction Using the SMOTE Stacking Method Satuluri Naganjaneyulu, Gurija Akanksha, Shaik Shaheeda, and Mohammed Sadhak
Abstract Nowadays, predicting the diagnosis of cardiovascular disease (CDD) is one of the critical challenges in the medical field. Every year, millions of people die from cardiovascular diseases, the majority of which are preventable if caught early (Conroy in Eur Heart J 24:987–1003, 2003; Hippisley-Cox in BMJ 336:1475–1482, 2008; Kremers et al. in Arthritis Rheum 58:2268–2274, 2008). To prevent such kind of diseases, effective early prediction techniques using artificial intelligence (AI) and machine learning (ML) techniques for CDD prediction are much demanded (Krittanawong et al. in J Am College Cardiol 69:2657–2664, 2017). Traditional AI and ML models, on the other hand, fail to predict data imbalance and lead to relatively prediction accuracy at minimal rates. Compared to individual classifier models, ensemble methods performed better. Moreover, most researchers are not concerned with the imbalance of classes in the CDD dataset; one of the good approaches is Synthetic Minority Over-sampling Technique (SMOTE) (Guo et al. in Proceedings of Chinese automation congress (CAC); Ge et al. in Proceedings of IEEE conference on energy internet and energy system integration (EI2)). With all the above assumptions, this article introduces a new framework called Hybrid Machine Learning Framework for Cardiovascular Disease Diagnosis Using SMOTE-Based Stacking Method (HMLF_ CDD_SBSM) for efficient and accurate prediction of cardiovascular diseases. A stacking model-based ensemble framework selected the best basic learners using seven classifiers: Logistic Regression (LR), Multilayer Perceptron (MLP), Extra Tree (ET), Extreme Gradient Boost (XGB), AdaBoost (ABoost), Random Forest (RF), and Light Gradient Boosting (LGB) (Light GBM). Logistic regression (LR) is used as a meta-classifier to avoid overfitting. Using diversity among strong classifiers in this model will be a more efficient way to achieve the highest accuracy. S. Naganjaneyulu (B) Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India e-mail: [email protected] G. Akanksha · S. Shaheeda · M. Sadhak Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_47
571
572
S. Naganjaneyulu et al.
Accuracy, precision, specificity, sensitivity, and the F1-measure are the parameters used to assess the prediction performance. The framework is validated using two benchmark datasets known as Framingham and Cleveland. The comparative results of the proposed framework for CDD, HMLF_CDD_SBSM outperform the existing state-of-the-art approaches. Moreover, it is concluded that HMLF_CDD_SBSM is a reliable framework suitable for early detection of cardiovascular disease. Keywords Cardiovascular diseases · Machine learning · Missing data · SMOTE · Feature selection and class imbalance
1 Introduction Humanity’s first and most pressing concerns are health and well-being. The hectic schedule of the contemporary times causes a harmful lifestyle resulting ultimately in anxiety syndrome and mental depression [1–3]. This concern, however, is constantly challenged by illnesses and ailments. To deal with these conditions, people tend to smoke, drink, and use drugs excessively. All of these are the primary factors behind many fatal ailments, such as cardiovascular disease and cancer [4]. Various health problems, such as diabetes, hypertension, high cholesterol, and an asymmetrical pulse rate, make it difficult to diagnose heart disease. Cardiovascular disease (CD) is the prime cause of mortality all over the globe, and it accounts for nearly 17.9 million fatalities per year, or approximately 30% of all deaths worldwide [5]. Many risk factors can contribute to cardiovascular disease. These factors can be categorized into two types: modifiable risk factors and non-modifiable risk factors. Obesity, blood lipids, and behavioral factors are considered to be a few modifiable risk factors leading to cardiovascular disease. Non-modifiable risk factors, such as age, sex, and genetic predisposition, are uncontrollable [6]. Health organizations always attempt to diagnose a disease in its preliminary phases. But unfortunately, the disease is usually discovered in its advanced stages or after death of the patient. We intend to diagnose the disease in its preliminary phases [7]. Machine learning (ML) has extreme capability to have a significant impact in the health sector. ML methods can be used to determine who is at risk of having a heart attack and to treat those persons subsequently. There are three types of machine learning: supervised machine learning, unsupervised machine learning, and semisupervised learning [8]. The classifiers used in this study are classified as supervised machine learning [9]. An ensemble model is a useful ML algorithm that can provide a variety of classification and regression techniques. Generally, ensemble mechanism can be classified into three types voting, stacking, and blending [10]. In this study, only stacking mechanism is considered for the design of ensemble framework for the CDD prediction. Generally, the stacking method composed with two levels, set of base or weak classifiers are considered to be input in first level, and then, results of base learners has merged with meta-learner in the next level [11–13]. But, the attention of many studies is focused on feature selection techniques and
HMLF_CDD_SSBM: A Hybrid Machine Learning Framework …
573
classification algorithms, often disregarding the problem of class imbalance. The fact that the issue of class imbalance has a profound influence on the accuracy of the classification algorithm which is ignored. Furthermore, existing feature selection techniques need to be enhanced in order to reduce computational complexity while simultaneously upholding optimal accuracy. In summary, an integrated machine learning framework for cardiovascular disease that enables systematic data balancing, optimal feature selection, and improved classification is indispensable. This not only improves predictions of cardiovascular disease, but also reduces the complexity of calculations. The accuracy of the classification algorithm is strongly affected by the issue of class imbalance. Furthermore, existing feature selection techniques need to be enhanced so as to reduce computational complexity while maintaining adequate accuracy. The study in this work used different base classifiers which includes Multi-Layer Perceptron (MLP), Extra Tree (ET), Extreme Gradient Boosting (XGB), Random Forest (RF), AdaBoost (ABoost), Logistic Regression (LR), and Light Gradient Boosting (LightGBM) as base learners and used LightGBM as the meta-leaner in the stacking model. Prior to the ensemble learning, applied SMOTE and three types of imputation algorithms (i.e., mean, RF, and MICE) to avoid class imbalance and data loss, respectively. In addition, three feature selection methods were used to retain the optimal feature set using information gain. The rest of the study is organized as follows: The related work is described in Sect. 2. The complete proposed framework and its relevant concepts are included in Sect. 3. In Sect. 4, the experimental evaluation and discussion, and the conclusion and future work are shown in Sect. 5.
2 Related Work Cardiovascular ailments constitute one of the primary reasons of mortality in developed countries. To identify these cardiovascular diseases, machine learning (ML) approaches [17] are being deployed by a majority of researchers and medical practitioners. Jiang et al. [14] proposed a solution based on a Random Forest algorithm to predict heart disease. The proposed algorithm, Random Forest, was compared with other classifiers such as Logistic Regression, Naive Bayes, and Support Vector Machine (SVM), and the proposed algorithm, Random Forest, was shown to achieve an accuracy of 84 0.81%. Rodondi et al. [15] validated their framework using the Framingham scoring model. The algorithms used in the experiment were KNN and random forest, and the accuracy provided by the KNN (66.7%) was found to be comparatively better than that provided by Random Forest (63.49%) [16]. As a result, KNN was chosen as the proposed algorithm. UCI laboratory data on patients with heart disease are used to identify patterns with NN, DT, Support Vector Machines (SVM), and Naive Bayes. The performance and accuracy of various algorithms are compared. The hybrid method proposed in the study achieves an F-measurement
574
S. Naganjaneyulu et al.
accuracy of 86.8%, when compared to the existing methods. For example, ensemblebased model is developed by to concentrate on the prediction and analysis of cardiac patients using multiple machine learning techniques like Naive Bayes (NB), Decision Tree (DT) based on Gini index, the gain of information based on DT, the instancebased learner, and support vector machines (SVM) and achieved an accuracy of 87.37% [18]. The main goal of the system developed by this researcher is to create a hybrid ML framework for CD prediction based on the SMOTE stacking model. The accuracy, sensitivity, precision, recall, and F1-scores of six classification algorithms, namely MLP, XGB, ABoost, LR, RF, ET, and LGBM, are compared, and the stacking model is found to be the best classification method for CDD prediction.
3 Proposed Method In this article, we will introduce an effective framework called Hybrid Machine Learning Framework for Cardiovascular Disease Diagnosis Using SMOTE-Based Stacking Method (HMLF_CDD_SBSM) for cardiovascular diseases. The proposed approach consists of three steps: (1) preprocessing the dataset, which includes removing outliers, imputation of missing data, and class balance, (2) relevant feature selection for CDD via information gain, and (3) classification via a stacking model.
3.1 Data Preprocessing The section describes data preprocessing techniques such as outlier detection, data normalization, missing data handling, and class imbalance. All these mechanisms contribute to the improvement of classification accuracy in the diagnosis of cardiovascular diseases. Figure 3 depicts the whole process of the proposed framework HMLF_CDD_SBSM for CDD data (Fig. 1).
3.1.1
Outlier Detection and Data Normalization
In this study, we performed outlier detection as the first step in data preprocessing. Z-score outlier detection was used to improve model performance. Zs =
xj − μ . σ
(1)
In addition, the Min–Max normalization method was used, as well as outlier detection. This technique is also called as feature scaling, and it derives numeric value of data feature from 0 to 1. Also, it is noticed that the data is transformed into the linear
HMLF_CDD_SSBM: A Hybrid Machine Learning Framework …
575
Fig. 1 Proposed framework process of HMLF_CDD_SSEM for CDD data
after normalization. To make data value to normal, it first estimates the minimum and maximum data values and scales data between 0-1. X normal =
x j − xmin . xmax − xmin
(2)
From Eq. (2), j is from 1, 2, …, n and number of features are n.
3.1.2
Missing Data Imputation
Most of the medical records are missed due to the various reasons like unavailability of equipment, typo mistakes, patients are unable to answer the questions in survey, etc. The mechanism of deleting observations with missing data is not appropriate approach for medical filed. Additionally, it is noted that the deletion strategy produces inaccurate prediction results because it decreases the training data. However, an efficient method is required which can replace the missing data with right value estimated through better techniques [19, 20]. To solve the problem of missing values,
576
S. Naganjaneyulu et al.
number of techniques available, in this study, mean, Random Forest (RF), and MICE are used as imputation techniques [19, 21, 22].
3.1.3
Class Imbalance
In our proposed framework, the last step of the preprocessing consists in solving the problem of unbalanced classes. The SMOTE technique is used k-nearest neighbor in this study to improve the samples of the minority class. The main advantage of this techniques: (i) The resample generated with technique is very close to original, (ii) Allow to re-generated minority classes as much as needed, (iii) Deliver the minority classes with random selection of k-nearest neighbor value and leads to robustness to the method. The complete idea about the SMOTE with KNN is described in Algorithm 1. Algorithm 1: Class Imbalance of CDD with SMOTE (ICDD G ) Input: DM CDD = Minority Cardiovascular Disease Diagnosis (CDD) Dataset Output: DCDD = Re sample of Cardiovascular Disease Diagnosis (CDD) Dataset Begin for each observation si∈k of DM CDD = (s1 , s2 , . . . , sk ) obtain KNN sresample = s + rand (0,1)* ||s − sk || add generated sresample to DCDD end return DCDD end
3.2 Feature Selection After data preprocessing, the HMLF_CDD_SBSM framework recommends the feature selection process because the right subset of features can improve model accuracy while reducing computation time and complexity. In terms of selecting features that have a significant effect on the outcome, an information gain-based feature selection method [18] is used on a dataset to remove redundant and unnecessary features.
3.3 Stacking Method The stacking method is an ensemble strategy that uses meta-learning to integrate different base learning algorithms. The complete process behind the method is described in Fig. 2 with two stages. The stage-1 initially applied seven base learners
HMLF_CDD_SSBM: A Hybrid Machine Learning Framework …
577
over training dataset, namely MLP, XGboost, LR, AdaBoost, RF, XTree, and LGBM. The predicted outcomes related to the all the base learners are represented from P1 to P7. The P1 is outcome related to the MLP, P2 for XGboost, and continue for last learner P7 for the LGBM. At stage-2, the stack algorithm considers P1–P7 and produced final prediction with LR as meta-classifier.
Fig. 2 Full stacking method architecture
Fig. 3 CDD prediction using full feature set and ML methods
578
S. Naganjaneyulu et al.
4 Experimental Results and Discussion The section describes results and discusses proposed framework in the different sections. Section 4.1 specifies about the datasets used in the complete study. The outcomes of feature selection and dataset classification utilising our suggested ensemble architecture for CDD prediction in Sects. 4.3 and 4.4 are then described in Sect. 4.2, Evaluation metrics.
4.1 Datasets The study focused on the Framingham datasets, which are one of the popular datasets for the diagnosis of cardiovascular disease [8]. The complete description about the dataset is shown in Table 1. However, the dataset is of 4240 records and 15 attributes related to patient information. The attribute related to the dataset specifies the risk factor of the CDD. Basically, risk factors in this CDD dataset of behavioral and medical. The classification is used to determine if a patient has a 10% chance of developing coronary heart disease within the next 10 years (CHD). The target data which is a 10-year risk of coronary heart disease has 3465 classes of (cases) 0 and 617 classes of 1. Therefore, this makes the dataset highly unbalanced.
4.2 Evaluation Metrics The proposed framework is tested with various benchmark performances which included as accuracy, sensitivity, specificity, precision, recall, and F1-score. All the measures are depended on the true positive (TP) or true negative (TN) and false positive (FP) or false negative (FN). The evaluation metrics used in this study are included below: i. Accuracy (Acc): The measure of reliable estimation of right instances among the complete instances is known as accuracy. The measure of accuracy is calculated as below: Accuracy (Acc) =
TP + TN . TP + TN + FP + FN
ii. Precision (Pn) is defined as the proportion of accurately predicted observations out of all expected positive observations. The measure of precision is calculated as below: Precision (Pr ) =
TP . TP + FP
HMLF_CDD_SSBM: A Hybrid Machine Learning Framework …
579
Table 1 Full description of the Framingham dataset Variable
Description
Range
Sex
Gender instance (nominal)
(0: male; 1: female)
Age
The patient age in years (continuous)
(32–70)
Edu_Back
The level of training of the patient (continuous)
(1–4)
Cur_ Smoker
Smoking status of the patient (nominal)
(Smoker: 1; no: 0)
Cigs_ PerDay
The average number of cigarettes the person smoked in a day (continued)
BP
Whether or not the patient was taking medication for high blood pressure (nominal)
(Blood pressure: 1; no: 0)
Pre_stroke
Whether or not the patient had ever had a stroke (nominal)
(Previously had stroke: 1; no: 0)
Pre_Hyp
Whether the patient was hypertensive or not (nominal)
(Was hypertensive: 1; no: 1)
Diabetes
Whether the patient was diabetic or not (nominal)
(Had diabetes: 1; no: 0)
Tot_Chol
Total cholesterol level (continuous)
200–240 mg/dl
Sys_BP
Systolic (continuous) blood pressure
105/73–151/93
Dia_BP
Diastolic (continuous) blood pressure
105/73–151/93 16.5–30
BMI
Body mass index (continuous)
HOUR
Heart rate (continuous)
Glu_level
Glucose level (continuous)
10_Y_CD
CDD risk over 10 years (binary)
(Yes: 1, No: 0)
iii. Recall (Rc): Recall is the proportion of relevant overall results that the algorithm recognizes correctly. The measure of recall is calculated as below: Recall (Rc ) =
TP . TP + FN
iv. Sensitivity (Sn): Sensitivity is the only true positive metric when the total number of instances is considered, and it can be measured as follows: Sensivity (Sn ) =
TP . TP + FN
v. Specificity (Sp): This metric determines the number of correctly identified true negatives and is calculated as follows: Specificity (Sp ) =
TN . TN + FP
580
S. Naganjaneyulu et al.
vi. F1-measure/F1-score measure: The F1-score is the harmonic mean of precision and recall. The highest F-score is 1, indicating perfect precision and recall. The F-measure is calculated as follows: F - measure (Fm ) =
2 ∗ Pr ∗ Rc . Pr + Rc
4.3 Classification Results Using an Optimized Feature Set The retrieval of optimal features for CDD prediction is one of the most challenging tasks in this study/the specific important features from Framingham CDD dataset are retrieved using the information gain. The technique extracted the potential biomarkers, and it helped to improve the classification accuracy. After, feature selection only Age, Dia_BP, Pre_Hyp, Glu_level, and sys_BP, are the variables used to generate classification model based on the highest rates obtained using the information gain. The study evaluated the performance of every base classifier using optimal features derived from the information gain method (Table 2). The classification performance of each model using feature subset retrieved by information gain of the Framingham dataset is shown in Table 3 and Fig. 4. From Fig. 4, it is shown that the ML models performed better on feature set derived from the information gain than full dataset which is shown in Fig. 3. The RF model achieved the highest accuracy of 84.99%, with 84.24% F1-score, and 84.32% recall with the feature subset. Also, it is observed that AdaBoost also produced closer results of RF and LR produced negligible percentage of all metrics values. Table 2 CDD prediction using full feature set and ML methods Model
Precision (%)
Specificity (%)
Precision (%)
Remind (%)
F1-score (%)
MLP
79.12
79.02
78.83
79.83
79.32
XGBoost
78.55
79.83
77.99
79.83
78.91
L/R
77.41
77.63
77.63
77.63
77.63
ABoost
80.81
80.94
80.94
80.75
80.94
RF
81.82
81.18
81.10
81.18
81.10
HEY
79.12
78.74
79.77
78.74
79.24
LGB
80.26
79.83
80.88
79.83
80.35
HMLF_CDD_SSBM: A Hybrid Machine Learning Framework …
581
Table 3 CDD prediction using subset and ML methods Model
Precision (%)
Specificity (%)
Precision (%)
Remind (%)
F1-score (%)
MLP
82.18
82.07
81.87
82.92
82.39
XGBoost
81.58
82.92
81.00
82.92
81.96
L/R
80.40
80.63
80.63
80.63
80.63
ABoost
83.94
84.07
84.07
83.88
84.07
RF
84.99
84.32
84.24
84.32
84.24
HEY
82.18
81.78
82.85
81.78
82.30
LGB
83.36
82.92
84.01
82.92
83.46
Fig. 4 CDD prediction using subset and ML methods
4.4 Classification Results Using the Proposed Method Table 4 summarizes proposed framework of the stacking model with seven base classifiers MLP, XGBoost, LR, ABoost, RF, ET, and LGB and one meta-classifier LGB. Like any classification mode, initially the dataset is partitioned into 80:20 ratio for training and testing data. To measure the performance of proposed work, various measures are considered which include accuracy, sensitivity, specificity, recall, precision, and F1-score. The experimental results against proposed method over standard techniques are shown in Table 4, and RF achieved impressive results including 96.82% F1-score, 96.59% accuracy, 95.37% recall, and 95.72% accuracy. After the analysis, ABoost gives a significant result of 94.44% accuracy, 94.82% recall, 95.45% precision, and 95.68% F1-score. LGB’s accuracy score was determined to be 93.75%, 93.24% recall, 94.52% accuracy, and finally, 93 0.87% F1-score. The MLP had an accuracy of 92.37%, a recall of 93.24%, a precision of 92.43%, and an F1-score of 92.61% recommendable. However, the LR had 90.27% accuracy, 92.06% recall, 91.32 % precision and an F1-score of 91.29% need further improvement (Fig. 5).
582
S. Naganjaneyulu et al.
Table 4 CDD prediction using the proposed framework Model
Precision (%)
Specificity (%)
Precision (%)
Remind (%)
F1-score (%)
MLP
92.37
92.24
92.43
93.24
92.61
XGBoost
91.66
93.24
90.98
93.24
92.17
L/R
90.27
90.54
91.32
92.06
91.29
ABoost
94.44
94.59
95.45
94.82
95.68
RF
95.72
95.10
96.59
95.37
96.82
HEY
92.36
91.89
93.15
91.89
92.51
LGB
93.75
93.24
94.52
93.24
93.87
Stacking
98.83
97.59
97.64
98.48
97.73
Fig. 5 CDD prediction using the proposed framework
It is observed from the above analysis that the proposed stacking model outperforms all state-of-the-art models for CDD prediction in the Framingham dataset. Additionally, the suggested stacking model significantly outperforms the seven basic leaner models in terms of precision and recall by a factor of 60%. The study also focused the analysis of Receiver Operating Characteristic (ROC) and is estimated using FPR and TPR values in both X-axis and Y-axis. The classification performance of the all the base and meta-classifiers is measured with the Area Under Curve (AUC) under the ROC curve. The proposed stack classifier retains the highest AUC value of 0.99 on the Framingham dataset compared to all base classifiers. The experimental results retained that the stacking model is more efficient in predicting death over the period of 10 years in Framingham dataset (Fig. 6). Additionally, it can be deduced from the experimental results that employing the ideal amount of features will enhance the performance of the suggested prediction accuracy in the CDD dataset. Additionally, results show that using the feature
HMLF_CDD_SSBM: A Hybrid Machine Learning Framework …
583
Fig. 6 ROC curve of Framingham dataset
selection strategy in conjunction with class imbalance enhances prediction accuracy, particularly for the CDD dataset.
5 Conclusion and Future Work The study designed a new framework called Hybrid Machine Learning Framework for Cardiovascular Disease Diagnosis Using SMOTE-Based Stacking Method (HMLF_CDD_SBSM) for efficient and accurate prediction of cardiovascular diseases. A stacking model-based ensemble framework selected the best basic learners using seven classifiers: Logistic Regression (LR), Multilayer Perceptron (MLP), Extra Tree (ET), Extreme Gradient Boost (XGB), AdaBoost (ABoost), Random Forest (RF), and Light Gradient Boosting (LGB) (Light GBM). Logistic regression (LR) is used as a meta-classifier to avoid overfitting. The framework is validated on Framingham dataset and got accuracy of 99%. The comparative results of the proposed framework for CDD, HMLF_CDD_SBSM outperform the advanced approaches available. Moreover, it is concluded that HMLF_CDD_SBSM is a reliable framework suitable for early detection of cardiovascular disease. As further study need to apply optimization mechanism along with the proposed framework to predict CDD with better accuracy.
584
S. Naganjaneyulu et al.
References 1. Mao L, Zhang X, Hu Y et al (2019) Nomogram based on cytokines for cardiovascular diseases in Xinjiang Kazakhs. Mediators Inflamm 2019:4756295. https://doi.org/10.1155/2019/4756295 2. Ambale-Venkatesh B, Yang X, Wu CO et al (2017) Cardiovascular event prediction by machine learning: the Multi-Ethnic Study of atherosclerosis. Circ Res 121(9):1092–1101. https://doi. org/10.1161/CIRCRESAHA.117.311312 3. Kakadiaris IA, Vrigkas M, Yen AA, Kuznetsova T, Budoff M, Naghavi M (2018) Machine learning outperforms ACC/AHA CVD risk calculator in MESA. J Am Heart Assoc 7(22):e009476–e009476. https://doi.org/10.1161/JAHA.118.009476 4. Krittanawong C, Zhang H, Wang Z, Aydar M, Kitai T (2017) Artificial intelligence in precision cardiovascular medicine. J Am College Cardiol 69:2657–2664 5. Sultana M, Haider A, Uddin MS (2016) Analysis of data mining techniques for heart disease prediction. In: 3rd international conference on electrical engineering and information communication technology (ICEEICT), pp 1–5 6. Hippisley-Cox J et al (2008) Predicting cardiovascular risk in England and Wales: prospective derivation and validation of QRISK2. BMJ 336:1475–1482 7. Rajliwall NS, Davey R, Chetty G (2018) Machine learning based models for cardiovascular risk prediction. In: Proceedings of international conference on machine learning and data engineering (iCMLDE), pp 142–148 8. Framingham Dataset by Kaggle. https://www.kaggle.com/amanajmera1/framingham-hearts tudy-dataset. Accessed 20 Nov 2020 9. Cardiovascular Disease by Kaggle. https://www.kaggle.com/sulianova/cardiovascular-diseas edataset. Accessed 15 Oct 15 10. Sharma T, Verma S, Kavita (2017) Prediction of heart disease using cleveland dataset: a machine learning approach. Int J Recent Res Aspects 4(3):17–21 11. Rubini PE, Subasini CA, Katharine AV, Kumaresan V, Kumar SG, Nithya TM (2021). A cardiovascular disease prediction using machine learning algorithms. Ann Romanian Soc Cell Biol 25(2):904–912. https://www.annalsofrscb.ro/index.php/journal/article/view/1040 12. Frohlich ED, Quinlan PJ (2014) Coronary heart disease risk factors: public impact of initial and later-announced risks. Ochsner J 14(4):532 13. Hajar R (2017) Risk factors for coronary artery disease: historical perspectives. Heart Views 18(3):109 14. Jiang Y, Zhang X, Ma R et al (2021) Cardiovascular disease prediction by machine learning algorithms based on cytokines in Kazakhs of China. Clin Epidemiol 13:417–428. https://doi. org/10.2147/CLEP.S313343 15. Rodondi N, Locatelli I, Aujesky D et al (2012) Framingham risk score and alternatives for prediction of coronary heart disease in older adults. PLoS ONE 7(3):e34287. https://doi.org/ 10.1371/journal.pone.0034287 16. Muhammad Y, Tahir M, Hayat M et al (2020) Early and accurate detection and diagnosis of heart disease using intelligent computational model. Sci Rep 10:19747. https://doi.org/10. 1038/s41598-020-76635-9 17. Randa EB (2016) An ensemble model for Heart disease data sets: a generalized model. In: Proceedings of 10th international conference on information system, pp 191–19 18. Gonsalves AH, Thabtah F, Mohammad RMA, Singh G (2019). Prediction of coronary heart disease using machine learning: an experimental analysis. In: Proceedings of 3rd international conference on deep learning and technology, pp 51–56 19. Lavanya K, Harika K, Monica D, Sreshta K (2020) Additive tuning Lasso (AT-Lasso): a proposed smoothing regularization technique for shopping sale price prediction. Int J Adv Sci Technol 29(05):878–886 20. Lavanya K, Reddy L, Reddy BE (2019) Distributed based serial regression multiple imputation for high dimensional multivariate data in multicore environment of cloud. Int J Amb Comput Intell 10(2):63–79. https://doi.org/10.4018/IJACI.2019040105
HMLF_CDD_SSBM: A Hybrid Machine Learning Framework …
585
21. Lavanya K, Reddy LSS, Eswara Reddy B (2018) Modelling of missing data imputation using additive LASSO regression model in microsoft azure. J Eng Appl Sci 13(Special Issue 8): 6324–6334 22. Lavanya K, Reddy LSS, Eswara Reddy B (2019) Multivariate missing data handling with iterative Bayesian additive Lasso (IBAL) multiple imputation in multicore environment on cloud. Int J Future Revol Comput Sci Comm Eng 5(5)
Optimal and Virtual Multiplexer Resource Provisioning in Multiple Cloud Service Provider System Phaneendra Kanakamedala, M. Babu Reddy, G. Dinesh Kumar, M. Srinivasa Sesha Sai, and P. Ashok Reddy
Abstract In today’s digital world, user needs have been diversified in accessing the resources to solve the computing problems. In cloud computing, resource provisioning allocates the resources based on user requirements. Dynamic resource allocation distributes load among virtual machines (VMs). There is an uncertainty in demand for the resources, and resource price must be considered in provisioning VMs. Cloud infrastructure providers are providing a wide range of computing services with different pricing models based on the VM instance types required by the users. Identifying the adaptable resources for the required applications is an optimization problem that can be addressed through provisioning of resources at the offered price. Therefore, there is a need to develop a prediction-based resource management for unpredictable resource demands in terms of both cost and performance. However, we are proposing a multi-cloud broking system that manages the utilization of resources and optimize the cost of resource provisioning and delivery of cloud services cost effectively using stochastic programming and linear regression. It aims to maximize performance and minimize the cost of utilizing the resource. The developed stochastic-based resource provisioning prediction management (SRPPM) P. Kanakamedala (B) Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India e-mail: [email protected] M. B. Reddy Department of Computer Science Engineering, Krishna University, Machilipatnam, Krishna, Andhra Pradesh, India G. D. Kumar Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India M. S. S. Sai Department of Information Technology, KKR and KSR Institute of Technology and Sciences, Vinjanampadu, Guntur, Andhra Pradesh 522017, India P. A. Reddy Department of Computer Science and Engineering, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_48
587
588
P. Kanakamedala et al.
increases the performance which is based on linear regression and used to compare the proposed approach optimal and virtual multiplexer resource provisioning (OVMRP). So, based on SRPPM, the proposed OVMRP approach uses linear regression for resource prediction, applies multiplexing on the pool of VMs based on the capacity of VMs, and uses Bayesian theorem to find the least cost VM based on their workload characteristics. The evaluation result shows the proposed algorithm OVRPM accomplishes better performance than the other existing ones. Keywords Cloud computing · Stochastic-based resource provisioning prediction management (SRPPM) · Optimal and virtual multiplexer resource provisioning (OVMRP) · Cloud service provider · Resource provisioning
1 Introduction Cloud computing has emerged as a rapidly growing field in providing computing services. It provides an efficient way of accessing computing resources and consolidating computing services. Cloud computing is a distributed management of resources to deliver services based on shared hardware through the Internet. There are three cloud delivery models Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Cloud computing depends mainly on IaaS to provide cost-effective, optimal processing, storage, and shared resources. This proposed system is related to Infrastructure as a Service (IaaS) model of cloud computing. Cloud computing transforms information technology into a commodity and utilized by “pay-per-use” pricing model. Cloud computing and economics became important because it is not free in providing services. As defined by NIST as shown in Fig. 1, cloud computing architecture consists of five actors: cloud providers, cloud consumers, cloud carrier, cloud auditor, and cloud broker [1]. Each actor has a key role to participate in the process of performing their tasks in a cloud system. The cloud provider maintains hardware, leases resources, and builds an appropriate cloud system for providing services to the users. A cloud customer submits requests through a web browser to the service provider to get it served using computing resources. The cloud broker role is to coordinate the cloud consumer and cloud provider and will help consumers to use value-added services from a cloud provider. The cloud auditor securely monitors the cloud system and performs an independent performance audit. Cloud carrier is an intermediary that transports the cloud services from a cloud provider to cloud customer. The service provider maintains service level agreements (SLA’s) for providing quality of service [2]. Billing is done based on resource utilization. In a cloud computing environment, there are two main members to concentrate on: cloud provider and cloud customer. The cloud providers maintain the data center with the pool of servers having a variety of physical resources rented to the customers as “pay as you use” basis [3] (Fig. 2).
Optimal and Virtual Multiplexer Resource Provisioning in Multiple …
Fig. 1 NIST cloud computing reference architecture [1]
Fig. 2 Usage of cloud resources by cloud user and service provider
589
590
P. Kanakamedala et al.
The users send requests to execute different applications with variable workloads. The resources are leased to run the applications on the service provider systems. When the cloud user submits the request, it is accepted and resources are provisioned for that request. The service provider then verifies the availability of the resources and allocates the resources necessary to complete the task for the user in the form of VMs [4]. The cloud user can run the application on the allocated resources and pay for the utilized resources. Once the usage is over, the resources are released back to the service provider. In a cloud environment, there are different service providers with variable pricing and SLA’s. The goal of the service provider is to maximize revenue and resource utilization. The purpose of the cloud customer is to reduce the cost of paying for utilizing resources with minimum execution time. Resource provisioning is used to identify and provide appropriate resources for suitable workloads [5]. So, the resources are utilized by the applications effectively in time. This maximizes the revenue to the service provider and improves the satisfaction levels of the user. Cloud computing adaptability is growing with diversified users from business, research, entertainment, and gaming and requires computing solutions. Cloud users are using long-running VMs for online transaction processing and standard workload. There is an increase in the number of short duration tasks used on demand. The IaaS service provider provides access to resources on-demand, reservation, and spot instances. The cloud service providing market is dynamic where new providers enter the market, the rate of providing services used to fluctuate, and the price of resources used to vary significantly [6]. Selection of least cost VM’s from various service providers is a challenging task to the cloud consumers, which needs a different set of information processing of pricing models. To overcome this situation, the cloud broking system has been introduced to ease the task of provisioning resources from the best service provider based on the user requirements. One of the most exciting areas in cloud computing has been a cloud broking system that manages the usage, resource provisioning, and delivery of cloud services from multiple cloud service providers. Keeping track of the price of resources and service provider is not an easy task. At the same time, selecting the best service provider at least cost is not easy without automated systems. So, identifying the adaptable workload based on application requirements is an optimization problem that can be addressed through provisioning of resources offered by multiple service providers using multi-cloud broking system [7]. Therefore, there is a need to provide [8] multi-cloud broking services to focus on characterizing cloud applications and resource provisioning strategies for allocating appropriate resources to cloud customer by increasing performance and minimizing the cost. There is also a need to design a paradigm with trust, immutability, and transparency between the user and the cloud broker. The objective of this study is to maximize performance and profits for the service provider and minimize the payment by using resources by the cloud consumers in a cloud computing environment by establishing a relationship between the on-demand,
Optimal and Virtual Multiplexer Resource Provisioning in Multiple …
591
spot, and reservation of resources, to satisfy the requests of both cloud consumers and service provider simultaneously. This proposed approach uses effective optimization methods. Henceforth, the proposed work depends on this forecasting approach for training and machine learning-oriented calculations using linear regression to predict the overloading of resources. Limiting the under-provisioning issues for the request consolidation and value vulnerability in distributed computing conditions is the essential inspiration to investigate resource provisioning on the multi-cloud broking system.
2 Review of Related Work It is used to identify different issues in resource provisioning and allocation algorithms. The resource provisioning approaches are categorized as uncertainty parameters, resource provisioning types, simulation tools used, and resource prediction methods and methodologies used in cloud computing. The differentiation among the resource provisioning techniques is given in Table 1. A literature survey is also carried out on the blockchain as it is a transparent data structure, where the integrity of the blockchain is verified by the user which can be implemented in a multi-cloud broking system. When it is taken from the consumer perspective, in recent years, only limited optimization approaches are proposed to facilitate the scheduling of tasks and provisioning of resources to the applications in multi-cloud environments. The proposed work is resource provisioning of multi-cloud broking system based on the objective of cost optimization under uncertainties. Previous works were focused on goals like cost, QoS, and performance optimization. As part of this work, optimization approaches are simulated in the cloud computing environment using the CloudSim simulation framework. The use of a CloudSim simulation tool helps the researchers to capture complex provisioning and stochasticbased problems of cloud environments. The problems include demand patterns, price variations, and fluctuating resource performances, in providing QoS services. Finally, when it is taken from the consumer perspective, in recent years, only limited optimization approaches are proposed to facilitate the scheduling of tasks and provisioning of resources to the applications in multi-cloud environments. The proposed work on resource provisioning algorithms is to focus on obtaining better optimization results in multi-cloud broking system, based on the objective of cost optimization under uncertainties.
592
P. Kanakamedala et al.
Table 1 Comparison of resource provisioning techniques in cloud computing Author
Architecture
Year
Optimization criteria
Decision values
Methodology
N. Van H. D
Burst
2009
QoS
Demand
Constraint-based programming
Vanden Bossche
Burst
2010
Cost
Deadline and non-migrate VM
Binary integer programming
Javadi
Burst
2011
Performance and cost
Deadline, performance cost, and failure VM
Knowledge-free approach
Tordsson
Multiple cloud
2011
Cost and performance
Budget and performance
Binary integer programming
Chaisiri
Broker
2012
Cost
Dynamic demand and price of resource
Stochastic integer programming
Zhu and Agarwal
Burst
2012
Cost and QoS Dynamic demand
Control theory
Calheiros
Multiple Cloud
2012
Profit and performance
Performance, reliability, and scalability
Cloud coordinator architecture
Lucas-Simarro
Multiple cloud
2013
Cost and performance
Budget and performance
Binary integer programming
Coutinho
Multiple cloud
2015
Cost and performance
Communication cost
Heuristic-based approach using weighted sum objective function
3 Basic Preliminaries for OVMRP Resource provisioning and utilization strategies are based on stochastic programming and linear regression. It aims to minimize the overall cost of utilizing the resource in the multi-cloud broking system. The cloud consumer needs to select the reservation option for long-term usage. Later, if the resource demand exceeds the reserved resources, the additional resources are provisioned from an on-demand option or the spot option [9]. Under-provisioning problems will ensue due to the demand uncertainty. To deal this problems, the proposed resource provisioning algorithms are developed in a [10] multi-cloud environment under price and resource demand uncertainties. • Firstly, the introduction of multi-cloud resource provisioning system architecture was presented, and this system can provision resources to the users and keeps track of the resource consumption and its cost of utilization.
Optimal and Virtual Multiplexer Resource Provisioning in Multiple …
593
• To optimize the cost of provisioning in a multi-cloud broking system, the developed stochastic-based resource provisioning prediction management (SRPPM) increases the performance, which is based on linear regression and also used to compare the proposed approach OVMRP. • Finally, the proposed optimal and virtual multiplexer resource provisioning (OVMRP) approach applies to multiplex on the pool of VM’s to multiplex VM’s based on capacity and uses the Bayesian theorem to find the least cost VM based on their workload characteristics. The proposed algorithm OVRPM accomplishes better performance than the other existing ones.
4 System Model and Implementation 4.1 Design Procedure Let the data center system having three tuples DC = {PHost, VM, P) for the proposed system, DC represents the entity of data center located in different locations. PHost = {Ph1, Ph2, Ph3, …, Phn } represents the set of “n” number of physical hosts in the data center. VM = {VM1, VM2, VM3, VM4,…, VMm } represents the set with different types of “m” number of virtual machines to be hosted on the physical machines procured from multiple service providers. Let P = {P1, P2,…, Pn } is the list of the service providers of the cloud, and each service provider provides the services in the form of resources to the user (Fig. 3). So, the proposed OVMRP algorithm is developed depending on machine learningoriented calculations like linear regression, which have an appropriate prediction of resource provisioning, based on time, CPU, and memory to predict the resource overloading. There are three phases of resource provisioning in OVMRP: The first phase characterizes the pool of VMs provisioned in the reservation stage. The cloud consumer requests for resources with the agreed-upon parameters requesting for the VM’s in the reservation plan. The cloud broker without any predefined knowledge about the actual demand from the customer provisions the resources in the reservation plan. The consumer tasks executed on the available reserved VM’s without rejecting the tasks. Therefore, at a particular time instance, there may be overloading instances of VMs that takes place. When the instances are overloaded, select the excess required VM’s from the on-demand phase. The second phase portrays the quantity of VMs assigned in the reservation and on-demand phases. If the overloading of VMs in the reserved phase happens, then, based upon the consumers’ task’s parameters, the least cost, and on-demand plan VMs are allocated from the pool of VMs. VM multiplexing is done if the required VMs are not available in the VM pool. If the required VM is available, then it searches for the least cost VM using Bayesian theorem. The total cost is calculated based upon the provisioned and utilization of reserved resources and extra on-demand resources to complete the consumer task.
594
P. Kanakamedala et al.
Fig. 3 Proposed architecture of multi-cloud resource provisioning
The third phase specifies the quantity of VM’s assigned in reservation and spot phases. If the demand surpasses the required reserve resources for the customer requests with specific start and end times, then the spot resources are provisioned. Spot resources are multiplexed if the instance of spot resource needed is not available. If the required spot instance is available, then the least cost instance is selected using Bayesian theorem. Cost estimation in this phase is done using stochastic programming considering the utilization of reserved VMs and spot VMs to complete the task. The primary objective of the service provider is to maximize the profit in utilizing the cloud resources and minimize the payment by using resources for the customer. Along these lines, proactively, forecast-based resource scaling is required to adapt up to the regularly fluctuating application demands, for example online business applications.
4.2 Algorithm Implementation Algorithm for Optimal Resource Provisioning Algorithm Procedure Create Resource Provisioning Plan RPbot Begin // bot E Bot hom represents the bag of VM in reservation plan // Vmt – a task assigned to VM by the user.
Optimal and Virtual Multiplexer Resource Provisioning in Multiple …
595
If bot E Bot hom then (Reservation Plan) Solve for homogeneous bot VM selection from (VM Selection Algorithm) For each Vmt that had at least one task assign do If numTasks = number of tasks assigned to a VM of type Vmt numVM’s = number of VM’s of type Vmt used RPvm t = (numTasks, numVM’s) Evaluate the cost of provisioning Using Stochastic Programming Identify overloading of VM’s from (Overloading VM Detection Algorithm) If Overload occurs, select the VM from on-demand resource or select spot resources. Evaluate the cost of Provisioning Using Stochastic Programming End If End For Else If bot E Bot Het then (On-demand) Solve for Heterogeneous Bot VM Selection from (VM Selection Algorithm) For each VM that had at least one task assigned do Tasks = Tasks assigned to Vm RPVM = (tasks, VM) Evaluate the cost of provisioning Using Stochastic Programming End for End If Else if bot E Bot spot then (Spot Plan) Solve for Spot Bot VM Selection from (VM Selection Algorithm) Tasks = Tasks assigned to Vm RPSpotVM = (tasks, VM) Evaluate the cost of provisioning Using Stochastic Programming Return RPbot End Procedure.
5 Experimental Evaluation Therefore, the proposed (OVMRP) algorithm maximizes the performance and detract from the resource provisioning cost, by regulating the spot, on-demand, and reservation instances under uncertainties in the multi-cloud system. The algorithms are evaluated using CloudSim, and the results show that OVMRP performs better than current state-of-the-art algorithms. The comparison of defined resource provisioning algorithms was performed. The comparison includes the proposed optimal virtual multiplexer resource provisioning (OVMRP), expected value of uncertainty provisioning, stochastic resource provisioning and prediction management (SRPPM), maximum reservation, and no reservation provisioning algorithms.
596
P. Kanakamedala et al. Average costs No. Res
Max. Res
SRPPM
OVMRP
10
36.270
31.268
25.014
20.762
20
49.512
42.683
34.146
28.341
30
61.835
46.910
42.645
39.233
40
75.787
65.334
52.267
43.382
50
93.360
80.483
64.386
53.440
60
110.235
95.030
76.024
63.100
70
129.360
111.518
89.214
74.048
No. of VM’s
“No reservation” yields the highest total cost as the on-demand cost is high. When compared to this, the OVMRP cost is less. The “maximum reservation” yields the next highest cost, but the OVMRP algorithm reserves the VMs and context switch between the reserved to on-demand or reserved to spot instances when under-provisioning of VM’s situation occurs. The results give a useful indication about the number of resources to be provisioned in different stages, where OVMRP provides the most optimal trade-off.
Optimal and Virtual Multiplexer Resource Provisioning in Multiple …
597
6 Conclusion This proposed system addresses the problem of provisioning computing resources in multi-cloud environment by minimizing the resource provisioning cost for the cloud consumer and maximizing the performance of a cloud service provider. The algorithms presented in this proposed system aim to overcome the challenges to predict resource demand and allocate the least cost resources on-demand and spot instances under uncertainties. Possible solutions were identified as a part of this work to improve dynamic resource scaling for different applications with efficient resource utilization in cloud computing. To predict overloading of instance, the proposed method has utilized linear regression techniques which have not been addressed yet. To increase the service utilization of different clients based on optimized and cost reduction of various services in a distributed environment, an (OVMRP) approach is implemented, and it is evident that OVMRP approach minimizes the resource provisioning cost in the cloud computing environment; in this process, different trade-off services like spot, on-demand, and reservation instances are attuned to optimal level.
References 1. Liu F, Jong J, Mao J, Bohn R, Messina J, Badger L, Leaf D (2014) NIST cloud computing reference architecture. In: National institute of standards and technology of US department of commerce. Special Publication, pp 200–294 2. Li X, Du J (2013) Adaptive and attribute-based trust model of service-level agreement guarantee in cloud computing. In: IET information security, pp 39–50 3. Erl T, Puttini R, Mahmood Z (2013) Cloud computing concepts, technology, and architecture. The Prentice Hall Service Technology Series 4. Singh S, Chana I (2016) A survey on resource scheduling in cloud computing: issues and challenges. J Grid Comput 14(2):217–264 5. Christina D, Christos K (2016) HCloud: resource-efficient provisioning in shared cloud systems. In: ASPLOS ‘16 proceedings of the twenty-first international conference on architectural support for programming languages and operating systems, pp 473–488 6. Jamshidi P, Ahmad A, Pahl C (2014) Autonomic resource provisioning for cloud based software. In: Proceeding of 9th international symposium on software engineering for adaptive and self-managing systems. ACM, pp 95–104 7. Liaqat M, Chang V, Gani A, Ab Hamid SH, Toseef M, Shoaib U, Ali RL (2017) Federated cloud resource management: review and discussion. J Netw Comput Appl 77:87–105 8. Ghobaei-Arani M et al (2018) An autonomic resource provisioning approach for service-based cloud applications: a hybrid approach. Future Gener Comput Syst 78:191 9. Calzarossa MC et al (2019) A methodological framework for cloud resource provisioning and scheduling of data parallel applications under uncertainty. Future Gener Comput Syst 93:612 10. Yang J et al (2014) A cost-aware auto-scaling approach using the workload prediction in service clouds. Inf Syst Front 16:7
AI with Deep Learning Model-Based Network Flow Anomaly Cyberattack Detection and Classification Model Sara A. Althubiti
Abstract Network anomaly detection plays a vital part in accomplishing a prominent role in network security. Because of adaptive malware variations in network traffic data, traditional tools and techniques fail to protect networks from attack penetration. Many anomaly detections (AD) approaches dependent upon semi-supervised learning are presented for detecting this unknown cyberattack. Recently, because of the huge enhancement in the count of unidentified attacks, the progress of network intrusion detection systems (NIDS), which efficiently resists unfamiliar attacks, has become a crucial topic for network administrators. The latter utilizes machine learning (ML), or deep learning (DL) approaches for training a classification method with a mixture of abnormal and normal flows. In recent times, several researchers have utilized anomaly-based NIDS. With this motivation, the study emphasizes the design and development of Jarrat’s butterfly optimization algorithm with deep learningbased network flow anomaly detection (JBOADL-NFAD) technique. The aim of the JBOADL-NFAD technique lies in the proper identification and classification of network anomalies. At the initial stage, the presented JBOADL-NFAD technique performs a min–max normalization approach which scales every unique feature to a predefined size. The JBOADL-NFAD technique detects anomalies using a deep neural network (DNN) model. Moreover, the JBOA is exploited as a hyperparameter optimizer of the DNN model showing the novelty of the work. The presented JBOA involves integrating Jarratt’s iterative approach and the BOA to enhance the BOA’s searching process and convergence rate. To ensure the improved anomaly detection outcomes of the JBOADL-NFAD technique, an extensive range of simulations has been performed on benchmark datasets. The extensive comparison study demonstrated the enhancements of the JBOADL-NFAD algorithm over other recent techniques. Keywords Jarrat’s butterfly optimization · Network flow · Anomaly detection · Security · Deep learning S. A. Althubiti (B) Department of Computer Science, College of Computer and Information Sciences, Majmaah University, Al-Majmaah 11952, Saudi Arabia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_49
599
600
S. A. Althubiti
1 Introduction Cloud computing (CC) and IoT gadgets are among the increasing Internet-connected services, which makes it progressively tough to prevent cyberattacks [1]. Cyberattack has progressed into severe threats to privacy and security since their influence over IoT gadgets leads to financial losses and possibly risks the life of humans. Network intrusion detection (ID) is significant in monitoring and identifying possible threats, breaches, and events [2]. Security systems, like intrusion detection systems (IDS) and firewalls, are prone to the latest cyberthreats as prevailing methods focus on static attack signs and cannot identify novel attack variants. IoT can be a valued target for cybercriminals due to its important economic effect and extensive effect on lives [3]. Cybersecurity can be preferred for IoT structures. Though cybersecurity was examined for several years, the increasing IoT networking system and new threats presented traditional measures as ineffective. It now turns out to be a big concern for many service providers since the count of Internet threats and computer networks raises [4]. It has inspired the growth and application of IDS to help mitigate and prevent threats made by network intruders. A significant role in the detection of abnormalities and network cyberattacks was executed and last to be played by IDS. The authors devised several ID methods for countering the threats imposed by network intruders. But many previously devised IDS have higher rates of false alarms [5]. Figure 1 represents the network flow anomaly detection system. Many recent approaches for detecting cyberattacks on energy prediction (i.e. electricity price, renewable energy, and load) focused on data quality control. Mostly, anomaly detection (AD) techniques in energy prediction are categorized into two
Fig. 1 Overview of network flow anomaly detection system
AI with Deep Learning Model-Based Network Flow Anomaly …
601
classes: data driven and model related [6]. Model-related techniques depend on mathematical methods constituted on healthy historical data, which could be more practical because of the mismatches in model reality. The performance of data-driven techniques is constrained by data accessibility when the solar farms are at their initial phase [7]. Data-driven AD techniques are again categorized into classification oriented, statistical, and clustering oriented. AD was carried out for statistical techniques based on statistical methods or fitted error distributions to determine whether the data point is abnormal [8]. A statistical technique can be unsupervised and direct, which is well trained by lacking complete knowledge regarding the information. But it needs comparatively larger samples for distribution prediction. It invariably includes higher-dimensional computation deep learning (DL) and machine learning (ML)-related deterministic prediction methods that were extensively implemented in work for assisting the AD as mentioned earlier techniques [9] because of their strong capabilities of modelling nonlinearity in data. A method trained by health data could estimate the expected performance based on historical observations. But there is a mismatched performance among the observation and prediction when the data is under cyberattacks [10]. The study emphasizes the design and development of Jarrat’s butterfly optimization algorithm with deep learning-based network flow anomaly detection (JBOADLNFAD) technique. At the initial stage, the presented JBOADL-NFAD technique performs a min–max normalization approach which scales every unique feature to a predefined size. The JBOADL-NFAD technique detects anomalies using a deep neural network (DNN) model. As manual hyperparameter adjustment of the DNN is a tedious process, the JBOA is exploited in this study. A wide range of simulations was performed on benchmark datasets to ensure the enhanced anomaly detection outcomes of the JBOADL-NFAD technique. The key contributions of the paper are given as follows. • An intelligent JBOADL-NFAD technique comprising preprocessing, DNN-based anomaly detection, and JBOA-based hyperparameter tuning is presented. To the best of our knowledge, the JBOADL-NFAD model has never presented in the literature. • Employ DNN classifier for the identification and classification of anomalies. • Hyperparameter optimization of the DNN model using JBOA algorithm using cross-validation helps to boost the predictive outcome of the JBOADL-NFAD model for unseen data.
2 Related Works De Paula Monteiro et al. [11] present a new approach for training a DL-based extracting feature to the anomaly detection (AD) issue on rotating pieces of machinery. It utilizes a prototype selective method for improving the trained procedure of an arbitrarily initializing extracting feature. It can be executed by this method iteratively utilizing data going to one probability distribution as a normal class. In
602
S. A. Althubiti
[12], a hybrid data processing system for network AD was presented, which controls CNN and GWO. For enhancing the abilities of the presented method, the GWO and CNN learning techniques are improved with: (i) enhanced exploitation, exploration, and primary population generation capabilities and (ii) revamped dropout functionality correspondingly. Lin et al. [13] propose and execute a dynamic network AD method utilizing DL techniques. It can utilize LSTM to build a DNN method and improve an attention module (AM) to enhance the system’s efficiency. The SMOTE technique and an enhanced loss function were utilized for handling the class imbalance problems. Li et al. [14] present an innovative DL-based algorithm for AD in mechanical tools by integrating two varieties of DL structures, like SAE and LSMTNNs, for identifying anomaly states in a completely unsupervised manner. This devised technique focuses on the AD via many feature series whenever the history data is unlabelled and practical knowledge regarding anomaly is absent. Al Jallad et al. [15] applied the deep learning method as an alternative to conventional techniques since it contains the maximum generalization capability and, therefore, obtained fewer false-positive with the help of deep models and big data. The author compared DL and ML techniques in optimizing anomaly-related IDS through the decrement of false-positive rates. In [16], an innovative technique was modelled for extracting spatiotemporal multidirectional features of SCADA data based on CNN and BiGRU, including attention systems. Initially, the quartile technique was formulated for dispersing SCADA data to delete and clean abnormal data, thereby enhancing the data validity. After, the input variables will be opted by the Pearson correlation coefficient, and they can be converted into high-dimension features by utilizing CNN. Such features were considered as input to BiGRU networks by the attention system layer. Kao et al. [17] devise a new two-stage DL structure for network flow AD through the combination of the GRU and DAE methods. Employing supervised AD with a selection system for assisting semi-supervised AD, the accuracy and precision of the AD mechanism will be enhanced. In this devised framework, the author initially leverages the GRU method for analysing the network flow and later considers the result from the softmax function as confidence scores.
3 The Proposed Model This study formulated a new JBOADL-NFAD technique to identify and classify network anomalies properly. The intention of the JBOADL-NFAD technique lies in properly classifying and identifying network anomalies. The JBOADL-NFAD techniques encompass data preprocessing, DNN-based anomaly detection, and JBOA-based hyperparameter tuning.
AI with Deep Learning Model-Based Network Flow Anomaly …
603
3.1 Data Preprocessing Data preprocessing is an extremely crucial step in the ML technique. There are different data preprocessing systems for the attributes of various data kinds. Generally, continuous features utilize feature scales for scaling distinct features to similar comparative intervals, and discrete features utilize coding approaches for classifying various kinds of data. This article carried out min–max normalized for continuous features. An instance comprises several continuous features, and distinct features contain distinct numerical ranges. With scaling all the features to a set size, the trained could be independent of certain attributes. The formula of min–max normalized was demonstrated in Eq. (1), whereas x stands for the actual value of instance in features that are computed; max(x) and min (x) correspondingly indicate maximal and minimal values of attributes from the entire instance. The normalized value × 0 is computed by Eq. (1), compressing the attributes of various sizes from zero to one. The data consistency was enhanced with the min–max normalized. x =
x − min(x) max(x) − min(x)
(1)
Generally utilized encoder approaches contain a label encoder and one-hot encoder. The label encoder consecutively encodes the recently developing groups as integers from smaller to larger. This technique is rapid and cannot enhance the count of attributes. However, labelled varieties all have similar attributes and are projected as integers. The drawback of the one-hot encoder is that the count of attributes enhances based on the number of kinds from the original attribute. However, learning the content of features related to the label encoder can be simpler. Once the other feature was in the normal range, it could be until the most recent ML mainstream approach was utilized in the study. Thus, a one-hot encoder system was utilized for distinct features.
3.2 DNN-Based Anomaly Detection At this stage, the JBOADL-NFAD technique applied the DNN model for anomaly detection [18]. AD technique is established by employing a DNN technique as the classifier. The encoder is obtained during the automatic extracting feature of the AE technique in trained data to the input of classifications. Input in the DNN structure is outcome Z , created in the AE procedure from the procedure of trained input data X. The other layer was added to the DNN outcome in output y (a label which contains binary classes or five anomaly classes on NSL-KDD datasets). Afterwards, the procedure of retrains was conducted utilizing biases and weights values of the AE as pre-trained values for learning the outcome y. The outcome yˆ is expressed as
604
S. A. Althubiti
yˆ = f W (l) · h (l) + b(l) = f z (l+1) ,
(2)
, whereas l + 1 = the final layer. Figure 2 represents the structure of the DNN technique. As regards the AE infrastructure function, f denotes the activation function. Among other activation functions, the ReLU function has benefits. In this case, it can be estimated to employ many variations of ReLU like ELU, PReLU, SELU, and leaky ReLU for the hidden state. During the resultant layer, the sigmoid activation function can be utilized for binary classification and softmax activation to the multi-class classification (five classes of anomaly). A primary parameter is identified before training the DNN. W (1) , W (2) , W (3) and b(1) , b(2) , b(3) values are achieved in DAE encoder procedure. The weighted parameter required that initialization arbitrarily as a smaller value [for instance, distributed around zero; (0, 0.1)]. The resultant output is yˆ techniques actual value y. To resultant nodes, the variance betwixt is the network activation yˆ and the actual target value y i was computed. In the hidden unit, the error value was computed dependent upon the average weighted of error nodes that utilize h li as inputs. The activation and loss functions for binary classifiers utilize the binary cross-entropy and sigmoid functions, but the multi-class classifier can utilize the categorical softmax and cross-entropy functions. Fig. 2 Architecture of DNN
AI with Deep Learning Model-Based Network Flow Anomaly …
605
3.3 JBOA-Based Hyperparameter Optimization Finally, the JBOA is used as a hyperparameter optimizer of the DNN model. According to researchers’ outcomes, BFs take a great sense which notes the fragrance source. Additionally, it singles out varieties of fragrances and feels their intensity [19]. In BOA, BFs signify the search agent utilized for performing optimization. All the BFs create a fragrance which is linked with their fitness value. For instance, once the BF passes around, their fitness value forms. In addition, the fragrance of all the BFs covers a distance, and other BFs smell it. Specifically, with propagating its scents, the BFs share their data and procedure a collaborative data network. So, the BFs sense the fragrance of distinct BFs and alteration their places, moving to the BF with optimum fragrance (fitness), developing the global searching step in BOA. But, once a BF cannot feel fragrance in its environment, it begins an arbitrary movement, and this step is called a local search from BOA. In BOA, all fragrances have a certain scent and touch. The fragrance separates the BOA in another meta-heuristic technique, and it can be computed as f = cI a ,
(3)
whereas f denotes the magnitude of fragrances that reproduces that stronger other BFs recognize the fragrance, c stands for the fragrance sensory modality that was employed for differentiating smell in another modality, I implies stimulus intensity, and a denotes the power exponent that is accountable for different degree of absorption based on modality. The values of a and c range betwixt zero and one. The BOA technique reproduces BF movements of BFs from search to its feed to determine optimum solutions. Essential features of BFs’ movements are summarized as follows: • All the BFs produce a fragrance which appeals to other BFs. • All the BFs fly arbitrarily or near the optimum BF that emits the best fragrance among other BFs. • The main function controls the stimulus intensity of BFs. Because of most meta-heuristic techniques, BOA contains three stages: initialization, iteration, and last stages. This technique determines the main function and solution space from an initial stage. Also, the BOA parameters have their values allocated as well. This technique then generates a primary population of BFs to optimize. The BFs were provided with a set memory size for holding their data in the BOA model, as its number has not changed. The next stage in BOA is the iteration stage. This technique has many iterations. The fitness value of every BF from the solution spaces is defined in all the iterations. The BFs have their position and create fragrances utilizing Eq. (3). This technique switches betwixt two kinds of search processes: global and local. During the global search, the BFs move to BF with optimum fitness value, demonstrating optimum solutions. The formulation of global searching was demonstrated utilizing Eq. (4):
606
S. A. Althubiti
xit+1 = xit + r 2 × g ∗ − xir × f i ,
(4)
whereas xit implies the solution vector xi of ith BF from the iteration r , and g ∗ represents the optimum solutions to the present iteration. f i stands for the fragrance of ith BF, and r indicates the arbitrary number betwixt zero and one. Conversely, the BFs during the local searching begin with moving arbitrarily in its prospective areas subsequent in Eq. (5) as: xit+1 = xit + r 2 × x tj − xkt × f i ,
(5)
whereas x tj and xkt represent the jth and kth BFs in the solution spaces. So, Eq. (5) acts on local random walks. In BOA, changes betwixt general global searching and intensive local searching depend on probability value p, typically a value betwixt zero and one. Also, the BOA can trap at local optimum or knowledge divergence issues once resolving NSE. Thus, integrating BOA and Jarratt’s techniques from JBOA is importantly enhanced the efficiency of JBOA [20]. Jarratt’s system was carried out from all the iterations of BOA. An optimum BF place is defined as the BOA has been preserved as a candidate place. The candidate place is then utilized as input to the Jarratt’s system, which in most cases, enhances the BF place. The candidate place is then provided as the Jarratt’s technique that one of the times enhances BF place. Eventually, the result of Jarratt’s techniques was related to candidate places, and the one with optimum fitness was selected. Jarratt’s system creates further correct solutions in fewer iterations because of their higher order of convergences. JBOA carried out the alterations demonstrated in the red box at the end of all the iterations. A comparative was developed betwixt Jarratt’s techniques place (X n+1 ) and BOA BF’s place X b f dependent upon fitness value. Finally, the best place that measures optimum fitness was chosen as an optimum solution. The JBOA system defines a fitness function (FF) for accomplishing increased classifier outcomes. It solves a positive integer to depict a good performance of the candidate solution. In the study, the reduction classifier error rate was deemed FF as presented in Eq. (6). fitness (xi ) =classifier error rate (xi ) number of misclassified samples ∗ 100 = Total number of samples
(6)
AI with Deep Learning Model-Based Network Flow Anomaly …
607
4 Results and Discussion In this section, the anomaly detection performance of the JBOADL-NFAD method is tested utilizing a dataset encompassing 25,000 samples with five class labels as represented in Table 1. The confusion matrices offered by the JBOADL-NFAD technique on the applied data are shown in Fig. 3. The figure represented the JBOADL-NFAD method that has reached effectual anomaly identification under all aspects. Table 2 reports the overall anomaly detection results of the JBOADL-NFAD model on 80% of TR data and 20% of TS data. Figure 4 illustrates a brief result analysis of the JBOADL-NFAD model on 80% of the TR dataset. The outcome inferred that the JBOADL-NFAD method had identified all the anomalies properly. For example, in a normal class, the JBOADL-NFAD technique demonstrated an accu y of 98.93%, precn of 97.90%, recal of 96.74%, spec y of 99.48%, and Fscore of 97.32%. Moreover, in the DDoS class, the JBOADL-NFAD method has presented an accu y of 98.86%, precn of 97.11%, recal of 97.21%, spec y of 99.27%, and Fscore of 97.16%. Figure 5 illustrates a brief analysis of the JBOADL-NFAD method on 20% of the TS dataset. The outcomes denoted by the JBOADL-NFAD algorithm have identified all the anomalies properly. For example, in a normal class, the JBOADL-NFAD method has rendered an accu y of 98.90%, precn of 97.99%, recal of 96.53%, spec y of 99.50%, and Fscore of 97.25%. Furthermore, on DDoS class, the JBOADL-NFAD approach has provided an accu y of 99.02%, precn of 97.18%, recal of 97.87%, spec y of 99.30%, and Fscore of 97.52%. Table 3 demonstrates the overall anomaly detection results of the JBOADL-NFAD method on 80% of the TR data and 20% of the TS dataset. Figure 6 exemplifies a detailed analysis of the JBOADL-NFAD technique on 70% of TR data. The experimental values represented by the JBOADL-NFAD algorithm have identified all the anomalies properly. For example, in a normal class, the JBOADL-NFAD method has presented anaccu y of 98.51%, precn of 95.31%, recal of 97.33%, spec y of 98.81%, and Fscore of 96.31%. Furthermore, in the DDoS class, the JBOADL-NFAD approach has presented an accu y of 99.43%, precn of 98.84%, recal of 98.27%, spec y of 99.72%, and Fscore of 98.55%. Figure 7 demonstrates a comparative result analysis of the JBOADL-NFAD method on 20% of TS data. The simulation values denoted by the JBOADL-NFAD Table 1 Dataset details
Class
No. of instances
Normal
5000
DDoS
5000
DoS
5000
Scan
5000
Data theft Total no. of instances
5000 25,000
608
S. A. Althubiti
Fig. 3 Confusion matrices of JBOADL-NFAD system a, b 80% and 20% of TR/TS database and c, d 70% and 30% of TR/TS database
algorithm have identified all the anomalies properly. For example, in a normal class, the JBOADL-NFAD technique has presented an accu y of 98.61%, precn of 95.67%, recal of 97.56%, spec y of 98.88%, and Fscore of 96.61%. Additionally, in the DDoS class, the JBOADL-NFAD methodology has presented an accu y of 99.47%, precn of 99.15%, recal of 98.24%, spec y of 99.78%, and Fscore of 98.69%. The training accuracy (TRA) and validation accuracy (VLA) accomplished by the JBOADL-NFAD methodology under the test database are exemplified in Fig. 8. The outcome shows that the JBOADL-NFAD algorithm has obtained higher values of TRA and VLA. The VLA is superior to TRA.
AI with Deep Learning Model-Based Network Flow Anomaly …
609
Table 2 AD outcomes analysis of JBOADL-NFAD system with different classes under 80:20 of TR/TS database Class
Accu y
Precn
Recal
Spec y
Fscore
Training phase (80%) Normal
98.93
97.90
96.74
99.48
97.32
DDoS
98.86
97.11
97.21
99.27
97.16
DoS
99.21
97.81
98.28
99.45
98.05
Scan
99.08
98.85
96.52
99.72
97.67
Data theft
98.49
94.89
97.69
98.69
96.27
Average
98.92
97.31
97.29
99.32
97.29
Testing phase (20%) Normal
98.90
97.99
96.53
99.50
97.25
DDoS
99.02
97.18
97.87
99.30
97.52
DoS
99.14
97.69
97.99
99.43
97.84
Scan
99.38
98.99
97.91
99.75
98.45
Data theft
98.76
96.20
97.72
99.02
96.95
Average
99.04
97.61
97.60
99.40
97.60
Fig. 4 Average analysis of the JBOADL-NFAD system in 80% of the TR database
The training loss (TRL) and validation loss (VLL) acquired by the JBOADLNFAD technique under the test database are shown in Fig. 9. The experimental result denoted that the JBOADL-NFAD system has exhibited minimal least values of TRL and VLL. Especially, the VLL is lesser than TRL.
610
S. A. Althubiti
Fig. 5 Average analysis of the JBOADL-NFAD system in 20% of the TS database
Table 3 AD outcomes analysis of JBOADL-NFAD system with various classes under 70:30 of TR/TS database Class
Accu y
Precn
Recal
Spec y
Fscore
Training phase (70%) Normal
98.51
95.31
97.33
98.81
96.31
DDoS
99.43
98.84
98.27
99.72
98.55
DoS
99.03
97.36
97.88
99.33
97.62
Scan
99.50
98.53
98.98
99.63
98.75
Data theft
98.27
96.85
94.39
99.24
95.60
Average
98.95
97.38
97.37
99.34
97.37
Testing phase (30%) Normal
98.61
95.67
97.56
98.88
96.61
DDoS
99.47
99.15
98.24
99.78
98.69
DoS
99.16
97.29
98.42
99.34
97.85
Scan
99.40
98.25
98.72
99.57
98.48
Data theft
98.45
97.41
94.83
99.37
96.10
Average
99.02
97.55
97.55
99.39
97.55
A clear precision-recall examination of the JBOADL-NFAD method under the test database is depicted in Fig. 10. The figure denoted that the JBOADL-NFAD technique has resulted in enhanced values of precision-recall values in every class label.
AI with Deep Learning Model-Based Network Flow Anomaly …
611
Fig. 6 Average analysis of the JBOADL-NFAD system in 70% of the TR database
Fig. 7 Average analysis of the JBOADL-NFAD system in 30% of the TS database
A brief ROC investigation of the JBOADL-NFAD technique under the test database is portrayed in Fig. 11. The fallouts designated by the JBOADL-NFAD algorithm have displayed their ability to classify distinct classes in the test database. To verify the goodness of the JBOADL-NFAD method, a brief comparative study is given in Table 4 [17, 21]. The anomaly detection outcomes of the JBOADL-NFAD approach and existing techniques in terms of accu y and Fscore are illustrated in Fig. 12.
612
S. A. Althubiti
Fig. 8 TRA and VLA analysis of the JBOADL-NFAD system
Fig. 9 TRL and VLL analysis of the JBOADL-NFAD system
The outcomes exhibited by the JBOADL-NFAD approach have reached improved results over the other methods. At the same time, the existing LSTM, GRU, and AdaBoost models have shown ineffectual classification performance. Though the Bi-LSTM model reaches near-optimal accu y and Fscore of 95.66 and 93.44%, the
AI with Deep Learning Model-Based Network Flow Anomaly …
Fig. 10 Precision-recall analysis of the JBOADL-NFAD system
Fig. 11 ROC curve analysis of the JBOADL-NFAD system
613
614
S. A. Althubiti
Table 4 Comparative analysis of the JBOADL-NFAD system with other approaches Methods
Accu y
Precn
Recal
Fscore
JBOADL-NFAD
99.04
97.61
99.40
97.60
LSTM
93.02
96.77
95.11
97.19
GRU
93.95
93.07
93.38
94.95
Bi-LSTM
95.66
93.57
95.85
93.94
SVM
95.57
96.60
94.47
93.72
AdaBoost
93.02
95.12
94.46
93.65
GRU-DAE
94.65
97.06
93.56
94.62
JBOADL-NFAD model outperforms existing ones with maximum accu y and Fscore of 99.04 and 97.60%. The anomaly detection outcomes of the JBOADL-NFAD method and existing techniques interms of precn and recal are exposed in Fig. 13. The outcomes signify that the JBOADL-NFAD method has attained enhanced results over the other models. Simultaneously, the existing LSTM, GRU, and AdaBoost methods have revealed ineffectual classification performance. Though the GRU-DAE technique reaches near-optimal precn and recal of 97.06 and 93.56%, the JBOADL-NFAD approach outperforms existing ones with maximum precn and recal of 97.61 and 99.40%. These results denoted the JBOADL-NFAD model has accomplished improved performance over other ML and DL models on anomaly detection and classification.
Fig. 12 Accu y and Fscore analysis of the JBOADL-NFAD system with other approaches
AI with Deep Learning Model-Based Network Flow Anomaly …
615
Fig. 13 Precn and Recal analysis of JBOADL-NFAD system with other approaches
5 Conclusion In this study, a new JBOADL-NFAD approach was projected to properly identify and classify network anomalies. Primarily, the JBOADL-NFAD approach carried out a min–max normalization approach for scaling every unique feature to a predefined size. Next, the JBOADL-NFAD technique applied the DNN model for anomaly detection. Finally, the JBOA is used as a hyperparameter optimizer where Jarratt’s iterative approach and the BOA are combined to enhance the BOA’s searching process and convergence rate. A wide range of simulations has been performed on benchmark datasets to assure the enhanced anomaly detection outcomes of the JBOADLNFAD approach. The extensive comparison study pointed out the enhancements of the JBOADL-NFAD technique to other recent approaches. In future, feature subset selection and outlier removal processes will be included to extend the performance of the JBOADL-NFAD technique. Besides, the computation complexity of the proposed model will be examined in future.
References 1. Deswal P, Shefali R, Neha C (2022) Anomaly detection in IoT network using deep learning algorithms. Harbin Gongye Daxue Xuebao/J Harbin Inst Technol 54(4):255–262 2. Latah M, Toker L (2018) Towards an efficient anomaly-based intrusion detection for softwaredefined networks. IET Netw 7(6):453–459 3. Sun M, Liu N, Gao M (2022) Research on intrusion detection method based on deep convolutional neural network. In: Artificial intelligence in China, vol 854. Lecture notes in electrical engineering book series. Springer, Singapore, pp 537–544
616
S. A. Althubiti
4. Santhadevi D, Janet B (2022) EIDIMA: edge-based intrusion detection of IoT malware attacks using decision tree-based boosting algorithms. In: High performance computing and networking, vol. 853. Lecture notes in electrical engineering book series. Springer, Singapore 5. Gharib M, Mohammadi B, Dastgerdi SH, Sabokrou M (2019) AutoIDS: auto-encoder based method for intrusion detection system. arXiv preprint arXiv:1911.03306 6. Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K (2018) Deep learning approach combining sparse autoencoder with SVM for network intrusion detection. IEEE Access 6:52843–52856 7. Naseer S, Saleem Y, Khalid S, Bashir MK, Han J et al (2018) Enhanced network anomaly detection based on deep neural networks. IEEE Access 6:48231–48246 8. Simon J, Kapileswar N, Polasi PK, Elaveini MA (2022) Hybrid intrusion detection system for wireless IoT networks using deep learning algorithm. Comput Electr Eng 102:108190 9. Sun M, He L, Zhang J (2022) Deep learning-based probabilistic anomaly detection for solar forecasting under cyberattacks. Int J Electr Power Energy Syst 137:107752 10. Mathonsi T, Zyl TLV (2022) Multivariate anomaly detection based on prediction intervals constructed using deep learning. Neural Comput Appl 1–15. https://doi.org/10.1007/s00521021-06697-x 11. de Paula Monteiro R, Lozada MC, Mendieta DRC, Loja RVS, Filho CJAB et al (2022) A hybrid prototype selection-based deep learning approach for anomaly detection in industrial machines. Exp Syst Appl 204:117528 12. Garg S, Kaur K, Kumar N, Kaddoum G, Zomaya AY et al (2019) A hybrid deep learning-based model for anomaly detection in cloud datacenter networks. IEEE Trans Netw Serv Manage 16(3):924–935 13. Lin P, Ye K, Xu CZ (2019) Dynamic network anomaly detection system by using deep learning techniques. In: Cloud computing—CLOUD 2019: 12th international conference, held as part of the services conference federation, SCF 2019. San Diego, CA, USA, pp 161–176 14. Li Z, Li J, Wang Y, Wang K (2019) A deep learning approach for anomaly detection based on SAE and LSTM in mechanical equipment. Int J Adv Manufact Technol 103(1):499–510 15. Al Jallad K, Aljnidi M, Desouki MS (2020) Anomaly detection optimization using big data and deep learning to reduce false-positive. J Big Data 7(1):1–12 16. Xiang L, Yang X, Hu A, Su H, Wang P et al (2022) Condition monitoring and anomaly detection of wind turbine based on cascaded and bidirectional deep learning networks. Appl Energy 305:117925 17. Kao MT, Sung DY, Kao SJ, Chang FM (2022) A novel two-stage deep learning structure for network flow anomaly detection. Electronics 11(10):1531 18. EIbrahim L, Mohamed ZE (2017) Improving error back propagation algorithm by using cross entropy error function and adaptive learning rate. Int J Comput Appl 161(8):5–9 19. Arora S, Singh S (2019) Butterfly optimization algorithm: a novel approach for global optimization. Soft Comput 23:715–734 20. Sihwail R, Solaiman OS, Ariffin KAZ (2022) New robust hybrid Jarratt-Butterfly optimization algorithm for nonlinear models. J King Saud Univ Comput Inform Sci 34(10):8207–8220 21. Ullah I, Mahmoud QH (2022) Design and development of rnn anomaly detection model for IoT networks. IEEE Access 10:62722–62750
Golden Jackal Optimization with Deep Learning-Based Anomaly Detection in Pedestrian Walkways for Road Traffic Safety Saleh Al Sulaie
Abstract Road traffic safety discusses the procedures and measures utilized for preventing road users from being dead or critically injured. Archetypal road users contain horse riders, cyclists, pedestrians, vehicle passengers, motorists, and passengers of on-road public transport (mostly buses and trams). Anomaly detection in pedestrian pathways is a crucial investigation topic, generally utilized for improving pedestrian safety. Because of the varied consumption of video surveillance methods and the improved quantity of captured videos, the typical manual analysis of labeling abnormal proceedings is a tiresome task, thus, an automated surveillance method in which anomaly detection develops important betwixt computer vision researchers. At present, the progress of deep learning (DL) algorithms has obtained important interest in distinct computer vision procedures. Therefore, this article introduces a new Golden Jackal Optimization with Deep Learning-based Anomaly Detection in Pedestrian Walkways (GJODL-ADPW) for road traffic safety. The presented GJODL-ADPW technique aims to effectively recognize the presence of anomalies (such as vehicles, skaters) on pedestrian walkways. In the presented GJODLADPW technique, Xception methodology was exploited for effective extraction feature process. For optimal hyperparameter selection, the GJO algorithm is utilized in this study. Finally, bidirectional long short-term memory (BiLSTM) approach was employed for anomaly detection purposes. A widespread experimental analysis is performed to examine the enhanced performance of the GJODL-ADPW system. A detailed comparative analysis demonstrated the enhancements of the GJODL-ADPW technique over other recent approaches. Keywords Pedestrian walkways · Road safety · Surveillance system · Anomaly detection · Deep learning
S. Al Sulaie (B) Department of Industrial Engineering, College of Engineering in Al-Qunfudah, Umm Al-Qura University, Makkah 21955, Saudi Arabia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_50
617
618
S. Al Sulaie
1 Introduction With the fast growth of motorization, the number of casualties relevant to road traffic accidents is raising exponentially. Nowadays, road traffic safety is one of the significant concerns [1]. Road traffic accidents may not just endanger the lives of people, leads to significant property loss, and however even affects the functional status of the road, causing traffic congestion and minimizing service levels and road capacity. The aspects affecting road traffic safety are environment, drivers, road, road traffic management, vehicles, etc. [2]. Certainly, many road traffic accidents are caused by single component, however by coupling effect of many factors. Moreover, with advancement of intellectual networked vehicles, traffic safety management even encounters challenges [3]. Nowadays, it is significant to convert the technical benefits of intelligence and network connectivity into merits in traffic safety governance and road traffic management. Thus, to formulate reasonable road traffic safety measures and enhance road traffic safety to assure future traffic safety and traffic efficiency in intellectual networked atmosphere, it is essential to explore influence system of road traffic safety from multi-factor perspective [4]. More than 270,000 pedestrians’ lives are lost on the world’s roads annually. The capability responding to pedestrian safety was a significant element of efforts to thwart road traffic injuries [5]. Pedestrian collisions, like other road traffic crashes, could not be recognized as unavoidable since they can be preventable and predictable. Current technological advancements, such as surveillance cameras (CCTV) and computer vision (CV), were employed for protecting pedestrians and promoting safe walking and need a kind of nature of risk components for pedestrian crashes [6]. Several CV relies on studies that were modeled by concentrating on some of the operations such as behavioral learning, scene learning, data acquisition, feature extraction, and activity learning [7]. The main purpose of such work was to calculate the functions like traffic observation, scene detection, vehicle prediction, and observation, video processing models, human behavior learning, multi-camera-relied techniques and challenges, anomaly prediction approaches, activity analysis, etc. In this work, anomalous prediction will be a sub-domain of behavioral learning from captured visual scenes [8]. Additionally, anomalous predictive techniques comprehend common behavior through training procedures. Several important changes in normal performance were treated as anomalous [9]. Certain examples of anomalies are jaywalking, the presence of vehicles on pathways, signal bypassing at a traffic junction, person faints, whereas walking, unexpected dispersion of an individual in crowd, Uturn of vehicles in red signals. Eventually, deep learning (DL) depends on anomaly predictive techniques that were deployed. Primarily, CNNs were used and classified the presence of objects [10]. This article introduces a new Golden Jackal Optimization with Deep Learningbased Anomaly Detection in Pedestrian Walkways (GJODL-ADPW) for road traffic safety. The presented GJODL-ADPW technique aims to effectively recognize the presence of anomalies (such as vehicles, skaters) on pedestrian walkways. In the presented GJODL-ADPW technique, Xception model is exploited for effective
Golden Jackal Optimization with Deep Learning-Based Anomaly …
619
feature extraction process. For optimal hyperparameter selection, the GJO algorithm is utilized in this study. Finally, bidirectional long short-term memory (BiLSTM) approach was employed for AD purposes. An extensive experimental examination was made to investigate the better outcomes of the GJODL-ADPW system.
2 Related Works Pustokhina et al. [11] designed an automatic DL-based AD method in pedestrian walkways (DLADT-PW) for the safety of susceptible road users. The aim is to classify and detect the different anomalies that are present in pedestrian paths like jeeps, skating, cars, and so on. The presented method includes preprocessing as the primary stage that is employed to remove the noise and increase the image quality. Furthermore, Mask-RCNN with DenseNet models is used for the recognition technique. The authors [12] developed an agent-based architecture to evaluate pedestrian safety at un-signalized crosswalks. Un-signalized mid-block crosswalks with refuge islands (UMCR) were regarded as an instance that facilitates to performance of the presented architecture, whereby applicable behavioral components like minimum safety margin time, the reaction time, and visual field with difficulties are tackled. The vehicle–vehicle communication is taken into account, and pedestrian–vehicle communication can be modeled. Wang and Yang [13] developed a convolution recurrent autoencoder (CR-AE) that integrates an attention-based CR-AE and convolution LSTM (ConvLSTM) system. The presented method captures the irregularity of spatial irregularity and temporal pattern, correspondingly. The attention model has been employed for obtaining the present output features in hidden state of every Covn-LSTM layer. Anik et al. [14] presented an architecture based on ANN to forecast jaywalker’s trajectory while crossing the road. At present, distinct conditional and causal variables based on jaywalking including direction of crossing, gender, running or walking, roadway lane number, cell phone use, and so forth are considered input variables. By testing prediction accuracy of ANN architecture related to MSE and correlation coefficient, a better ANN framework to forecast jaywalker movement can be defined. Gayal and Patil [15] developed an automated AD technique for detecting anomalies in surveillance video. Primarily, the input surveillance video can be exposed to object recognition utilizing threshold process, and the object tracking was applied by means of minimum outcome sum of squared error tracking system (MOSSE). Later, the feature vector with statistical and textual characteristics was provided as input to DCNN classifiers that categorize the video as normal or abnormal. Sabour [16] recommends two groundbreaking solutions for discovering anomalies in smart transport systems using RNN. The summary of driving RNNs and AD techniques is introduced. Next, two recommended solutions, ThirdEye and DeepFlow, are deliberated. DeepFlow is a technique to identify abnormal traffic flow in smart cities. It has been discussed that finding an entire dataset of vehicle behaviors in driving scenario can be highly challenging. Li et al. [17] developed a future
620
S. Al Sulaie
frame predictive model and a multiple instance learning (MIL) mechanism by leveraging attention scheme for learning anomalies. In this work, the authors utilized the attention-based model for localizing anomalies.
3 The Proposed Model In this article, we have introduced a novel GJODL-ADPW technique for AD in pedestrian walkways for improved road traffic security. It encompasses a series of operations like bilateral filtering (BF)-based noise removal, Xception extracting feature, GJO parameter tuning, and BiLSTM classification.
3.1 BF-Based Noise Removal At the preliminary level, the BF technique is exploited to discard the noise that exists in it. The BF interchanges the central pixel of every filter window through weight average of the neighboring colors [18]. The weighted function has been intended for smoothing from the area of related color pixels however keeping edges intact by heavily weighted individual pixels, i.e., photometrical and spatial associated with the central pixel. ||·||2 represents the Euclidean norms and Fu denotes the central pixel. Later, the weighted W(Fu , Fv ) corresponding to pixel Fv and Fu represents the product of two components, one photometrical, and one spatial as follows: W(Fu , Fv ) = Ws (Fu , Fv )W p (Fu , Fv )
(1)
The spatial element Ws (Fu , Fv ) has been provided by: Ws (Fu , Fv ) = e
−
||u−v||22 2 2σS
(2)
And the photometrical element W p (Fu , Fv ) has offered as: W p (Fu , Fv ) = e
−
E Lab (Fu ,Fv )2 2σ p2
,
(3)
1 whereas E Lab = (L ∗ )2 + (a ∗ )2 + (b∗ )2 2 illustrates the perceptual color errors from the L ∗ a ∗ b∗ color space, and σs , σ p > 0. The color vector resultant Fu of filtering has been evaluated by the normalization weight as follows: F ∈w Fu = V
W(Fu , Fv )Fv
FV ∈w
W(Fu , Fv )
(4)
Golden Jackal Optimization with Deep Learning-Based Anomaly …
621
The Ws weight purpose decreases as spatial distance in the image among u and v rise, and W p weight function decreases as perceptual color variances between the color vectors increase. The photometric component reduces the control of individual pixels that are dissimilar, whereas the spatial component reduces the control of farthest pixels decreasing blurring. In the presented method, sharpness of edges is maintained, and perceptually related regions of pixels were averaged together. σs and σ p parameters are correspondingly applied to adjust the control of photometric and spatial components. Consider that rough threshold to recognize pixels applicably related or closer to the central one. Note that if σ p → ∞ the BF method Gaussian filter and if σs → ∞, the filter method ranges filter without spatial conception. In such cases, if σ p → ∞ and σs → ∞ are combined, then BF is performed as AMF.
3.2 Feature Extraction: Optimal Xception Model To derive optimal set of features, the presented GJODL-ADPW technique executes the Xception model. The Xception module can be used for extraction features from the preprocessed face images [19]. Lately, DL technique becomes a popular application derived from the ML technique with an increase in multilayer FFNN. Several layers in traditional NN can be constrained in application of constraint hardware features because of the relationships among the layers and learned parameters needing maximum computation time. The DL algorithm is derivative in CNN and is applied in extensive applications including object prediction, voice analysis, image processing, and machine learning. Also, CNN is a multilayer NN. In addition, CNN helped from F E which minimizes the preprocessing phase to further extent. Consequently, it is inadequate for doing a preliminary analysis to identify visual characteristics. Input, convolution, pooling, fully connected (FC), ReLU, dropout, and classifier layers are components of the CNN architecture. The performance of the layer can be defined as follows. • Convolutional layer depends on the CNN technique that is applied for extracting the feature map via pixel metrics of image. Also, it relies on the circulation of specific filters. Consequently, a novel image matrix progress had been achieved. The filter has different sizes namely 3 × 3, 5 × 5, 7 × 7, 9 × 9, or 11 × 11. • The input layer is the initial layer of CNN. The data is offered to the system without preprocessing. Then, input size diverging from pre-trained DL architecture has been used. • The pooling layer is defined afterward the Conv. and ReLU approaches are employed. It is primarily exploited for preventing the scheme from memorizing and reducing the size of input. It considers the average or max pooling function on image matrix that considered a specific characteristic. Next, maximal pooling has been applied; meanwhile, it demonstrated the better function. • The ReLU layer is positioned behindhand the Conv. layer. The key features that differentiate the function in activation function, including a sinus and hyperbolic
622
S. Al Sulaie
tangent, have led to compelling conclusion. Typically, this function can be utilized for nonlinear transformation techniques. • The classification layer is constructed after the FC Layer, once the classification process was implemented. The softmax function is exploited for achieving these metrics. • FC layers occur afterward ReLU, pooling, and Conv. layers. The neuron in this layer is interconnected with the region of the predefined layer. • The dropout layer was used for removing the network memory. The effectiveness of these methods is evaluated by means of the random elimination of some nodes. The DL-based Xception framework is used for extracting features. The Xception technique is similar to inception, excepting that the inception has substituted by depthwise separate Conv. layer. Particularly, the framework of Xception is based on a linear stack of depthwise independent convolution layers with the linear residual attached. In this method, both depthwise and pointwise layers are utilized. During the pointwise layer, a 1 × 1 convolutional layer map s the outcome of channel space from the depthwise convolution applications, and during the depthwise layer, spatial convolutions perform manually in the channel of input dataset. For optimal hyperparameter tuning of the Xception approach, the GJO technique was utilized. The GJO is a novel optimization technique stimulated by the cooperative attacking behaviors of the golden jackals [20]. In this work, every golden jackal signifies a search agent or candidate solution. In this phase, the prey population can be randomly initialized for obtaining the uniform distribution candidate solution that can be evaluated by: Y0 = Ymin + rand(Ymax − Ymin )
(5)
In Eq. (5), Y0 formulates the position of the primary golden jackal population, rand formulates an arbitrary number ranging from zero to one, Y min and Y max formulates the lower and upper boundaries of the solution. The first and second fittest is a jackal pair, and the primary matrix individual can be evaluated by ⎧ Y1,1 ⎪ ⎪ ⎨ Y2,1 Prey = ⎪ M ⎪ ⎩ Yn,1
Y1,2 Y2,2 M Yn,2
L L M L
⎫ Y1,d ⎪ ⎪ ⎬ Y2,d M ⎪ ⎪ ⎭ Yn,d
(6)
In Eq. (6), Yi, j formulates the jth dimension of ith prey, n denotes the overall amount of prey, and d denotes the variable of the problem:
FOA
⎫ ⎧ f Y1,1 , Y1,2 , Y1,d ⎪ ⎪ ⎪ ⎪ ⎨ ⎬ f Y2,1 , Y2,2 , Y2,d = ⎪ ⎪ M ⎪ ⎪ ⎩ ⎭ f Yn,1 , Yn,2 , Yn,d
(7)
Golden Jackal Optimization with Deep Learning-Based Anomaly …
623
In Eq. (7), FOA indicates a matrix which involves the fitness value of the preys, and f shows the fitness function. The initial fittest can be interpreted by the male jackal, and the next fittest can be interpreted by the female jackal. The jackal pair identifies the commensurable individual location. The golden jackal predicts and captures the prey based on its own attacking features; however, the prey sometimes rapidly evades and escapes the jackal foraging. Therefore, the female jackal follows the male jackal for achieving waiting and searching for other prey as follows: Y1 (t) = Y M (t) − E · |Y M (t) − rl · Prey(t)|
(8)
Y2 (t) = YFM (t) − E · |YFM (t) − rl · Prey(t)|
(9)
where t expresses the existing iteration, Prey (t) expresses the position vector, Y M (t) and Y F M (t) express the current positions of the male and female jackals, respectively. Y1 (t) and Y2 (t) express the refreshed positions of the male and female jackals. Figure 1 illustrates the steps involved in GJO technique. The evading energy of prey E can be evaluated by: E = E1 ∗ E0 Fig. 1 Steps involved in GJO
(10)
624
S. Al Sulaie
In Eq. (10), E 1 denotes the diminishing energy of prey, and E 0 formulates the original state of energy. E0 = 2 ∗ r − 1
(11)
In Eq. (11), r indicates the arbitrary number within [0, 1]. t E 1 = c1 ∗ 1 − T
(12)
In Eq. (12), T shows the maximal iteration, c1 denotes a constant with a value of 1.5, and E 1 declines linearly from 1.5 to 0 based on the iteration. rl formulates the arbitrary vector according to the Lévy distribution: rl = 0.05 ∗ LF(y)
(13)
The LF shows the fitness function of levy flight as follows: 1 LF(y) = 0.01 × (μ × σ )/ v β
⎤ β1 ⎡ πβ (1 + β) × sin 2 ; σ =⎣ β−1 ⎦ 1+β 2 ×β × 2 2
(14)
In Eq. (14), μ and v stand for the arbitrary number within [0,1], β shows the constant with the value of 1.5 and it can be formulated by: Y (t + 1) =
Y1 (t) + Y2 (t) 2
(15)
The evading energy of prey would quickly reduce once it is attacked by the jackal pair, and the golden jackal instantaneously encloses and captures the prey: Y1 (t) = Y M (t) − E · |rl · Y M (t) − Prey(t)|
(16)
Y2 (t) = YFM (t) − E · |rl · YFM (t) − Prey(t)|
(17)
Now t denotes the present iteration, Prey (t) indicates the location vector, and Y M (t) and YFM (t) represent the present location of the male and female jackals correspondingly. Y1 (t) and Y2 (t) denote the refreshed position of the male and female jackals. Lastly, the updated location of the golden jackal can be evaluated as follows. The GJO manner grows a fitness function (FF) for accomplishing greater classifier efficacy. It defines a positive integer for signifying a good effectual of candidate solutions. During this study, the minimizing of classifier rate of errors is assumed that FF is written in Eq. (18).
Golden Jackal Optimization with Deep Learning-Based Anomaly …
fitness(xi ) = classifier error rate(xi ) number of misclassified samples ∗ 100 = Total number of samples
625
(18)
3.3 Anomaly Detection: BiLSTM Model Finally, the BiLSTM model is applied for the recognition of anomalies. Recurrent neural network (RNN) processes the sequence of input dataset and output based on present input and state that has the memory of previous input [21]. In conventional NN, each input is independent of one another. RNN is like human memory, as we read we understand sentences according to words we recognize and read. Another way of thinking about RNN is to understand it as a sequence of NNs or a looping layer. This framework enables older data to be remembered via the learning process. The architecture of RNN is relatively easy. The output h t is a fusion of the data from preceding cell h t−1 (cell is one piece of the chain) and novel dataset xt .
h t = tan h W · h t−1 , xt + b
(19)
LSTM is an RNN structure that can able to learn long-term dependency. LSTM is commonly applied in distinct fields of study, where will we find the sequence, as the action is sequence of words in a sentence or some sequence of frames. The RNN has vanishing gradient problem for long input sequences. RNN is unrolled into a feedforward network with many layers (single layer for a one-time step). Once the gradient is passed back more than once for every time step, it sometimes explodes (tends to vanish), similarly it happens in the typical NN with a larger number of layers. LSTM resolves these problems by having three gates. LSTM has a chain structure as each RNN model. The more interesting about these models is that they contain different gates. The gate is NNs which decides which data is significant and required to be passed to the cell. The LSTM has two kinds of memories: cell state Ct that acts as long-term memory and hidden state h t that acts as working memory (a state at current time step). Different from LSTM, RNN has hidden state. The initial gate (it is positioned in the bottom left) is forget gate, now select which part of the data needed to ignore forget from the preceding hidden state h t−1 (a bottom line from the preceding cell), based on input xt .
f f = σ W f · h t−1; xt + b f
(20)
The next gate is input gate that selects, what we needed to be updated in a cell state Ct , knowing novel input dataset xt . The output after the sigmoid is i t . The sigmoid assists to carry only significant values. Then, reorganize value in the − 1 to 1 range to display the significance of all the values.
626
S. Al Sulaie
i t = σ Wi · h t−1 ; xt + bi
(21)
C˜ = tan h WC · h t−1 , xt + bC
(22)
The cell state is a sum of preceding cell state Ct−1 multiplied by output from forget gate f t , later add output of Input gate i t ∗ C˜ t , based on importance of the value that is scaled. The cell state Ct is the upper line goes into the next cell. Ct = f t ∗ C f −1 + i t ∗ C˜ t
(23)
The final gate is output gate, where we select the data that hidden state must hold h t , along with, the output from the cell h t (the arrow out of cell pointing up). Now, select related value to output based on the input xt and the preceding hidden state h t−1 .
0t = σ W0 · h t−1; xt + b0
(24)
h t = 0t ∗ tan h(Ct )
(25)
Bi-LSTM is a mixture of two LSTMs, where one is taking data as it is (from the beginning), whereas others begin from the sequence end. This technique gets a better understanding of information. It has been demonstrated that they outperformed unidirectional LSTM in several studies. The major drawback of this technique is the entire sequence needs to pass to the model at the beginning. Figure 2 demonstrates the framework of BiLSTM. The output yt is the integration of forward ht and backward h ← t hidden states. This might be summation, concatenation, and multiplication taking average.
Fig. 2 Architecture of BiLSTM
Golden Jackal Optimization with Deep Learning-Based Anomaly …
yt = ht , h ← t
627
(26)
4 Results and Discussion In this section, the experimental results of the GJODL-ADPW model are validated on two datasets [22]: Test-004 and Test-007. Figure 3 depicts some sample images. Table 1 and Fig. 4 offer a comparative accu y assessment of the GJODLADPW model on Test-004 dataset. The results show that the MDT model has the inability to attain improved performance. At the same time, the FR-CNN model has shown certainly increased outcomes. Although the DLADT-PW and RS-CNN models demonstrated considerable performance, the GJODL-ADPW model has outperformed other models with maximum accu y values. Table 2 and Fig. 5 provide a comparative accu y investigation of the GJODLADPW method on Test-007 dataset. The outcomes exhibited that the MDT technique has the inability for attaining improved performance. Also, the FR-CNN approach has
Fig. 3 Sample images
628
S. Al Sulaie
Table 1 Accu y analysis of GJODL-ADPW system with other recent algorithms under Test-004 No. of frame sequence
GJODL-ADPW
DLADT-PW algorithm
RS-CNN algorithm
FR-CNN algorithm
MDT algorithm
SF-40
96.11
96.07
93.37
83.67
76.64
SF-42
97.94
95.95
95.22
83.28
76.00
SF-46
97.40
96.91
96.25
84.53
74.78
SF-51
98.65
97.28
94.86
89.20
78.37
SF-75
99.24
98.94
98.55
81.64
77.35
SF-106
99.10
98.24
97.62
91.16
83.43
SF-123
99.35
98.44
97.57
91.03
88.27
SF-135
99.56
99.44
96.77
93.22
80.66
SF-136
99.46
97.23
96.92
95.01
84.25
SF-137
98.94
98.68
98.36
92.20
85.62
SF-149
99.45
98.75
95.80
84.95
77.87
SF-158
99.72
98.28
97.05
79.62
76.67
SF-177
99.91
98.74
96.53
80.86
75.37
SF-178
98.91
98.18
98.80
83.68
76.31
SF-180
98.51
98.04
97.55
86.41
85.14
Fig. 4 Accu y analysis of GJODL-ADPW system under Test-004
Golden Jackal Optimization with Deep Learning-Based Anomaly …
629
revealed certainly improved outcomes. But the DLADT-PW and RS-CNN methodologies outperformed considerable performance, and the GJODL-ADPW method has outperformed other techniques with maximal accu y values. Table 3 provides an average accu y assessment of the GJODL-ADPW model with other models. The outcome ensured that the GJODL-ADPW approach has reached maximum performance on both datasets. For sample, on Test-004 dataset, the GJODL-ADPW method has shown enhanced average accu y of 98.82%, while the DLADT-PW, RS-CNN, FR-CNN, and MDT methods have reached decreased average accu y of 97.94%, 96.75%, 86.70%, and 79.78%, respectively. Meanwhile, on Test-007 dataset, the GJODL-ADPW approach has exposed improved average accu y of 98.70%, while the DLADT-PW, RS-CNN, FR-CNN, and MDT techniques have attained decreased average accu y of 97.94%, 96.75%, 86.70%, and 79.78% correspondingly. Table 4 and Fig. 6 provide a comparative TPR study of the GJODL-ADPW model on Test-004 dataset. The outcomes show that the MDT methodology has the inability to obtain enhanced performance. Simultaneously, the FR-CNN method has exposed certainly increased outcomes. Afterward, the DLADT-PW and RSCNN techniques demonstrated considerable performance, and the GJODL-ADPW approach outperformed other models with maximal TPR values. Table 5 and Fig. 7 provide a comparative TPR examination of the GJODL-ADPW approach on Test-007 dataset. The outcomes depicted that the MDT system has the inability to reach improved performance. At the same time, the FR-CNN method Table 2 Accu y analysis of GJODL-ADPW system with other recent algorithms under Test-007 No. of frame sequence
GJODL-ADPW
DLADT-PW algorithm
RS-CNN algorithm
FR-CNN algorithm
MDT algorithm
SF-78
96.11
95.62
94.90
89.75
84.11
SF-91
97.94
98.87
95.95
90.84
84.51
SF-92
97.40
97.97
97.49
92.68
86.19
SF-110
98.65
96.77
94.25
91.04
83.64
SF-113
99.24
93.83
92.57
87.88
85.20
SF-115
99.10
86.81
83.76
82.63
81.02
SF-125
99.35
98.11
96.90
92.18
89.19
SF-142
99.16
96.99
96.64
94.88
82.43
SF-146
99.16
83.63
83.02
79.27
74.88
SF-147
98.94
86.58
83.07
80.75
80.34
SF-148
99.15
79.92
77.88
76.14
74.16
SF-150
99.72
90.25
86.33
83.36
69.99
SF-178
99.11
80.67
80.24
74.34
63.47
SF-179
98.91
78.81
75.67
69.69
60.36
SF-180
98.51
87.01
81.77
75.61
72.49
630
S. Al Sulaie
Fig. 5 Accu y analysis of GJODL-ADPW system under Test-007
Table 3 Average accuracy analysis of GJODL-ADPW system with other recent algorithms Average accuracy (%) Methods
GJODL-ADPW
DLADT-PW algorithm
RS-CNN algorithm
FR-CNN algorithm
MDT algorithm
Test-004
98.82
97.94
96.75
86.70
79.78
Test-007
98.70
97.94
96.75
86.70
79.78
has exposed certainly increased outcomes. Although the DLADT-PW and RS-CNN systems demonstrated considerable performance, the GJODL-ADPW algorithm has outperformed other techniques with maximal TPR values. In Table 6, a brief comparative analysis of the GJODL-ADPW method with other existing algorithms is given [11, 13]. Figure 8 exhibits a comparative AUCscore investigation of the GJODL-ADPW model with other methods. The outcome stated that the GJODL-ADPW model has gained effectual outcomes over other models with AUCscore of 96.12%. In contrast, the existing DLADT-PW, RS-CNN, FR-CNN, and MDT models have reported reduced performance with AUCscore of 87.80%, 88.86%, 89.12%, and 88.34%, respectively. Figure 9 showcases a comparative CT investigation of the GJODL-ADPW system with other models. The outcomes referred that the GJODL-ADPW methodology
Golden Jackal Optimization with Deep Learning-Based Anomaly …
631
Table 4 TPR analysis of GJODL-ADPW system with other recent algorithms under Test-004 True positive rate (test sequence—004) False positive rate
GJODL-ADPW
DLADT-PW algorithm
RS-CNN algorithm
FR-CNN algorithm
MDT algorithm
0.00
0.0000
0.0000
0.0000
0.0000
0.0000
0.05
0.1978
0.1928
0.1726
0.1700
0.1322
0.10
0.5013
0.3493
0.3417
0.2963
0.2559
0.15
0.6099
0.4402
0.4705
0.4275
0.4048
0.20
0.7487
0.5765
0.5891
0.5235
0.4780
0.25
0.9299
0.6926
0.6951
0.6522
0.5967
0.30
0.9703
0.8214
0.7810
0.7532
0.6674
0.35
0.9905
0.8592
0.8365
0.8592
0.7886
0.40
0.9905
0.9173
0.8820
0.9173
0.8567
0.45
0.9981
0.9375
0.9400
0.9855
0.8845
0.50
0.9981
0.9552
0.9577
0.9855
0.9173
0.55
0.9981
0.9627
0.9779
0.9855
0.9501
0.60
0.9981
0.9627
0.9829
0.9855
0.9829
0.65
0.9981
0.9627
0.9804
0.9855
0.9829
0.70
0.9981
0.9754
0.9880
0.9855
0.9829
0.75
0.9981
0.9829
0.9880
0.9855
0.9829
0.80
0.9981
0.9855
0.9880
0.9855
0.9829
0.85
0.9981
0.9973
0.9880
0.9855
0.9829
0.90
0.9981
0.9973
0.9880
0.9855
0.9829
0.95
0.9981
0.9973
0.9880
0.9855
0.9829
1.00
0.9981
0.9973
0.9880
0.9855
0.9829
has attained effectual outcomes over other methods with CT of 2.32 s. In contrast, the existing DLADT-PW, RS-CNN, FR-CNN, and MDT approaches have reported decreased performance with CT of 2.65 s, 3.59 s, 3.18 s, and 3.37 s correspondingly. From these extensive results, it can be assumed that the GJODL-ADPW approach has accomplished maximal performance over other models.
632
S. Al Sulaie
Fig. 6 TPR analysis of GJODL-ADPW system under Test-004 Table 5 TPR analysis of GJODL-ADPW system with other recent algorithms under Test-007 True positive rate (test sequence—007) False positive rate
GJODL-ADPW
DLADT-PW algorithm
RS-CNN algorithm
FR-CNN algorithm
MDT algorithm
0.00
0.0000
0.0000
0.0000
0.0000
0.0000
0.05
0.3708
0.2838
0.1937
0.0610
0.2079
0.10
0.4703
0.3113
0.2396
0.2315
0.2632
0.15
0.6802
0.5662
0.3845
0.3653
0.4690
0.20
0.7676
0.6266
0.4271
0.5412
0.6038
0.25
0.7960
0.6675
0.6980
0.6132
0.6344
0.30
0.8620
0.6912
0.7430
0.6685
0.7065
0.35
0.9317
0.8003
0.7849
0.8087
0.7417
0.40
0.9320
0.8003
0.7849
0.8087
0.7637
0.45
0.9496
0.8206
0.8621
0.8457
0.8686
0.50
0.9771
0.8517
0.8691
0.8647
0.9231
0.55
0.9771
0.8714
0.8805
0.8817
0.9271
0.60
0.9771
0.8926
0.9020
0.8817
0.9513
0.65
0.9771
0.8926
0.9127
0.8947
0.9513
0.70
0.9771
0.8987
0.9464
0.9017
0.9513
0.75
0.9771
0.9227
0.9731
0.9147
0.9513 (continued)
Golden Jackal Optimization with Deep Learning-Based Anomaly …
633
Table 5 (continued) True positive rate (test sequence—007) False positive rate
GJODL-ADPW
DLADT-PW algorithm
RS-CNN algorithm
FR-CNN algorithm
MDT algorithm
0.80
0.9771
0.9369
0.9731
0.9337
0.9513
0.85
0.9771
0.9369
0.9771
0.9562
0.9513
0.90
0.9771
0.9369
0.9661
0.9562
0.9513
0.95
0.9771
0.9682
0.9661
0.9562
0.9513
1.00
0.9771
0.9682
0.9661
0.9562
0.9513
Fig. 7 TPR analysis of GJODL-ADPW system under Test-007 Table 6 Comparative analysis of GJODL-ADPW system with other approaches
Methods
AUC score (%)
Computational time (s)
GJODL-ADPW
96.12
2.32
DLADT-PW algorithm
87.80
2.65
RS-CNN algorithm
88.86
3.59
FR-CNN algorithm
89.12
3.18
MDT algorithm
88.34
3.37
634
Fig. 8 AUCscore analysis of GJODL-ADPW system with other approaches
Fig. 9 CT analysis of GJODL-ADPW system with other approaches
S. Al Sulaie
Golden Jackal Optimization with Deep Learning-Based Anomaly …
635
5 Conclusion In this article, we have introduced a new GJODL-ADPW technique for AD in pedestrian walkways for improved road traffic security. In the presented GJODL-ADPW technique, Xception model is exploited for effective feature extraction process. For optimal hyperparameter selection, the GJO algorithm is utilized in this study. Finally, the BiLSTM model is employed for AD purposes. A widespread experimental analysis is carried out to examine the enhanced performance of the GJODL-ADPW system. A comprehensive comparative analysis depicted the enhancements of the GJODL-ADPW technique over other methodologies. Thus, the GJODL-ADPW system was executed as an accurate tool for improving road safety and reducing risks. In future, object detection techniques can be employed for AD in roadways.
References 1. Saxena A, Yadav AK (2022) Clustering pedestrians’ perceptions towards road infrastructure and traffic characteristics. Int J Injury Control Safety Prom 30:1–11 2. Wang H (2021) Development and application of sidewalk anomaly detection algorithm using mobile sensors (Doctoral dissertation, State University of New York at Binghamton) 3. Santhosh KK, Dogra DP, Roy PP (2020) Anomaly detection in road traffic using visual surveillance: a survey. ACM Comput Surv 53(6):1–26 4. Gálvez-Pérez D, Guirao B, Ortuño A, Picado-Santos L (2022) The influence of built environment factors on elderly pedestrian road safety in cities: the experience of Madrid. Int J Environ Res Public Health 19(4):2280 5. Santilli D, D’Apuzzo M, Evangelisti A, Nicolosi V (2021) Towards sustainability: new tools for planning urban pedestrian mobility. Sustainability 13(16):9371 6. Mukherjee D, Saha P (2022) Walking behaviour and safety of pedestrians at different types of facilities: a review of recent research and future research needs. SN Social Sci 2(5):1–16 7. Adinarayana B, Mir MS (2021) Development of pedestrian safety index models for safety of pedestrian flow at un-signalized junctions on urban roads under mixed traffic conditions using MLR. Innov Infrastruc Solutions 6(2):1–9 8. Azouz M, Fahim A (2022) A sustainable road safety approach for pedestrians in new cities of Egypt. In: IOP conference series: earth and environmental science, vol 1056, no. 1. IOP Publishing, pp 012031 9. Rankavat S, Tiwari G (2020) Influence of actual and perceived risks in selecting crossing facilities by pedestrians. Travel Behav Society 21:1–9 10. Cie´sla M (2021) Modern urban transport infrastructure solutions to improve the safety of children as pedestrians and cyclists. Infrastructures 6(7):102 11. Pustokhina IV, Pustokhin DA, Vaiyapuri T, Gupta D, Kumar S, Shankar K (2021) An automated deep learning based anomaly detection in pedestrian walkways for vulnerable road users safety. Saf Sci 142:105356 12. Zhu H, Almukdad A, Iryo-Asano M, Alhajyaseen WK, Nakamura H, Zhang X (2021) A novel agent-based framework for evaluating pedestrian safety at unsignalized mid-block crosswalks. Accid Anal Prev 159:106288 13. Wang B, Yang C (2022) Video anomaly detection based on convolutional recurrent AutoEncoder. Sensors 22(12):4647 14. Anik MAH, Hossain M, Habib MA (2021) Investigation of pedestrian jaywalking behaviour at mid-block locations using artificial neural networks. Saf Sci 144:105448
636
S. Al Sulaie
15. Gayal BS, Patil SR (2022) Detecting and localizing the anomalies in video surveillance using deep neuralnetwork with advanced feature descriptor. In: 2022 international conference on advances in computing, communication and applied informatics (ACCAI). IEEE, pp. 1–9 16. Sabour S (2022) Driving anomaly detection using recurrent neural networks (Master’s thesis, Science) 17. Li Q, Yang R, Xiao F, Bhanu B, Zhang F (2022) Attention-based anomaly detection in multiview surveillance videos. Knowl Based Syst 252:109348 18. Sasank VVS, Venkateswarlu S (2022) An automatic tumour growth prediction based segmentation using full resolution convolutional network for brain tumour. Biomed Signal Process Control 71:103090 19. Sunitha G, Geetha K, Neelakandan S, Pundir AKS, Hemalatha S, Kumar V (2022) Intelligent deep learning based ethnicity recognition and classification using facial images. Image Vis Comput 121:104404 20. Chopra N, Ansari MM (2022) Golden jackal optimization: a novel nature-inspired optimizer for engineering applications. Expert Syst Appl 198:116924 21. Thara DK, PremaSudha BG, Xiong F (2019) Epileptic seizure detection and prediction using stacked bidirectional long short term memory. Pattern Recogn Lett 128:529–535 22. http://www.svcl.ucsd.edu/projects/anomaly/dataset.html
Explainable Artificial Intelligence-Enabled Android Malware Detection Model for Cybersecurity Laila Almutairi
Abstract Malicious attacks on Android mobile devices are increasing with the rapid growth of using smartphone devices. The Android system implemented various sensitive applications like banking applications; thus, it has become the target of malware that exploited the vulnerability of the security system. Certain researchers presented methods for detecting mobile malware. But, advances are needed to reach higher performance and efficiency. Later, machine learning (ML) techniques are utilized for identifying Android-directed malicious attacks. This study develops an explainable artificial intelligence-enabled android malware detection for cybersecurity (XAIAMD-CS) model. The presented XAIAMD-CS technique accomplishes cybersecurity via the accurate identification of Android malware attacks. Initially, the XAIAMD-CS technique designs a new group teaching optimization algorithmbased feature selection (GTOA-FS) technique. Besides, the gradient boosting tree (GBT) model is applied for the accurate identification and classification of Android malware. To improve the classifier results of the GBT method, the salp swarm optimization (SSO) algorithm is applied in this work. The performance validation of the XAIAMD-CS technique can be tested using the Android malware dataset, and the outcomes were inspected in terms of distinct measures. The results demonstrate the improvements in the XAIAMD-CS algorithm. Keywords Cybersecurity · Android malware · Machine learning · Metaheuristics · Explainable artificial intelligence
L. Almutairi (B) Department of Computer Engineering, College of Computer and Information Sciences, Majmaah University, Al-Majmaah 11952, Saudi Arabia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_51
637
638
L. Almutairi
1 Introduction Recently, the Android operation system has engrossed malware developers, whose work seems to be growing rapidly. Several malware developers focused on hacking mobile and converting them into bots [1]. This enables hackers to access the connected and infected devices and constitute botnets. Botnets were utilized for performing various malicious attacks like data theft, distributed denial-of-service (DDoS) attacks, sending spam, and many more [2, 3]. The malicious botnet assaults are formulated with advanced methods making it tough to detect malware. This, in contrast, imposes major threats that need the model of potential techniques for identifying such approaches [4]. Android botnets were utilized for executing attacks on the target devices. DDos assaults can be attained by flooding the targeted machine with blocking legitimate requests and superfluous requests, therefore, causing disruption of the services and a failure of the target system [5, 6]. Being the technological paradise, implementing futuristic methods for developing antivirus and robust security tools will be today’s priority. Developing these domains may have a huge contribution in preventing and detecting malware and keeping users away from inessential software [7]. Moreover, several sectors should focus to protect the data of users from malware attacks and other data breaches and security vulnerabilities; the healthcare, financial, and aircraft domain is at the forefront of this target which requires a focus on protecting the privacy and security of patient’s records and users [8, 9]. Blockchain (BC) can be implemented to secure records or data of transportation, industries, and health care, e.g., while Artificial Intelligence (AI) is utilized for the development of fields of malware detection and prevention with the possibility to formulate scalable, efficient, and robust malware recognition units [10, 11]. Android malware detection techniques are classified into hybrid, static, and dynamic analyses. The static analysis extracted the attributes from Android applications without running them into Android emulators or gadgets [12]. Later, monitoring unusual or suspicious behaviors. It attains high feature coverage but faces several disadvantages like dynamic code loading and code obfuscation. Conversely, dynamic analysis extracted features by running them on a device or Android emulator [13, 14]. This technique proves an advantage over static analysis by detecting more attributes or risks that may not be detected in static analysis. But, static analysis outpaces dynamic analysis in the computational resources and cost of time. On the other hand, hybrid analysis was a combination of the two previous ones, dynamic and static [15]. It is highly efficient in the identification process. This technique leverages static analysis. This study develops an explainable artificial intelligence-enabled Android malware detection for cybersecurity (XAIAMD-CS) model. The presented XAIAMD-CS technique accomplishes cybersecurity via the accurate identification of Android malware attacks. Initially, the XAIAMD-CS technique designs a new group teaching optimization algorithm-based feature selection (GTOA-FS) technique. Besides, the gradient boosting tree (GBT) model is applied for the accurate identification and classification of Android malware. To improve the classifier results of the GBT method, the salp swarm optimization (SSO) algorithm is applied in this
Explainable Artificial Intelligence-Enabled Android Malware …
639
work. The performance validation of the XAIAMD-CS technique is tested using the Android malware dataset, and the outcomes are reviewed under various aspects.
2 Related Works Yadav et al. [16] presented a performance analysis of 26 existing pre-trained CNN methods in Android malware detection. It even adds performance gained by largescale learning with RF and SVM techniques and stacking with CNN methods. Depending on the outcomes, an EfficientNet-B4 CNN-based technique was modeled for identifying Android malware utilizing image-related malware representations of Android DEX files. EfficientNet-B4 extracted related attributes from the malware images. Albahar et al. [17] introduced a modified ResNeXt technique by entrenching a novel regularization approach for enhancing the classifier task. Additionally, the authors presented a complete assessment of the Android malware detection and classification utilizing our modified ResNeXt. The non-intuitive features of malware were transformed into fingerprint images for extracting the rich data from the inputted dataset. Additionally, the authors enforced fine-tuned DL depending on the CNN on the visualized malware samples to acquire discriminative features automatically that separated normal and malicious datasets. Rasool et al. [18] modeled a method for classifying and detecting Android malware. The authors use EML comprising KNN, SVM, LSTM, and DT, DL methods for malware classification and detection in combination with selecting features for augmenting the efficiency of a model. In [19], a novel technique, BIR-CNN, was modeled for categorizing Android malware. It integrates CNN with inceptionresidual (BIR) network and batch normalization modules through 347-dim network traffic features. CNN integrates inception-residual units with a convolutional layer that can improve the model learning ability. Mahindru and Sangal [20] devised a technique called “SOMDROID” that operates on the principle of unsupervised ML technique. To formulate an efficient and effective Android malware detection method, the authors gather 500,000 different Android applications from promised sources and derive 1844 exclusive features. With the selective feature sets or features, the authors apply the SOM technique of Kohonen and measured four different performance parameters. Almomani et al. [21] present an efficient and automatic vision-related AMD method that has 16 fine-tuned and well-developed CNN techniques. This method precluded the necessity for premolded signated features extraction when generating precise forecasting of malware images with high detection speed and the least cost. This performance is attained with grayscale malware or colored images, by utilizing imbalanced or balanced data.
640
L. Almutairi
3 The Proposed Model In this study, automated Android malware detection using the XAIAMD-CS technique has been developed for cybersecurity. The presented XAIAMD-CS technique accomplishes cybersecurity via the accurate identification of Android malware attacks. Initially, the XAIAMD-CS technique designs a new GTOA-FS technique. Besides, the GBT model is applied for the accurate identification and classification of Android malware. To enrich the classifier results of the GBT method, the SSO algorithm is enforced in this work. Figure 1 demonstrates the workflow of the XAIAMD-CS system.
3.1 Feature Selection Using GTOA-FS Technique To select an optimal subset of features, the GTOA-FS technique is employed to it. A novel GTOA technique was utilized in simulating a group teaching approach, and the acquaintance of total class (c) is improved, viz., the basic model behindhand the projected approach that is GTOA [22]. For executing the GTOA technique to optimize the approach, a simple group teaching approach was projected based on
Fig. 1 Workflow of XAIAMD-CS approach
Explainable Artificial Intelligence-Enabled Android Malware …
641
the subsequent rules. The only alteration betwixt the student is the capability that acquiescent of knowledge. The maximum challenges to the teacher in expressing the teaching strategies can be dependent upon the variance in the ability to accept the skill. The quality of a decent teacher is to concentrate further interest on the student that has a worse ability to accept the knowledge. With self-learning, or interrelating with fellow students, students are capable of developing its skill in their free time. To improve student skills, decent teacher allocation approaches are extremely useful. For representing the skill of total classes, standard distribution functions are utilized which are formulated as in Eq. (1). 1 2 f (x) = √ exp(x−μ) . 2π δ
(1)
whereas χ defines the value whereas the standard distribution function is required. μ stands for the mean and δ denotes SD. In the GTOA technique, all the students are divided into two groups. An outstanding group is a set of students containing the better capability to grasp a skill, but the groups taking a worse ability to grasp a skill are identified as the average group. Teacher Phase During this phase, students learn from the teacher, i.e., the second rule defined before. In GTOA, the teacher creates different plans to average as well as outstanding groups. The teacher’s concentration is further on emerging the data of whole classes due to an optimum ability of students to accept the data. The student who fits the outstanding group is a superior probability of enhancing skill in Eq. (2). t+1 X teacher = X tj + a × T t − F × B × M t + c × X tj , −j
(2)
1 , N
(3)
b + c = 1,
(4)
Mt =
in which the count of students can be provided as N , X j implies the data of all the students, T refers to the teacher knowledge, and M denotes the mean group knowledge. Teacher features can be offered as F, and X teacher− j signifies that the student knowledge j learns from the teacher. Arbitrariness was projected as, b, and c from the interval of 0 and 1. Due to the weaker acceptance data ability, based on the second rule, the teacher offers a higher concentration on the average group. The average integration of students is gained data utilizing in Eq. (5). t t+1 t t X teacher, j = Xj +2×d × T − Xj ,
(5)
642
L. Almutairi
in which d denotes the arbitrariness from the range of 0 and 1. Equation (6) expresses the problem that a student cannot obtain data in the teacher step. t+1 X ieacher, j
⎧ ⎨ Xt
t+1 < f Xt f X ieacher, j j = t ⎩ X t , f X t+1 j teacher, j ≥ f X j teacher, j
(6)
Student Phase During their free time, students can obtain information by self-learning or interrelating with classmates which are provided arithmetically in Eq. (7). The student step connects to third rule by adding student phases I and II. t+1 X teacher, j
⎧ ⎨ Xt
t+1 t+1 + e × X teacher, j − X teacher,k + g = t+1 t+1 ⎩ Xt − e × X − X teacher, j teacher,k − g teacher, j t+1 ⎫ t+1 t+1 t ⎬ × X teacher, j − X j , f X teacher, j < f X teacher,k , t+1 t+1 t+1 t ⎭ × X teacher, j − X j , f X teacher, j ≥ f X teacher,k teacher, j
(7)
t+1 whereas e and g denote the two ransom numbers from the range of 0 and 1, X student, j t+1 signifies the student data i, and X teacher, j refers to the student data j which learns from the teacher. The students cannot obtain data in this phase. He/she is tackled in Eq. (8).
X t+1 = j
⎧ ⎨ Xt
teacher, j
⎩ Xt
siudeni, j
t+1 f X teacher, j < t+1 f X teacher, j ≥
t f X student, j t f X student, j
(8)
Teacher Allocation Phase For enhancing the student data, a decent teacher-assigned method was essential and is defined as the fourth rule. Inspired by the hunting performance of gray wolves, the top three students can be chosen as illustrated in Eq. (9). T =
⎧ ⎨ xt ⎩
t first f X first ≤ t t t X first +X second +X third 3
f >
t t t X first +X second +X third t 3 t t X +X +X third f first second 3
(9)
t t t in which X first , X second , and X third stand for the top three optimum students correspondingly. The fitness function (FF) considered the classifier outcome and selective features. It maximized the classifier outcome and reduces the set size of the selective features. Then, the following FF can be employed for assessing individual solutions, as seen
Explainable Artificial Intelligence-Enabled Android Malware …
643
in Eq. (10). Fitness = α ∗ Error Rate + (1 − α) ∗
#SF . #All_F
(10)
Here, ErrorRate is the classifier error rate leveraging the selective features. ErrorRate can be computed as a percentage of incorrect classification to the number of classifications done, expressed as a value between 0 and 1.
3.2 Android Malware Detection Using Optimal GBT Model For the accurate identification of Android malware, the GBT classifier is used. GBT is a supervised learning mechanism, otherwise called multiple additive regression trees (MART) and gradient boost regression trees (GBRT) [23]. In the proposed GBT model, every sample in the training set can be denoted by xi = C/N0i , ηi , θi , whereas i = 1, 2, 3 . . . , N indicates the number of instances, and N denotes the number of instances. The labeled training data are formulated by T = {(x1 , y1 ), (x2 , y2 ), (x3 , y3 ), . . . , (x N , y N )}, whereas yi ∈ {−1, 0, 1} shows the label of all the instances, where −1, 0, 1 signify NLOS, multipath, and LOS signals, correspondingly. By generating weak learners h t (xi ; a) that point in a steepestdescent direction, viz., negative gradient direction, GBDI reduces the predictive value of loss function L(yi , f (xi )). The weak learner’s h t (xi ; a) is a classification tree, the parameter a represents the splitting variable, and the split location and terminal node imply the individual tree. L(yi , f (xi )) =
1 (yi − f (xi ))2 . 2
(11)
Figure 2 shows the framework of GBT. The GBT-based GPS signal reception classification method is given below: 1. Initialization of weak learners f 0 (x) for the training dataset:
Fig. 2 Structure of GBT
644
L. Almutairi
f 0 (x) = argmin
N
γ
L(yi , γ ).
(12)
i=1
f 0 (x) shows the RT comprising a single root node. Meanwhile, L is carefully chosen as a square loss function, and f 0 (x) becomes: f 0 (x) = y.
(13)
2. For m = 1 to M: 2.1 Calculate the negative gradient
∂ L(yi , f (xi )) . y˜i = − ∂ f (xi ) f (x)= f m−1 (x)
(14)
2.2 Swap the label yi of training data with y˜i to achieve a new data Tm = {(x1 , y˜1 ), (x2 , y˜2 ), (x3 , y˜3 ), . . . , (x N , y˜ N )}, and construct a new RT h m (xi ; am ) by training the new data Tm : N at = argmin ( y˜ − h m (xi ; a))2 . a
(15)
i=1
2.3 Update the strong learner: f m (x) = f m−1 (x) + ρh m (x; am ).
(16)
In Eq. (17), ρ indicates the learning rate, typically selected to value among 0 ~ 1 for preventing overfitting. 3. Afterward the iteration is ended, output f M (x) as the concluding classifiers: f M (x) = f 0 (x) +
M
ρh m (x; am ).
(17)
m=1
4. f M (x) is utilized for predicting the signal reception types of the recently gathered unlabeled instances x = (C/N0 , η, θ ) from testing data. The predictive value needs to be iterated to the nearby values of 1, 0, or − 1. To adjust parameters related to the GBT method, the SSO technique is applied. SSO is a new swarm-based meta-heuristic technique which simulates the forage and navigates processes of salp’s from the oceans [24]. During the SSO technique, either leader or follower salp’s procedure salp chain. Half of the salps are selected as leaders for improving the population diversity of technique and the size to exit the local optimum. A group of parameters Pi j are created in lower as well as upper
Explainable Artificial Intelligence-Enabled Android Malware …
645
bounds’ values as per its chosen levels utilizing the subsequent relation (Eq. (18)). Pi j = lb j + rand ∗ ub j − lb j ,
(18)
whereas lb j and ub j signify the lower as well as upper bounds’ values of the jth parameter, and rand refers to the arbitrary value betwixt zero and one. The nondominated salp is dependent upon its dual objectives calculated, and an optimum salp dependent upon the crowding distance was considered as a source of food F1 j and saved in the file. The file size was upgraded to its maximum size. The value of the squared exponential covariance variable (C1) that determined the leader salp position was computed utilizing the subsequent relation (Eq. (19)). c1 = 2e
2 4it − max_it
,
(19)
whereas it can be the present iteration number and max− it refers maximal iteration number or end condition. The position of leader salp’s was upgraded with the subsequent connection (Eqs. (20) and (21)). i f c3 < 0.5P1 j = F1 j + c1 ub j − lb j c2 + lb j
(20)
P1 j = F1 j − c1 ub j − lb j c2 + lb j
(21)
Else
End. Whereas c2 and c3 signify the arbitrary values betwixt zero and one, lb j and ub j signify the lower as well as upper bounds’ values of the jth parameter, and F1 j denotes the food source position. Additionally, the follower salp’s position was defined as follows (Eq. (22)): P1 j =
1 P1 j + P(i−1) j . 2
(22)
At last, the upgraded salp’s position was verified, if it is within its bounds, viz., lb j and ub j . A primary salp position was upgraded with a novel upgraded position utilizing the whole replacement approach.
646
L. Almutairi
4 Performance Validation In this section, the Android malware classification results of the XAIAMD-CS method are inspected using a dataset comprising 2000 samples as given in Table 1. Figure 3 illustrates that the confusion matrices of the XAIAMD-CS model are well studied under different runs. On run-1, the XAIAMD-CS model has recognized 49.60% of samples as malware and 49% of samples as benign. In addition, on run-3, the XAIAMD-CS method has recognized 49.40% of samples as malware and 48.60% of samples as benign. Also, on run-4, the XAIAMD-CS approach has recognized 48.95% of samples as malware and 48.85% of samples as benign. Meanwhile, on run-5, the XAIAMD-CS methodology has recognized 47.95% of samples as malware and 46.45% of samples as benign. The overall Android malware detection performance of the XAIAMD-CS method is examined in Table 2. Figure 4 reports the results of the XAIAMD-CS model under run-1. The XAIAMD-CS model has shown enhanced outcomes under all classes. On malware class, the XAIAMD-CS model has obtained an accubal of 98.60%, precn of 98.61%, sens y of 98.60%, spec y of 98.60%, Fscore of 98.60%, and AUCscore of 98.60%. Besides, on benign class, the XAIAMD-CS model has attained an accubal of 98.00%, precn of 99.19%, sens y of 98.00%, spec y of 99.20%, Fscore of 98.59%, and AUCscore of 98.60%. Moreover, the XAIAMD-CS model has reached an accubal of 98.60%, precn of 98.61%, sens y of 98.60%, spec y of 98.60%, Fscore of 98.60%, and AUCscore of 98.60%. Figure 5 reports the results of the XAIAMD-CS model under run-2. The XAIAMD-CS method has shown enhanced outcomes under all classes. On malware class, the XAIAMD-CS approach has gained accubal of 98.30%, precn of 98.10%, sens y of 98.30%, spec y of 98.10%, Fscore of 98.20%, and AUCscore of 98.20%. Likewise, on benign class, the XAIAMD-CS model has reached an accubal of 98.10%, precn of 98.30%, sens y of 98.10%, spec y of 98.30%, Fscore of 98.20%, and AUCscore of 98.20%. Additionally, the XAIAMD-CS methodology has reached an accubal of 98.20%, precn of 98.20%, sens y of 98.20%, spec y of 98.20%, Fscore of 98.20%, and AUCscore of 98.20%. Figure 6 exhibits the results of the XAIAMD-CS model under run-3. The XAIAMD-CS method has shown enhanced outcomes under all classes. On malware class, the XAIAMD-CS approach has gained accubal of 98.80%, precn of 97.24%, sens y of 98.80%, spec y of 97.20%, Fscore of 98.02%, and AUCscore of 98%. Also, in benign class, the XAIAMD-CS technique has achieved an accubal of 97.20%, precn Table 1 Details of the dataset
Class
No. of samples
Malware
1000
Benign
1000
Total number of samples
2000
Explainable Artificial Intelligence-Enabled Android Malware …
647
Fig. 3 Confusion matrices of XAIAMD-CS approach a run-1, b run-2, c run-3, d run-4, and e run-5
648
L. Almutairi
Table 2 Android malware detection outcome of XAIAMD-CS approach under distinct runs Class
Accuracybal
Precision
Sensitivity
Specificity
F-score
AUC score
Run-1 Malware
99.20
98.02
99.20
98.00
98.61
98.60
Benign
98.00
99.19
98.00
99.20
98.59
98.60
Average
98.60
98.61
98.60
98.60
98.60
98.60
Run-2 Malware
98.30
98.10
98.30
98.10
98.20
98.20
Benign
98.10
98.30
98.10
98.30
98.20
98.20
Average
98.20
98.20
98.20
98.20
98.20
98.20
98.80
97.24
98.80
97.20
98.02
98.00
Run-3 Malware Benign
97.20
98.78
97.20
98.80
97.98
98.00
Average
98.00
98.01
98.00
98.00
98.00
98.00
Run-4 Malware
97.90
97.70
97.90
97.70
97.80
97.80
Benign
97.70
97.90
97.70
97.90
97.80
97.80
Average
97.80
97.80
97.80
97.80
97.80
97.80
Run-5 Malware
95.90
93.11
95.90
92.90
94.48
94.40
Benign
92.90
95.77
92.90
95.90
94.31
94.40
Average
94.40
94.44
94.40
94.40
94.40
94.40
Fig. 4 Average outcomes of the XAIAMD-CS approach under run-1
Explainable Artificial Intelligence-Enabled Android Malware …
649
Fig. 5 Average outcomes of the XAIAMD-CS approach under run-2
of 98.78%, sens y of 97.20%, spec y of 98.80%, Fscore of 97.98%, and AUCscore of 98%. Additionally, the XAIAMD-CS method has reached an accubal of 98%, precn of 98.01%, sens y of 98%, spec y of 98%, Fscore of 98%, and AUCscore of 98%. Figure 7 displays the results of the XAIAMD-CS model under run-4. The XAIAMD-CS model has shown enhanced outcomes under all classes. On malware class, the XAIAMD-CS method has gained accubal of 97.90%, precn of 97.70%, sens y
Fig. 6 Average outcomes of the XAIAMD-CS approach under run-3
650
L. Almutairi
Fig. 7 Average outcomes of the XAIAMD-CS approach under run-4
of 97.90%, spec y of 97.70%, Fscore of 97.80%, and AUCscore of 97.80%. In addition, on benign class, the XAIAMD-CS method has attained an accubal of 97.70%, precn of 97.90%, sens y of 97.70%, spec y of 97.90%, Fscore of 97.80%, and AUCscore of 97.80%. Likewise, the XAIAMD-CS technique has reached an accubal of 97.80%, precn of 97.80%, sens y of 97.80%, spec y of 97.80%, Fscore of 97.80%, and AUCscore of 97.80%. Figure 8 displays the results of the XAIAMD-CS approach under run-5. The XAIAMD-CS method has shown enhanced outcomes under all classes. On malware class, the XAIAMD-CS algorithm has gained accubal of 95.90%, precn of 93.11%, sens y of 95.90%, spec y of 92.90%, Fscore of 94.48%, and AUCscore of 94.40%. Also, on benign class, the XAIAMD-CS method has achieved an accubal of 92.90%, precn of 95.77%, sens y of 92.90%, spec y of 95.90%, Fscore of 94.31%, and AUCscore of 94.40%. Furthermore, the XAIAMD-CS technique has reached an accubal of 94.40%, precn of 94.44%, sens y of 94.40%, spec y of 94.40%, Fscore of 94.40%, and AUCscore of 94.40%. For inspecting the enhanced performance of the XAIAMD-CS model, a widespread comparative study is made in Table 3 [25, 26]. Figure 9 examines the accu y analysis of the XAIAMD-CS method with recent models. The results implied that the LSTM, AE, XGBoost, and GB models have shown the least outcome with closer accu y values of 93.61%, 93.88%, 93.40%, and 93.85%, respectively. Moreover, the CNN-LSTM and Bi-LSTM models have reported moderately improved accu y values of 94.86% and 94.42%, respectively. But the XAIAMD-CS model has shown an effectual outcome with a maximum accu y of 98.60%. Figure 10 examines the precn analysis of the XAIAMD-CS model with recent methods. The results implied that the LSTM, AE, XGBoost, and GB approaches
Explainable Artificial Intelligence-Enabled Android Malware …
651
Fig. 8 Average outcomes of the XAIAMD-CS approach under run-5
Table 3 Comparative analysis of XAIAMD-CS approach with other recent algorithms Methods
Accuracy
Precision
Sensitivity
F-score
XAIAMD-CS
98.60
98.61
98.60
98.60
LSTM
93.61
94.54
93.88
93.63
CNN-LSTM
94.86
93.16
94.22
94.24
AE
93.88
94.91
94.46
93.34
Bi-LSTM
94.42
94.84
93.72
94.54
XG boost
93.40
94.74
93.03
94.70
Gradient boosting
93.85
95.00
94.91
94.13
have shown the least outcome with closer precn values of 94.54%, 94.91%, 93.03%, and 95% correspondingly. Moreover, the CNN-LSTM and Bi-LSTM methods have reported moderately improved precn values of 93.16% and 94.84% correspondingly. But, the XAIAMD-CS method has shown an effectual outcome with maximum precn of 98.61%. Figure 11 examines the sens y analysis of the XAIAMD-CS model with recent models. The results implied that the LSTM, AE, XGBoost, and GB techniques have shown the least outcome with closer sens y values of 93.88%, 94.46%, 93.03%, and 94.91% correspondingly. Still, the CNN-LSTM and Bi-LSTM methods have reported moderately improved sens y values of 94.22% and 93.72% correspondingly. But, the XAIAMD-CS techniques have shown effectual outcomes with a maximum sens y of 98.60%. Figure 12 inspects the Fscore analysis of the XAIAMD-CS technique with recent methods. The results implied that the LSTM, AE, XGBoost, and GB algorithms
652
L. Almutairi
Fig. 9 Accu y analysis of XAIAMD-CS approach with other recent algorithms
Fig. 10 analysis of XAIAMD-CS approach with other recent algorithms
have shown the least outcome with closer Fscore values of 93.63%, 93.34%, 94.70%, and 94.13% correspondingly. Also, the CNN-LSTM and Bi-LSTM methods have reported moderately improved Fscore values of 94.24% and 94.54% correspondingly. But, the XAIAMD-CS method has shown an effectual outcome with a maximum Fscore of 98.60%. These results show the better performance of the XAIAMD-CS technique than other DL models.
Explainable Artificial Intelligence-Enabled Android Malware …
653
Fig. 11 Sens y analysis of XAIAMD-CS approach with other recent algorithms
Fig. 12 Fscore analysis of XAIAMD-CS approach with other recent algorithms
5 Conclusion In this study, automated Android malware detection using the XAIAMD-CS technique has been developed for cybersecurity. The presented XAIAMD-CS technique accomplishes cybersecurity via the accurate identification of Android malware attacks. Initially, the XAIAMD-CS technique designs a new GTOA-FS technique.
654
L. Almutairi
Besides, the GBT model is applied for the accurate identification and classification of Android malware. To enrich the classifier results of the GBT method, the SSO algorithm was implemented in this work. The performance validation of the XAIAMDCS technique is tested using the Android malware dataset, and the outcomes are inspected in terms of distinct measures. The results demonstrate the improvements of the XAIAMD-CS method. In future, the performance of the XAIAMD-CS technique can be improvised by the outlier removal process. Funding None. DataAvailabilityStatement Data Availability Statement Data sharing not applicable to this article as no datasets were generated during the current study.
Declarations Conflict of Interest The authors declare that they have no conflict of interest. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. Ethics Approval This article does not contain any studies with human participants performed by any of the authors.
References 1. Liu K, Xu S, Xu G, Zhang M, Sun D, Liu H (2020) A review of android malware detection approaches based on machine learning. IEEE Access 8:124579–124607 2. Zhao S, Li S, Qi L, Xu LD (2020) Computational intelligence enabled cybersecurity for the internet of things. IEEE Trans Emerg Topics Computa Intell 4(5):666–674 3. Dovom EM, Azmoodeh A, Dehghantanha A, Newton DE, Parizi RM et al (2019) Fuzzy pattern tree for edge malware detection and categorization in IoT. J Syst Architect 97:1–7 4. Sapalo Sicato JC, Sharma PK, Loia V, Park JH (2019) VPNFilter malware analysis on cyber threat in smart home network. Appl Sci 9(13):2763 5. Shah Y, Sengupta S (2020) A survey on classification of cyber-attacks on IoT and IIoT devices. In: 11th IEEE annual ubiquitous computing, electronics & mobile communication conference (UEMCON), New York, USA, pp 0406–0413. https://doi.org/10.1109/UEMCON51285.2020. 9298138 6. Ficco M (2019) Detecting IoT malware by Markov Chain behavioral models. In: IEEE international conference on cloud engineering (IC2E), Prague, Czech Republic, pp 229–234. https:// doi.org/10.1109/IC2E.2019.00037 7. Chikapa M, Namanya AP (2018) Towards a fast off-line static malware analysis framework. In: 6th International conference on future internet of things and cloud workshops (FiCloudW), Barcelona, pp 182–187. https://doi.org/10.1109/W-FiCloud.2018.00035 8. Inayat U, Zia MF, Mahmood S, Khalid HM, Benbouzid M (2022) Learning-based methods for cyber attacks detection in iot systems: a survey on methods, analysis, and future prospects. Electronics 11(9):1502 9. Collins Uchenna C, Jamil N, Ismail R, Kwok Yan L, Afendee Mohamed M (2021) Malware threat analysis techniques and approaches for IoT applications: a review. Bulletin EEI 10(3):1558–1571
Explainable Artificial Intelligence-Enabled Android Malware …
655
10. Ahirao P (2021) Proactive technique for securing smart cities against malware attacks using static and dynamic analysis. Int Res J Innov Eng Technol 5(2):10 11. Vasan D, Alazab M, Wassan S, Naeem H, Safaei B et al (2020) IMCFN: image-based malware classification using fine-tuned convolutional neural network architecture. Comput Netw 171:107138 12. Ullah F, Naeem H, Jabbar S, Khalid S, Latif MA et al (2019) Cyber security threats detection in internet of things using deep learning approach. IEEE Access 7:124379–124389 13. Sudhakar SK (2021) MCFT-CNN: malware classification with fine-tune convolution neural networks using traditional and transfer learning in internet of things. Future Gen Comput Syst 125:334 14. Jeon J, Park JH, Jeong Y-S (2020) Dynamic analysis for IoT malware detection with convolution neural network model. IEEE Access 8:96899–96911 15. Naeem H, Ullah F, Naeem MR, Khalid S, Vasan D et al (2020) Malware detection in industrial internet of things based on hybrid image visualization and deep learning model. Ad Hoc Netw 105:102154 16. Yadav P, Menon N, Ravi V, Vishvanathan S, Pham TD (2022) EfficientNet convolutional neural networks-based Android malware detection. Comput Secur 115:102622 17. Albahar MA, ElSayed MS, Jurcut A (2022) A modified ResNeXt for android malware identification and classification. Comput Intell Neurosci. https://doi.org/10.1155/2022/863 4784 18. Rasool A, Javed AR, Jalil Z (2021) SHA-AMD: sample-efficient hyper-tuned approach for detection and identification of Android malware family and category. Int J Ad Hoc Ubiquitous Comput 38:172–183 19. Liu T, Zhang H, Long H, Shi J, Yao Y (2022) Convolution neural network with batch normalization and inception-residual modules for Android malware classification. Sci Rep 12(1):1–17 20. Mahindru A, Sangal AL (2020) SOMDROID: Android malware detection by artificial neural network trained using unsupervised learning. Evol Intel 15:407–437 21. Almomani I, Alkhayer A, El-Shafai W (2022) An automated vision-based deep learning model for efficient detection of android malware attacks. IEEE Access 10:2700–2720 22. Zafar MH, Al-shahrani T, Khan NM, Feroz Mirza A, Mansoor M et al (2020) Group teaching optimization algorithm based MPPT control of PV systems under partial shading and complex partial shading. Electronics 9(11):1962 23. Sun R, Wang G, Zhang W, Hsu LT, Ochieng WY (2020) A gradient boosting decision tree based GPS signal reception classification algorithm. Appl Soft Comput 86:105942 24. Siva Kumar M, Rajamani D, El-Sherbeeny AM, Balasubramanian E, Karthik K et al (2022) Intelligent modeling and multi-response optimization of AWJC on fiber intermetallic laminates through a hybrid ANFIS-salp swarm algorithm. Materials 15(20): 7216 25. Shatnawi AS, Jaradat A, Yaseen TB, Taqieddin E, Al-Ayyoub M et al (2022) An android malware detection leveraging machine learning. Wirel Commun Mob Comput. https://doi.org/ 10.1155/2022/1830201 26. Alkahtani H, Aldhyani TH (2022) Artificial intelligence algorithms for malware detection in android-operated mobile devices. Sensors 22(6):2268
Observing Different Machine Learning Approaches for Students’ Performance Using Demographic Features Neeraj Kumar Srivastava, Prafull Pandey, Manoj Kumar Mishra, and Vikas Mishra
Abstract The use and importance of educational data mining (EDM) are growing very fast as the entire world is going digital. The predictions of EDM [1, 2] are playing very important role not only in improving the academic performance of individual students but also are very useful for institutions, policy makers, administrators to plan and design new policies that will certainly help them to achieve high academic performance of their students. On this basis, this paper approaches students’ performance in secondary school of two Portuguese schools. The informational elements comprise demographic, societal, and educational [3] features. Although there are many ML [4, 5] algorithms, yet in this paper, five ML classifiers such as random forest, Naïve Bayes, logistic regression, decision tree, and support vector machine have been used to predict the students’ performance that on the basis of given features, whether a student can be passed in secondary education or not. On the dataset, the Naïve Bayes classifier gives the best performance than other classifiers, when the dataset is partitioned into 80–20 ratio as training and test datasets. Keywords EDM · Naïve Bayes · SVM · Logistic regression · Decision tree · Random forest · EDA
N. K. Srivastava (B) · P. Pandey Department of Computer Science and Engineering, United Institute of Technology, Naini, Prayagraj, India e-mail: [email protected] M. K. Mishra Department of Computer Science and Engineering, United College of Engineering and Research, Naini, Prayagraj, India V. Mishra Department of Computer Applications, United Institute of Management, Naini, Prayagraj, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_52
657
658
N. K. Srivastava et al.
1 Introduction Improvement in students’ performance is foremost concern of most of the educationists and the Educational Institutions. Therefore, a large number of researches have been conceded to predict pupils’ performance incorporating various ML algorithms along with other technologies. Such researches help significantly to design and develop various educational models to ensure a bright future of students. In most of the cases, the educationists focus only upon the academic performances of students for their assessment. But with due course of time, it has been realized that not only academic background but also demographic features and family background of students including family size, their locality, relation with parents, quantity of alcohol consumption, relationship status, etc., affect their performance in a very significant manner. Henceforth, in this paper, all the experiments are to be implemented on a dataset that includes all aforesaid essential parameters in order to carry out more comprehensive findings related to predicting students’ performance [6]. The proposed model is going to use mainly five ML classifiers, i.e., logistic regression, Naïve Bayes, decision tree, random forest, and support vector machine (SVM) algorithms on categorical dataset to predict performances of students in terms of success rate by examining and analyzing the academic as well as family and social backgrounds recorded in the dataset. The visualization of the process and comparison/analysis of the findings using different machine learning algorithms will be discussed in Sect. 4 of this paper.
2 Research Methodology The EDM [7–9] is a course of action of recording, analyzing, and scrutinizing the records to ascertain prototypes, irregularity and to produce a hypothesis focused upon our perceptive of the provided information. A widespread study and assessment of different ML algorithms are to be done for predicting the future performance of students. Since a well define labeled dataset is required for interpretation of data samples to be analyzed, instead of using unsupervised learning techniques, the supervised ones are more suitable for future performance prediction of students. The proposed model includes following steps: 1. 2. 3. 4. 5.
Description of data Data cleaning and preprocessing Model selection Experiments Result
1. Description of Data: In this paper, the suggested mock up is going to use a students’ dataset that contains data of students undergoing higher secondary
Observing Different Machine Learning Approaches for Students’ …
659
education. The dataset includes demographical attributes like gender, age, family size, details of parents’ education, relation status, study time, free time, pattern of alcohol consumption, health conditions of students. Using this dataset, prediction regarding future performance of students can be carried out in terms of what will be the chances of future success of a student depending upon his demographical information. There is a single target variable “passed” in the dataset that is actually brings into play to forecast the upcoming recital of the undergraduates. It contains binary values either TRUE or FALSE depending upon the values of other attributes. The structure of the table can be accessed (https://drive.google. com/file/d/1DncuS6RrxkRVHZu9IFPJ_R9Z8YThadBo/view?usp=share_link). 2. Data Preprocessing: The following preprocessing tasks will be applied on dataset in order to achieve reliable-findings: a. Replacing missing values b. Dropping columns c. Splitting dataset into training and test data 3. Model Selection: This paper presents a mock up for forecasting recital of undergraduates by using and analyzing dissimilar ML algorithms to fabricate the outcome with maximum correctness. The machine leaning algorithms used in the proposed model are listed below. A. Random Forest The random forest [10] classifier is a supervised learning algorithm which is widely used in classification and regression problems. This algorithm manufactures several verdict trees on the basis of diverse taster and retrieves the preponderance of votes to classify and calculate average for regression. The best attribute of this classifier can be said that it could cope up with a collection of records inhibiting continuous variables for the case of regression and categorical variables as the case of classification. This algorithm gives better performance for solving classification problems (Fig. 1).
Fig. 1 Random forest classifier
660
N. K. Srivastava et al.
B. Logistic Regression (LR) The LR classifier [11] is based upon a function, named logistic function (acknowledged as sigmoid function also). It was designed by a statistician to explain the attributes of populace growth in ecology. This represents an S figured arch, which could receive real value as input and maps that value between 0 and 1, but never on the limits. The logistic function is represented as 1/ 1 + evalue Here: e = base of natural logarithms (Euler’s number of the EXP function); value = actual numerical value that to be transformed. Following is given a plot, where values between − 5 and 5 are transformed into the range of 0 and 1 using the logistic function (Fig. 2): C. Naïve Bayes The Naïve Bayes classifier [12, 13] is based upon Bayes theorem and widely used for statistical classification. This is one among the simplest learning classifiers, which is considered as an accurate, fast, and reliable algorithm that produces highly accurate and faster results on larger datasets. This algorithms works on the principle that assumes that the effect of a feature is independent of other features in a class. Such as: A loan applicant is eligible or not depending upon his/her income, older loan repayment history, age and location, etc. Although these features are interdependent, they are still considered as independently. Due to this assumption, the computations get easier. Therefore, it is considered as Naïve. This is known as class conditional independence. P(h|D) =
Fig. 2 Logistic regression classifier
P(D|h)P(h) P(D)
Observing Different Machine Learning Approaches for Students’ …
661
Fig. 3 Support vector machine (SVM) classifier
P(h): probability of h (when probability of hypothesis h being true); P(D): prior probability of data D; P(h|D): posterior probability; P(D|h): posterior probability of data D given that the hypothesis h was True. D. Support Vector Machine (SVM) The SVM [14, 15] is a classification approach that could be applied in categorization as well as regression problems both. This algorithm can easily manage many categorical and continuous variables together. To distinguish different classes, this algorithm creates a hyperplane in multi-dimensional space and produces optimal hyperplane in a looping fashion that minimizes errors. To divide the dataset into classes into the best possible manner, the SVM observes a maximum marginal hyperplane (Fig. 3). E. Decision Tree A decision tree [16] is a type of supervised ML classifier that is decision-making conditional construct that includes branches like a tree, in which, the attributes are represented by internal nodes, whereas a decision rule is represented by branch. The leaf nodes represent the result (output). A decision tree is a non-parametric method that does not depend upon probability distribution assumptions. The decision tree classifier can easily manage larger dataset with high accuracy. This tree starts from a special designated node that is known as root node. It is able to visualize human thinking in an easier manner which is simple to understand and interpret also. Just like white box algorithms, the decision tree classifier exposes internal decision-making logic. The training time of this classifier is faster compared to the neural network algorithm (Fig. 4). The result and other discussions will take place in the next section of this paper.
662
N. K. Srivastava et al.
Fig. 4 Decision tree classifier
3 Result and Discussion In this model, different classification algorithms like random forest, Naïve Bayes, logistic regression, decision tree, and SVM have been used. The performance of this model is evaluated by using different metrics like accuracy score and confusion matrix to find out the best among these classifiers. Within our experiments, it has been observed that Naïve Bayes classifier gives the best performance with an accuracy of 74.68% than other ML classifiers. The details of different parameters of ML classifiers used in this model including accuracy in percentage are listed in the following classification report that has been populated on 79 records (test data) out of 315 records of the dataset, shown in Table 1. The comparison of accuracy of different ML classifiers used in this model is plotted in the following Fig. 5.
Observing Different Machine Learning Approaches for Students’ …
663
Table 1 Classification report of different ML classifiers Classifier
Logistic regression Decision tree
Not passed (0) Passed (1)
Confusion matrix
0
[[14 16] [6 43]]
0.69 0.70
[[10 20] [16 33]]
0.4
0.33
30
0.63
0.69
49
0.73
0.37
30
1 0 1
Random forest
0
Support vector machine
0
1
Naïve Bayes
1 0 1
Fig. 5 Comparison of accuracy score of ML classifiers
[[13 17] [4 45]]
Precision
Recall
Support
Accuracy (in %)
0.37
30
72.15
0.90
49
0.70
0.92
49
[[0 30] [0 49]]
0.75
0.2
30
0.66
0.96
49
[[16 14] [6 43]]
0.73
0.53
30
0.75
0.88
49
56.96 73.41 62.02 74.68
664
N. K. Srivastava et al.
4 Conclusion It has been observed that improvement in educational system is one of the most challenging tasks now a day. The use of technology like machine learning (ML) and educational data mining (EDM) has made it quite simpler indeed in an innovative manner. These methods help us to identify the needs as well as the areas, which are helpful to improve the students’ performance. In this paper, the idea to present a model that predicts students’ performance based upon different demographical features is proposed. The main challenge in this model was to select the best classification algorithms not only to identify the most impactful factors but to suggest the students with a comprehensive summary of positive feature that will help them to achieve higher academic status and to avoid failures that may come in their way. This model also outlines different features among the demographical features that leaves negative impact on students’ performance, hence, should be kept in mind.
References 1. Wook M, Yusof ZM (2016) Educational data mining acceptance among undergraduate students. Educ Inf Technol. https://doi.org/10.1007/s10639-016-9485-x 2. Sorour SE, Mine T, Goda K, Hirokawa S (2014) Predicting students’ grades based on free style comments data by artificial neural network. IEEE 3. Naren J, Elakia GA (2014) Application of data mining in educational database for predicting behavioural patterns of the students. Int J Comp Sci Inf Technol (IJCSIT) 5(3):4649–4652 4. Tarika A, Aissab H, Yousef F (2021) Artificial intelligence and machine learning to predict student performance during the COVID-19”. In: 3rd international workshop on big data and business intelligence, Warsaw, Poland 5. Uskov VL, Bakken JP, Shah A, Byerly A (2019) Machine learning based predictive analytics of student academic performance in STEM education. In: IEEE global engineering education conference 6. Acharya S, Madhu N (2015) Discovery of students’ academic patterns using data mining techniques. Int J Comp Sci Eng (IJCSE) 4:1054–1062 7. Aldowaha H, Al-Samarraiea H, Fauzyb WM (2019) Educational data mining and learning analytics for 21st century higher education: a review and synthesis. In: Telematics and informatics. Elsevier 8. Ashraf M, Zaman M, Ahmed M (2019) An intelligent prediction system for educational data mining based on ensemble and filtering approaches. Int Conf Comput Intell Data Sci 167:1471 9. Kumar M, Shambhu S, Aggarwal P (2016) Recognition of slow learners using classification data mining techniques. Imp J Interdisciplinary Research (IJIR) 2(12):741–747 10. Ghorbani R, Ghousi R (2020) Comparing different resampling methods in predicting students’ performance using machine learning techniques. IEEE Access 11. Widyahastuti F, Tjhin VU (2017) Predicting students performance in final examination using linear regression and multilayer perceptron. In: Inter-conference on intelligent computing, instrumentation and control technologies (ICICICT). IEEE 12. Tripathi A, Yadav S, Rajan R (2019) Naïve Bayes classification model for the student performance prediction. In: 2nd international conference on intelligent computing, instrumentation and control technologies (ICICICT)
Observing Different Machine Learning Approaches for Students’ …
665
13. Tomasevic N, Gvozdenovic N, Vranes S (2020) An overview and comparison of supervised data mining techniques for student exam performance prediction. In: Computer and education. Elsevier 14. Burman I, Som S (2019) Predicting students academic performance using support vector machine. IEEE 15. Chui KT, Liu RW, Zhao M, De Pablos PO (2020) Predicting students’ performance with school and family tutoring using generative adversarial network-based deep support vector machine. IEEE Access 16. Lynn ND, Emanuel AWR (2020) Using data mining techniques to predict students’ performance: a review. ICIMECE
A Workflow Allocation Strategy Using Elitist Teaching–Learning-Based Optimization Algorithm in Cloud Computing Mohammad Imran, Faraz Hasan, Faisal Ahmad, and Mohammad Shahid
Abstract Cloud computing offers a pay-per-use basis to address complex scientific and commercial workflow processing. This paper proposes an elitist teaching– learning-based optimization (E-TLBO) by replacing the worst candidates with the elitist candidate of the previous population to minimize the makespan of workflow tasks submitted from cloud users. This metaheuristic method mimics the classroom teaching and learning behaviors. The workflow allocator is implemented in MATLAB for the performance analysis. The experimental findings demonstrate the better performance than its peer in terms of fitness value (makespan) and convergence rate under study. Keywords Cloud computing · Metaheuristic algorithm · Workflow scheduling · Teaching–learning-based optimization
M. Imran Department of Computer Science, Aligarh Muslim University, Aligarh, India e-mail: [email protected] F. Hasan Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Guntur, India e-mail: [email protected]; [email protected] F. Ahmad Workday Inc., Pleasanton, USA e-mail: [email protected] M. Shahid (B) Department of Commerce, Aligarh Muslim University, Aligarh, India e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_53
667
668
M. Imran et al.
1 Introduction Workflows naturally describe real scientific applications such as disaster modeling, bioinformatics, astronomy, earth science, and many in physics and presented a Directed Acyclic Graph (DAG). One advantage of workflow depiction is that the various models of workflow may be reused, replicated, and even traced across other workflows. Scientific workflows are divided into several jobs that need complex data of various quantities with huge amount of processing hours. The workflow processing is carried out in two stages in the cloud. Resources for running workflow tasks are discovered and obtained from the cloud in the first stage. The second stage prepares a schedule for each task in order to meet Quality of Service (QoS). Workflow scheduling in cloud systems is widely recognized as an NP-hard problem. Therefore, heuristic algorithms are better suited to solving the nature of an NP-hard issue than deterministic methods [1–6]. In this paper, we have incorporated elitism in teaching–learning-based optimization (TLBO) [7] by replacing the worst candidates by the elitist candidate of previous solution which is proposed to minimize the makespan of workflow tasks submitted from cloud users. E-TLBO algorithm utilizes the classroom teaching behaviors for exploration and exploitation with elitism to find out best solution. It utilizes and models various phases of teaching and learning behaviors. The workflow simulator is implemented in MATLAB for comparative performance analysis with standard TLBO. The study is designed in the rest of the paper as Sect. 2 deals with the associated review of literature. Section 3 denotes the issues chosen or the formulation of metaheuristic algorithm. Section 4 is solely concerned with providing a solution to the specified problem. Section 5 of this study contains a comparative analysis and findings for the suggested model. Finally, the paper’s conclusion has been discussed in Sect. 6, which is the final section of this paper.
2 Related Work Many studies have been conducted on the workflow scheduling problem. Workflow scheduling approaches are divided into two subcategories based on workflow multiplicity and scheduling methods such as heuristic and metaheuristic. The authors of [8] presented the Scaling–Consolidation–Scheduling (SCS) method for heuristicbased workflow scheduling to minimize execution costs based on the cloud’s pricing mechanism. The IaaS Cloud Partial Critical Path (IC-PCP) method, developed by Abrishami et al. [9], is a non-robust model considering the deadline constraints. Assigning each critical path task to the cheapest satisfying the deadline. Algorithm for Just-In-Time (JIT-C) workflow scheduling Sahni et al. [10] presents a dynamic and cost-effective model for cloud systems. In [11], a one of the competitive list-based heuristics, HEFT is presented for scheduling DAGs onto heterogeneous machines.
A Workflow Allocation Strategy Using Elitist …
669
A security-aware version of HEFT for single workflow has been designed in [12], to optimize guarantee ratio. A stochastic scheduler is proposed for multiple workflows to optimize the energy consumption and the turnaround time in [13]. A scheduling approach, namely, levelized heavily communicating node first (LHCNF) has been proposed in [14], to minimize the schedule length of application. A security-aware workflow allocation model is evaluated in [15], to reduce average failure probability by minimizing the total number of failure tasks using level attribute for maintaining the precedence constraints. Based on level attribute, another multi-objective model [16] is presented for a workflow allocation to minimize makespan and flowtime for infrastructure as a service (IaaS) clouds. Another work is also reported in the same line named, levelized multiple workflow allocation strategy with task merging in [17], to minimize turnaround time and its security-prioritized variant for multiple workflow allocation (SPMWA) is presented to minimize the failure probability [18]. A genetic based algorithm with security and deadline constraints has been discussed in [19]. A cost-effective scheduling approach for multi-workflow with time constraints has been evaluated in [20], to reduce the execution cost under given deadlines. A multi-objective optimization approach has been presented in [21], to minimize the makespan and user’s budget cost. A load-balanced workflow allocation has been discussed in [22] using stochastic fractal search algorithms to optimize the resource utilization. Another model uses the differential evolution and Moth– Flame optimization to reduce the energy consumption by minimizing makespan and communication between dependent tasks [23]. A novel approach has been presented in [24] to reduce the makespan for workflow scheduling problems using modified firefly algorithm.
3 The Problem Statement Let us consider a virtual machine set, V = {V k : 1 ≤ k < K}, a workflow tasks set, W f = {t i : 1 ≤ i ≤ N}, and an edge set, E = {exy : 1 ≤ x ≤ N, 1 ≤ y ≤ N} between the tasks T x and T y . The expected time to compute is the estimated execution time of task T i on V k (E ik ). Each job in this process has a precedence level and may require some inter-task communication with parent tasks. This cost is eliminated if parent tasks are assigned to the same VM. As shown in Fig. 1, the communication time (CTl−s,l x ykr ) between T x and T y , allocated to virtual machines V k and V r , of level l and l − s, respectively, can be calculated as: CTl−s,l x ykr = w ∗ ex y ∗ Dkr .
(1)
The workflow allocation in the cloud system can be described as the mapping (m) between W f and V such that m : Wf → V
(2)
670
M. Imran et al.
Fig. 1 Communication time computation
to minimize the makespan (MS) subject to precedence constraints and duplication of the tasks onto others VMs are not allowed. The execution time (ETlk ) on V k at level l is the sum of all task execution times allocated to the V k and may be calculated as
ETlk =
(E ik ).
(3)
Vk ←∀ti ∈ρ l
The communication time (CTlk ) between a task and its parent task is needed on V k and can be expressed as follows: CTlk =
max
Vk ←∀ti ∈ρ l
CTl−s,l x ykr .
(4)
The sum of CTlk and , ETlk V k at level l is the total execution time and can be computed as: TETlk = CTlk + ETlk .
(5)
Makespan (MS) of workflow on to VMs set in cloud computing environment can be estimated as: l=L l MS = max TETk . (6) ∀Vk
l=1
Therefore, to process workflow from cloud users, it is always preferred that the resulting mapping provides an effective allocation plan.
A Workflow Allocation Strategy Using Elitist …
671
4 The Proposed E-TLBO Algorithm In this section, teaching–learning-based optimization approach with elitism (TLBOE) has been proposed. In TLBOE, one best solution is designated as teacher, while others are designated as students. Students learn from their interactions and from teacher. It increases the rate of convergence and generates reasonable diversified solutions. At first, the initial population with P solutions is generated. This algorithm operates in two stages, one is teacher phase and other is student phase. In each iteration, the population is stored before the both phases to persist best students in next iteration. During the teacher phase, a student utilizes Eq. (7) to learn from a randomly selected teacher from the teacher–class as X i = X i + r A ∗ X iteacher − T f ∗ X imean .
(7)
T f denotes the teaching factor, which is utilized to improve convergence rate as well as produce diverse solutions, and T f will be between 0 and 1. X imean is the mean of the considered subject of the ith student from the class. In the learner phase, learners communicate randomly with other learners. (Rao et al., 2011). Learners are picked randomly for interaction. If two learners interact, knowledge transfer will occur from the one with better fitness. In this phase, output population of the teacher phase will work as the initial population of the learner phase and is given as follows: For i = 1: X P Select two learners, X 1 and X 2 , randomly where X 1 and X2 do not have equal marks If f (X 1 ) < f (X 2 ), then X 1 new = X 1 old + bi(X 1 − X 2 )
(8)
X 1 new = X 1 old + bi(X 2 − X 1 )
(9)
Else
End if End for After the completion of teacher and learner phase, some fixed number of students are selected from the previously stored population and transferred to the current population by replacing the worst candidates.
672
M. Imran et al.
E-TLBO Begin 1. Randomly initialized the P particles //with real numbers. 2. Apply transformation. 3. Compute the fitness function. 4. Set the teacher. 5. While maximum iteration reached do. 6. Store the population. 7. Compute the mean of all teacher’s class. 8. for i ← 1 to P-1 do. 9. Teacher phase updated by using Eq. (7). 10. for i ← 1 to P do. 11. Student phase updated by using Eqs. (8 and 9). 12. Apply transformation. 13. Compute fitness function for each new solution. 14. Replace worst candidates by elitist candidates. 15. End while. 16. Return optimal solution and best fitness. End
In TLBO algorithm, initial population is generated randomly, and then, the solutions generated are transformed into the feasible solution having integer numbers between 1 and K. Then, we do the evaluation of the population as per the fitness function given in Eq. (6). Apply the various E-TLBO processes to create new solutions. Again, repair the updated solution space and perform the fitness evaluation. Finally, some worst fitted individuals are replaced by best solutions from the previous population. This procedure is repeated until the stopping criteria met and optimal solution found.
5 Experimental Results In this section, with the help of an experimental study using a novel metaheuristic algorithm has been made to minimize the makespan. This method is tested on MATLAB using ThinkCentre Intel(R) Core i7, 64 GB of RAM. Here, the parameters’ settings are as follows: population size = 100, maximum iterations = 100. The comparative analysis of proposed algorithm for workflow execution has been conducted with its peer. The tuning parameters are determined by trial-and-error approach. The simulation environment is set up by a random workflow using various input parameters such as workflow size, task size, number of VMs, precedence level, amount communication requirement, VM distances, and computing speeds. For the performance analysis, the parameters are as follows: Workflow size (N) = 60–600, K = 6–60, el,l−s = 0 − 41, Dkr , = 0–41, Capk = xy 70–100, L = 10, Nl = 6–60.
A Workflow Allocation Strategy Using Elitist …
673
The fitness function is a minimization function to minimize the makespan. The simulation experiments are performed for four sets of tasks and VMs. The convergences curves are computed for considering the makespan as the fitness value for various iterations and the same has been shown in Figs. 2, 3, 4, and 5, respectively. The experimental results in Figs. 2, 3, 4, and 5 confirm that the TLBO with elitism (E-TLBO) strategy for workflow tasks’ execution exhibits the better performance than the TLBO on account of objective (makespan) and convergence behavior for all considered tasks and VM sets in experimental study.
Fig. 2 Convergence curve (N = 60, K = 6)
10
Fitness Value (M akeSpan)
1.15
4
TLBO E-TLBO
1.1 1.05 1 0.95 0.9 0
Fig. 3 Convergence curve (N = 60, K = 60)
20
40 60 Number of Iterations
80
100
Fitness Value (MakeSpan)
7400 TLBO E-TLBO
7300 7200 7100 7000 6900 0
20
40 60 Number of Iterations
80
100
674
M. Imran et al.
Fig. 4 Convergence curve (N = 600, K = 6)
10
Fitness Value (MakeSpan)
1.46
5
TLBO E-TLBO
1.44 1.42 1.4 1.38 1.36 1.34 0
Fig. 5 Convergence curve (N = 600, K = 60)
20
10
8.05
40 60 80 Number of Iterations
4
TLBO E-TLBO
8 Fitness Value (MakeSpan)
100
7.95 7.9 7.85 7.8 7.75 7.7 0
20
40 60 Number of Iterations
80
100
6 Conclusion This paper presents a teaching–learning-based metaheuristic algorithm incorporating the elitism at each generation that minimizes the makespan of the workflow submitted for execution. The MATLAB tool is used for simulation, and the results are compared to baseline method (TLBO) on randomly generated workflow. According to the
A Workflow Allocation Strategy Using Elitist …
675
observations, the proposed method performs better than its peer in terms of fitness value (makespan) and convergence rate. The proposed work has the potential to be expanded in a real-world cloud system.
References 1. Erl T, Puttini R, Mahmood Z, Cloud computing: concepts, technology and architecture, 1st ed. Pearson Education India 2. Kaur A, Kaur B, Singh D (2019) Meta-heuristic based framework for workflow load balancing in cloud environment. Int J Inf Technol 11:119–125 3. Kaur A, Kaur B (2022) Load balancing optimization based on hybrid Heuristic-Metaheuristic techniques in cloud environment. J King Saud Univer-Comput Inf Sci 34(3):813–824 4. Juve B, Deelman E (2010) “Scientific workflows and clouds” XRDS: crossroads. ACM Mag Stud 16(3):14–18 5. Juve G, Chervenak A, Deelman E, Bharathi S, Mehta G, Vahi K (2013) Characterizing and profiling scientific workflows. Futur Gener Comput Syst 29(3):682–692 6. Wu F, Wu Q, Tan Y (2015) Workflow scheduling in cloud: a survey. J Supercomput 71(9):3373– 3418 7. Rao RV, Rao RV (2016) Teaching-learning-based optimization algorithm. Springer International Publishing, pp 9–39 8. Mao M, Humphrey M (2011) Auto-scaling to minimize cost and meet application deadlines in cloud workflows. In: Proceedings of 2011 international conference for high performance computing, networking, storage and analysis on - SC 11. https://doi.org/10.1145/2063384.206 3449 9. Abrishami S, Naghibzadeh M, Epema DH (2013) Deadline constrained workflow scheduling algorithms for infrastructure as a service clouds. Future Gen Comput Syst 29(1):158–169. https://doi.org/10.1016/j.future.2012.05.004 10. Sahni J, Vidyarthi P (2018) A cost-effective deadline-constrained dynamic scheduling algorithm for scientific workflows in a cloud environment. IEEE Trans Cloud Comput 6(1):2–18. https://doi.org/10.1109/tcc.2015.2451649 11. Topcuoglu H, Hariri S, Wu MY (2002) Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans Parallel Distrib Syst 13(3):260–274 12. Alam M, Shahid M, Mustajab S (2021) SAHEFT: security aware heterogeneous earliest finish time workflow allocation strategy for IaaS cloud environment. In: 2021 IEEE Madras section conference (MASCON). IEEE, pp 1–8 13. Sajid M, Raza Z (2017) Energy-aware stochastic scheduler for batch of precedence-constrained jobs on heterogeneous computing system. Energy 125:258–274 14. Ilavarasan E, Thambidurai P (2005) Levelized scheduling of directed a-cyclic precedence constrained task graphs onto heterogeneous computing system. In: First international conference on distributed frameworks for multimedia applications. IEEE, pp 262–269 15. Shahid M, Alam M, Hasan F, Imran M (2020) Security-aware workflow allocation strategy for IaaS cloud environment. In: Proceedings of international conference on communication and computational technologies. Springer, Singapore, pp 241–252 16. Shahid M, Ashraf Z, Alam M, Ahmad F, Imran M (2021) A multi-objective workflow allocation strategy in IaaS cloud environment. In: International conference on computing, communication, and intelligent systems (ICCCIS). Greater Noida, India, pp 308–313 17. Ahmad F, Shahid M, Alam M, Ashraf Z, Sajid M, Kotecha K, Dhiman G (2022) Levelized multiple workflow allocation strategy under precedence constraints with task merging in IaaS cloud environment. IEEE Access 10:92809–92827 18. Alam M, Shahid M, Mustajab S (2023) Security prioritized multiple workflow allocation model under precedence constraints in cloud computing environment. Cluster Comput 1–36
676
M. Imran et al.
19. Shishido HY, Estrella JC, Toledo CFM, Arantes MS (2018) Genetic based algorithms applied to a workflow scheduling algorithm with security and deadline constraints in clouds. Comput Electr Eng 69:378–394 20. Ding R, Li X, Liu X, Xu J (2018) A cost-effective time-constrained multi-workflow scheduling strategy in fog computing. In: International conference on service-oriented computing. Springer, pp 194–207 21. Zuo L, Shu L, Dong S, Zhu C, Hara T (2015) A multi-objective optimization scheduling method based on the ant colony algorithm in cloud computing. IEEE Access 3:2687–2699 22. Hasan F, Imran M, Shahid M, Ahmad F, Sajid M (2022) Load balancing strategy for workflow tasks using stochastic fractal search (SFS) in cloud computing. Procedia Comput Sci 215:815– 823 23. Ahmed OH, Lu J, Xu Q, Ahmed AM, Rahmani AM, Hosseinzadeh M (2021) Using differential evolution and moth-flame optimization for scientific workflow scheduling in fog computing. Appl Soft Comput 112:107744 24. Bacanin N, Zivkovic M, Bezdan T, Venkatachalam K, Abouhawwash M (2022) Modified firefly algorithm for workflow scheduling in cloud-edge environment. Neural Comput Appl 34(11):9043–9068
Churn Prediction Algorithm Optimized and Ameliorated Vani Nijhawan, Mamta Madan, and Meenu Dave
Abstract Customer churn is a commonly faced problem by any of the service industries. The same issue exists in the domain of mobile telecommunication where all the service providers are giving a tough competition to each other by offering better rates and lucrative calling, message, and data pack plans. If the company can identify the churners, beforehand, then it may help their business to a great extent. So, to be able to predict this churn, behaviour is the need of the hour. The proposed model has applied decision tree algorithm for machine learning on the collected dataset and optimized it with a combination of genetic algorithm and hill climbing to predict customer churn. In the absence of literature on the same, a comparison has been done on the results before and after the application of hill climbing. After comparison, it has been shown that the results are better when decision tree model is optimized with genetic algorithm and hill climbing, in comparison with optimization done with GA alone and can give a better prediction. Keywords Machine learning · Decision tree · Genetic algorithm · Hill climbing · Churn prediction
1 Introduction Today, a country is called as rich, if it is rich in data. In the era of big data, enormous data is available in every sphere. The presence of bulk of data gives a good opportunity to use the lying data to predict the future trends. This is what is being done in data mining and machine learning wherein we make use of our very own data of the old customers to predict the performance or behaviour of current and upcoming customers. The proposed model predicts the churning trends of mobile telecom users V. Nijhawan (B) · M. Madan VIPS, Delhi, India e-mail: [email protected] M. Dave JaganNath University, Jaipur, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_54
677
678
V. Nijhawan et al.
by implementing decision tree in an optimized way with a combination of genetic algorithm and hill climbing. And as a result, it was observed this combination is producing enhanced results when compared with the optimization done using genetic algorithm without hill climbing [1, 2].
1.1 Churn Terminology There are a few terms which are used in the churn taxonomy and explained as follows
1.1.1
Why and Wherefore for Churn
Customer churn can happen due to multiple reasons, and the reasons can be categorized as well as prioritized, if the two following categories are understood clearly. (a) Involuntary Churn:—Initiated by company. Here, service providers decide to remove subscribers from their subscriber list, due to fraud, non-payment, or similar reasons. (b) Voluntary Churn:—Initiated from customer end. This can be either incidental churn that does not happen because of customer dissatisfaction, e.g. change in financial condition or location of the subscriber, etc., or it can be deliberate churn which is primary area of concern and are more frequent. E.g. switch because of better technology, price sensitivity, etc. [3]. 1.1.2
Churn Management and Its Techniques
Churn is inevitable in every sphere. But it can be managed and looked upon after it has started happening or proactively can be anticipated. Churn can also be predicted by analyzing old data of customers who turned out to be churners. So, on the same line, this paper suggests an approach which applies machine learning for churn prediction. The proposed model in this paper applies decision tree optimized with a combination of local and global search approach for predicting the churn behaviour of the telecom customers.
1.2 Decision Tree Decision tree, a popular algorithm for prediction in classification problems, has been used in multiple problems in different domains. It resembles the structure of a tree, having its node representing a test on an attribute value and branches representing decisions and their corresponding outcomes [4–7].
Churn Prediction Algorithm Optimized and Ameliorated
679
It has been applied by making use of its various algorithms, namely • ID3:—Iterative Dichotomiser uses the concept of information gain or entropy to choose the best feature. • C4.5:—An improvement over ID3 algorithm, it makes use of depth first search approach and uses information gain as criteria for tree building. • CART:—Chi-square automatic interaction detector uses the value of chi-squares for all classes in a node. • CHAID:—It calculates Gini value and is a simple and popular method, which doesn’t involve complex calculations [4, 8].
1.3 Genetic Algorithm One of the evolutionary computation algorithms and genetic algorithm is a biology inspired algorithm which is used very effectively as an optimization algorithm. It follows several steps and iterations. GA starts by taking some random solutions for a particular problem and converts them into the chromosome representation. Every chromosome is assessed depending on its fitness value [9]. This is done to find out the more fit solutions, which further participate in the reproduction of new chromosomes. Every generation of the process improves the overall solution quality in the pool (of solutions), consequently leading to the deletion of bad solutions. The constituents of genetic algorithm process are • • • • • • •
Solution definition Generating equation for fitness function Pool of population Selection of parents Using genetic operators Selection of survivors Setting the terminating condition.
Out of these steps, few most significant steps are defining the equation for fitness function appropriately, selection of parents by applying appropriate selection mechanism, making use of genetic operators by selecting the suitable type of crossover, mutation methods to be applied [9, 10]. An equally important step is to decide about the survivor selection criteria to be transferred to the next generation.
1.4 Hill Climbing Hill climbing, a local search method, is used as an optimization tool. In this approach, an initial state is generated, and the neighbourhood solutions are searched and checked for a better solution in the vicinity, on its fitness [11].
680
V. Nijhawan et al.
2 Background Studies Literature has been reviewed in the field of prediction in telecom churn as well as in some other fields where predictive analysis in performed [12–14]. This section is divided into 2 segments, based on techniques being used by the authors.
2.1 Literature Review: Decision Tree and Genetic Algorithm Merger Many researchers have worked in the direction of churn prediction in the sector of telecommunication as well as in other sectors and made use of machine learning technique using decision tree, neural network, and many other techniques. This section reflects some pieces of similar works done by many researchers for prediction analysis. Lakshmi et al. [15] have carried out the research to find out the effect of factors that are qualitative in nature including qualification of parents, economic status, etc., on the academic performance of the students. They have made use of ID3, C4.5, and CART in the field of education for identifying the effects of various qualitative factors, in the performance of the students. It was found that the results (accuracy) given by CART were best out of the three. The authors also made use of genetic algorithm for identifying the most significant qualitative factors affecting student performance. Stein et al. [16] have their work in the field of intrusion detection keeping an eye over growing cases of computer attacks in personal as well as in the commercially. The authors have proposed a model which used genetic algorithm first and after that applied decision tree model. For the purpose of selecting features, GA was applied, and after the feature selection, C4.5 algorithm of decision tree model was applied. Their results showed that hybrid of GA and decision tree outweighed application of decision tree without feature selection for identifying attacks. Little similar to what reference [16] had done, Abbasimehr and Alizadeh [17] suggested a model using a combination of GA and C4.5 algorithm in the area of telecommunication for churn management. Their work was based on use of genetic algorithm for best feature selection (wrapper-based) and then made the use of decision tree classifier. The authors made a comparison for performance evaluation between the proposed model, correlation-based feature selection, and chi-square method feature selection. The measures considered for performance evaluation include accuracy, specificity, sensitivity, geometric mean as well as no. of rules generated. After the comparison, proposed model came out to be a better one in terms of accuracy, geometric mean as well as number of decision rules generated.
Churn Prediction Algorithm Optimized and Ameliorated
681
In the above review of literature, some research works have been summarized that made use of a combination of genetic algorithm with decision tree for prediction analysis, and all of these have got the success with the above-mentioned combination.
2.2 Literature Review: Genetic Algorithm and Hill Climbing Merger In the work done by Yusoff and Roslan in reference [18], a combination of genetic algorithms using elitism and hill climbing has been applied by the author. The authors have applied hill climbing optimization technique within genetic algorithm to generate the mutant of the solutions in the mutation step in genetic algorithm. The purpose being to find a better mutant of all and improve the algorithm performance. The results generated from GA using elitism when compared with the results generated with GA-HC hybrid using elitism on the real dataset, reflected that GA-HC with elitism has a clear win over GA with elitism to find out an optimal solution and returns a solution with better fitness value. Reference [19], Su et al. have applied almost the same combination of genetic algorithm and hill climbing with elite-based reproduction, and named it hybrid hill climbing genetic algorithm (HHGA) to solve the problem of protein structure prediction on the 2D triangular lattice. The simulations were carried out by the authors. These showed that HHGA and elite-based reproduction strategy-genetic algorithm (ERS-GA) can be applied successfully on the problem of protein structure prediction, where hill climbing has been applied both at mutation step as well as at crossover step. Smita and Girdhar in reference [20] have also agreed that GA can be mixed with any of the local search technique to get better results. The authors have applied a simple GA and a GA with changing crossover for solving the problem of travelling salesman and have proposed a hybrid algorithm in which hill climbing is applied at selection step in genetic algorithm. The experiments have been conducted using the tool MATLAB, and the corresponding results had shown that the proposed algorithm produced better and more optimal results, when compared to the simple GA. Author in [21] has worked on genetic algorithm and hill climbing combined with the software TRANSYT that is a popular tool used for traffic engineering, in the field of area traffic control. The author has named the model as genetic algorithm with TRANSYT hill climbing (GATHIC), and it had been applied to a well-known road network for fixed set of demands. As a result, GATHIC performed better in signal timing optimization in comparison with TRANSYT. However, it is a bit computationally demanding, but this deficiency was removed by introducing another algorithm named as ADESS. It is evident from Sect. 2.1 that many authors have worked with a combination of decision tree and genetic algorithm for making an optimized model for prediction analysis and have succeeded in their efforts. The literature review Sect. 2.2 has shown
682
V. Nijhawan et al.
the work in the field of optimizing the models by applying hill climbing in any of the steps of genetic algorithm to make it work more effectively for optimization. The next section explains the methodology used in the optimized model proposed.
3 Methodology of Proposed Optimized Model The proposed model has been designed for predicting telecom churn behaviour of the customers by applying machine learning technique, which aims at fine tuning the model based on learnings drawn by its old predictions. Model applies decision tree on the collected set of data for generating the first generation of solutions. These generated solutions are fed into the hill climbing algorithm, which in turn returns an ordered ranking of all generated solutions, on the basis of their fitness effectiveness. This generated ranking of solutions is used as an input by genetic algorithm for the parent selection process using elitism technique, and hence, the most fit solutions are picked up and passed to the next generation of genetic algorithm. The process keeps on repeating itself till the required number of generations are complete.
3.1 Framework of the Model Figure 1 shows the framework of the model and describes the complete flow of the work. Every step in this diagram has been elaborated in detail below. After the collection and cleaning of data, the dataset is divided into 2 categories— namely train and test, as per the process of machine learning. Decision tree represents the selected machine learning classifier to be applied as a part of proposed model.
Fig. 1 Caption: framework of optimized model
Churn Prediction Algorithm Optimized and Ameliorated
683
Generate population step in the framework shows creation of a random population of networks. After that, as per the GA process, fitness value is calculated for all generated solutions. After getting the fitness values of the networks, it is passed along with F1-score, accuracy, precision, and recall value, to hill climbing algorithm in the form of a matrix. Hill climbing algorithm checks for the best sequencing of the passed solutions and generates the most optimal sequencing of the solutions as per the performance measures and returns the ranked solutions. Some of the top ranked solutions are then passed to the next generation of genetic algorithm using the logic of elitism technique for parent selection. Genetic algorithm then follows the rest of its steps like crossover, mutation, and the whole process is repeated for the fixed number of generations. Then, in the final step, the trained model is applied on the test data to get the results of churn prediction.
3.2 Calculation of Fitness Function Equation As mentioned in the above section, the proposed algorithm applies GA for optimization process, and in genetic algorithm process, the most significant step is designing the fitness function equation. The measures which have been selected for the formulation of fitness function are accuracy, F1-score (which is a combination of precision and recall). The reason for selecting these two measures depends upon the nature of problem in consideration and the type of dataset, which is neither completely balanced and nor totally imbalanced. The equation for the fitness function for GA has been derived as below: f (n) = x ∗ a + y ∗ f 1
(1)
where f (n) is fitness value, a refers to accuracy, f 1 refers to F1-score, x refers to a normalization constant with value 0.80, y refers to a normalization constant with value 0.20, where x+y=1
(2)
In the above equation, a. Accuracy and F1-score have been added in the ratio of 4:1. The reason is that both accuracy and F1-score are to be considered for digging the optimal solution. But as per the dataset, the values of FP and FN are not symmetric, and this makes F1 to have a more weight in the equation, than accuracy. b. The process of normalization has been used in the fitness function equation, wherein, to normalize can be defined as (as per the dictionary) “to multiply (a
684
V. Nijhawan et al.
series, function or item of data) by a factor that makes the norm or some associated quantity such as an integral equal to a desired value (usually 1).” As mentioned in the definition above, normalize means to scale a value to 1, by multiplying it with a weight or a factor. Here, two weights “x” and “y” have been given the constant values 0.80 and 0.20 (in the ratio of 4:1), with accuracy and F1-score, so that the total value of fitness can be normalized to be between 0 and 1.
3.2.1
Validation of Fitness Function Equation
This equation (Eq. 1 in Sect. 3.2) has taken into consideration many factors. Those factors should be considered whilst designing the fitness function equation, which are specific to the type of problem as well as to the type of dataset under study. It is a necessary and a significant step to validate the designed equation, in the process of this research work. In the absence of availability of dataset of other researchers, the author has applied the fitness function equations of other researchers on the collected dataset and has observed the results, thereafter. In reference [22], fitness function is taken as a balance between the number of correctly classified instances and size of the tree, with extra parameters α1, α2, to fine tune their relative weight. Their equation is as follows: f f = α1 f 1 + α2 f 2
(3)
where f 1 =1 − ((Total samples correctly classified in sample set) /(Total samples in Training set))
(4)
and f2 =
TreeCurrent_depth Treetarget_depth
(5)
The parameters α1, α2 are the relative importance of the complexity term. Reference [23] has developed and applied two GAs and used a common fitness function equation for both, which is given as follows: ( Fitness =
TP TP + FN
) ( ∗
TN FP + TN
) (6)
where the term (TP/(TP + FN) is sensitivity (Se) or true positive rate, whereas the term TN/(FP + TN) is specificity (Sp) or true negative rate. Another reference [24] has used the same formulae for calculating fitness in their genetic algorithm with decision tree combination model, i.e. the fitness is calculated
Churn Prediction Algorithm Optimized and Ameliorated Fig. 2 Caption: fitness comparison of multiple fitness equations (10 generations)
0.8
685
Fitness Comparison of Multiple Fitness Equations
Fitness Value
0.7 0.6 0.5 0.4 0.3 0.2 0.1 0
Generations Equation 0
Equation 1
Equation 2
Equation 3
as follows: Fitness = Se ∗ Sp
(7)
where Se and Sp are sensitivity and specificity, respectively. The above-mentioned equations have been used by the author to apply on the collected dataset and compare their performances with the performance of the fitness equation designed by the author. In the bar graph, shown below: 0 represents the fitness function equation of the proposed model. 1 represents the fitness function equation of reference [22]. 2 represents the fitness function equation of reference [23]. 3 represents the fitness function equation of reference [24]. As it is clearly seen from the bar graph shown in the Fig. 2 that the fitness function formulae designed by the author are completely appropriate for the type of research problem and the type of dataset under study and give better results than the others.
3.3 Flowchart of the Model The diagram below shows the detailed flowchart of the complete model. It shows each and every step including initialization of variables till printing of the resultant tree. Multiple processes in the following model can be discussed one by one. First step after getting the dataset from the source and clean it by dealing with missing values and redundancy is selecting the machine learning model or classifier and optimization algorithm, then, a random population of networks in generated
686
V. Nijhawan et al.
using the selected decision tree classifier. After that, calculation of the fitness of all networks based on the formulated fitness function is done. The 5 performance measures, i.e. fitness value, accuracy, precision, recall, and F1score of all solutions, are fed as the input to the hill climbing optimization algorithm to get a ranking of all solutions as output. Select some fit initial networks as parents for passing to the next generation. Select some random solutions and perform crossover using uniform crossover method, where every single gene is selected randomly from any of the parents. Then, mutation is performed by selecting any of the features of decision tree classifier randomly and assigning it any value from its domain of values. But as an exception, two parameters, namely criteria and splitter, are given the fixed value as “Gini” and “best” to get the best of results and add the mutated child into the population set. Repeat the complete process “n” number of times, where “n” is the number of generations and get the results (Fig. 3).
3.4 Working of Proposed Model This section describes the working of the model proposed by the author in details starting from making use of decision tree for generating pool of population, use of hill climbing and finally genetic algorithm as optimizer. The working of the proposed model starts by generating a random population of decision trees, where we get a fixed number (given as population size) of networks generated with decision tree classifier. Then, hill climbing algorithm is applied by providing accuracy, fitness value, precision, recall, and F1-score of the networks, as input. As a result, hill climbing algorithm returns the optimal ordering of the solutions supplied. This ordering highlights the better solutions in comparison with each other. Then, as per the genetic algorithm rule, some of the top ranked records (selected based on the ranking given by Hill Climbing) in the population set are selected directly, to be transferred to the next generation of the population. The number of networks (i.e. records) to be retained depend upon the value given to the retain variable. Then, some more records are selected randomly, depending upon the difference in the size of population set to be retained and the number of networks which are directly selected. Thus, the process of survivor selection mechanism takes place using fitness-based selection. After the selection of survivor, all the retained networks are referred as parents. Then, there, selection mechanism for parents, needs to be applied, and this is done using elitism preservation technique, in which good quality networks are considered for reproduction, so that fit children can be generated. Uniform crossover is performed after taking 2 random networks, and the generated networks are mutated using random selection type mutation. Then, newly created networks are added in the set of parents, after keeping a check on the size of
Churn Prediction Algorithm Optimized and Ameliorated
687
the population. The same process is repeated till the number of generations are completed.
4 Results and Finding After the application of the proposed model on the collected dataset, the results in terms of the 4 performance measures were recorded, and one instance of the same is shown in Fig. 4. This screen shows the fitness values of top 5 networks belonging to the final generation of genetic algorithm (Fig. 4).
4.1 Validation of Results In the absence of literature, using the application of machine learning with a combination of local and global optimization both in a single model, validation of results is done by applying the complete model and comparing its performance by application of combination of decision tree and genetic algorithm without applying the hill climbing at elitism in parent selection. It is interesting to observe that the results of applying genetic algorithm (with hill climbing) when proceeds to further generations and reach to a highly refined state, with very minor variations in few last generations. It can be seen there is a continuous growth in the highest fitness value, over the generations. As shown above, the average fitness value of all the generated trees in a particular generation is improving, generation by generation. It can be observed that there is an overall growth of approx. 6%, from being 63% (rounded off) in the first generation, and further to 69% (rounded off) in the last generation in case of the particular combination of input values. This has been there because of the application of machine learning in combination with optimization technique of genetic algorithm with the use of hill climbing. However, looking at the Fig. 5 which shows the results of average fitness of 10 generations of genetic algorithm without application of hill climbing technique, a change is evident. On matching the range of average fitness values in the graph in Fig. 5, ranging from 64% (approx) to 67% (approx) without hill climbing to the results in Fig. 4 ranging from 63% (approx) to 69% (approx) with hill climbing technique, there is a hike seen in the upper value, however, the lower range has decreased by 1% (Fig. 6).
688
Fig. 3 Caption: flowchart of optimized model
V. Nijhawan et al.
Churn Prediction Algorithm Optimized and Ameliorated
Fig. 4 Caption: snapshot of results of research finding
Fig. 5 Caption: average fitness of trees in each generation (10 generations)
Fig.6 Average fitness of trees in each generation without hill climbing (10 Gen.)
689
690 Table 1 Validation of results with and without hill climbing
V. Nijhawan et al.
Model applied Top fitness value (%)
Avg fitness Gen-1 (% approx)
Avg fitness Gen-10 (% approx)
Model without 75 HC
64
67
Model with HC
63
69
77
5 Conclusion and Future Scope Considering the results and graphs shared in Sect. 4.1, a comparison can be drawn between the work done with application of hill climbing and without its application in genetic algorithm. Table 1 below shares some significant results on the model with and without using hill climbing. Looking at the results shared above, it can be said that the proposed model with a combination of genetic algorithm and hill climbing has a clear win. As a suggestion for future work, the model can be altered to explore the results by applying neural network rather than decision tree with the same combination of optimization.
6 Declarations The authors have no competing interests to declare that are relevant to the content of this article. No funding has been received from any organization for the work carried out by authors.
References 1. Madan M, Madan S, Ameliorating metaheuristics in optimization domains. In: Methodologies, tools and operations research. International conference on European. Published by IEEE Explore. ISBN: 978-0-7695-3886-0 2. Madan M, Bio inspired computation for optimizing scheduling. In: Computer society of India. Springer, online ISBN: 978-981-10-6747-1, https://doi.org/10.1007/978-981-10-6747-1_8 3. Madan M, Dave M, Nijhawan VK (2015) A review on: data mining for telecom customer churn management. Int J Adv Res Comput Sci Software Eng 5(9) 4. Rokach L, Maimon O (2015) Data mining with decision trees theory and applications, 2nd edn. World Scientific Publishing 5. Nijhawan VK, Madan M, Dave M (2017) The analytical comparison of ID3 and C4.5 using WEKA. Int J Comput Appl. 167(11). ISSN 0975-8887 6. Nijhawan VK, Madan M, Dave M (2019) An analytical implementation of CART using RStudio for churn prediction. In: Information and communication technology for competitive strategies. Lecture notes in networks and systems, vol 40. Springer, Singapore
Churn Prediction Algorithm Optimized and Ameliorated
691
7. Nijhawan VK, Madan M, Dave M (2019) A comparative analysis using RStudio for churn prediction. Int J Innov Technol Explor Eng 8(7S2). ISSN: 2278-3075 8. Singh S, Gupta P (2014) Comparative study Id3, cart and C4.5 decision tree algorithm: a survey. Int J Adv Inf Sci Technol 27. ISSN: 2319:2682 9. Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning, 13th edn. Addison Wesley Publications. ISBN-10: 0201157675, ISBN-13: 978-0201157673 10. Katoch S, Chauhan SS, Kumar V (2021) A review on genetic algorithm: past, present, and future. Multimed Tools Appl 80:8091–8126. https://doi.org/10.1007/s11042-020-10139-6 11. Skiena SS (2020) The algorithm design manual. Springer International Publishing, Germany 12. Madan M, Madan S, Convalescence optimization of input allocation problem using hybrid genetic algorithm. J Comput Sci. ISSN 1549-3636 13. Madan M, Madan R (2013) GASolver-a solution to resource constrained project scheduling by genetic algorithm. Int J Adv Comput Sci Appl 4(2). ISSN: 2156-5570(Online) 14. Madan M, Madan R (2013) Optimizing time cost trade off scheduling by genetic algorithm 2(9):320–328 15. Lakshmi, Martin A, Begum R, Venkatesan V (2013) An analysis on performance of decision tree algorithms using student’s qualitative data. Int J Mod Educ Comput Sci 5:18–27 16. Stein G, Chen B, Wu AS, Hua KA (2005) Decision tree classifier for network intrusion detection with GA-based feature selection. In: Proceedings of the annual southeast regional conference, ACM, vol 2, pp 136–141 17. Abbasimehr H, Alizadeh S (2013) A novel genetic algorithm based method for building accurate and comprehensible churn prediction models. Int J Res Ind Eng 2(4) 18. Yusoff M, Roslan N (2019) Evaluation of genetic algorithm and hybrid genetic algorithmhill climbing with elitist for lecturer university timetabling problem. In: Tan Y, Shi Y, Niu B (eds) Advances in swarm intelligence. Lecture notes in computer science, vol 11655. Springer, Cham. https://doi.org/10.1007/978-3-030-26369-0_34 19. Su SC, Lin CJ, Ting CK (2011) An effective hybrid of hill climbing and genetic algorithm for 2D triangular protein structure prediction. Proteome Sci 9:S19. https://doi.org/10.1186/14775956-9-S1-S19 20. Sharma S, Gopal G (2015) Hybrid genetic algorithm and mixed crossover operator for optimizing TSP. Int J Comput Sci Mobile Comput 4(10):27–34. ISSN 2320–088X 21. Ceylan H (2006) Developing combined genetic algorithm—hill-climbing optimization method for area traffic control. J Transp Eng 132(8). https://doi.org/10.1061/(ASCE)0733-947X(200 6)132:8(663) 22. Jankowski JLD (2015) Evolutionary algorithm for decision tree induction, computer information systems and industrial management. In: CISIM 2015. Lecture notes in computer science, vol 8838. Springer, Berlin, Heidelberg 23. DR, Carvalho FAA (2004) A hybrid decision tree/genetic algorithm method for data mining. Proc Inf Sci 163(1–3):13–35. ISSN 0020-0255 24. MVC, CA, RC, VS (2012) Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data. Appl Intell 38(3):315–330.
Employee Turnover Prediction Using Machine Learning Mukesh Dhetarwal, Azhar Ashraf, Sahil Verma, Kavita, and Babita Rawat
Abstract This study goals to know the cause of the employees’ benefits and organizational retention tactics. Findings from critical research indicate that employees have rarer reasons to leave their offices, such as job pressure, job pleasure, job safety, work environment, motivation, wages, and salaries. In addition, employee reimbursements have a significant influence on the organization due to the costs associated with. However, an association must know the requirements of its employees, which will benefit administrations, implement specific tactics to advance employee performance, and lessen profits. Keywords Turnover intention · Job stress · Job satisfaction · Work environment · Retention strategies
M. Dhetarwal · A. Ashraf Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, India e-mail: [email protected] S. Verma (B) · Kavita · B. Rawat Uttaranchal University, Dehradun, India e-mail: [email protected] Kavita e-mail: [email protected] B. Rawat e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_55
693
694
M. Dhetarwal et al.
1 Introduction Staff turnover is the main problem for numerous administrations around the globe. In fact, the study of the purposes of the change of staff has long been of great concern in the organization [1]. The turnover is huge a problem for organizations in today’s heated world struggle [2]. Recently, it was revealed that the practice of revenue is a chronic problem of organizations [3]. That can unfavorably affect the performance and effectiveness of organization. On the other hand, the allocation of staff surges the chances of losing the decent employees [4]. Therefore, retaining skilled workers is very important to employers, as they are well-thought-out important organizational efficiency and effectiveness [5, 31]. Otherwise, the transfer of staff has always been a significant issue to manage an organization. There are cost-related outcomes that would have guided the rental costs tolerated while on demand replacement, job loss during a person’s time stops and replacements, production lost while moving staff and the decline in productivity of new appointments over time job learning [6, 34]. However, high level of unemployment, it makes difficult for the persons to find appropriate employment, which decreases efficiency and modernization of both of you exclusively and association, and this can have a negative impact in development of the national economy [7, 32]. So, staff growth is a major problem for government as well organizations. In Malaysia, about 50% of organizations hired by employees profit in 2015. The below figure shows the function interest rate of Malaysia in 2016. For example, 8% of organizations have lower staff turnover rates of 11%, and 12% of organizations were contacted by employees’ profit of more than 60% [8]. Moreover, Southeast Asia and Malaysia are ranked second with the highest voluntary profits with an average of 7.0% and the third highest rate involuntary exchange rate of 10.5% [9, 33] Employees may choose to leave an organization if they are dissatisfied with their job or the overall work environment [10, 36], Conferring to, there are many reasons for an employee to leave a job or organization. Also, less learning and less feedback means fewer opportunities for growth and development. In addition, employees feel they are worthless and nameless, and are overwhelmed by overwork, the work-life gap, and reasons for maintaining trust in their managers. In addition, trust is serious to the presentation and well-being of the organization’s employees [35]. Other factors are also identified, such as the development of employment and career development, higher wages, individuals and colleagues, personal causes of changes in the goal, and the personal cause of the goal change. In addition, there has been a serious problem with the expenses caused by the employee’s benefit, and it has been found that it may have a negative impact on the organizational index. In addition, for mutualrelated factors, employees are one of the reasons why employment changes directly under the direct control. Due to the employment of illustrations, it is compliant with working conditions, management conflicts, resolution, or salary inequality, and employers can understand the cause of profits. Administrators can determine and resolve the problem within the organizations [10, 38]. In addition, the pleasure of the current work, the workability of the work, the work, and the operation of the
Employee Turnover Prediction Using Machine Learning
695
administrator, or the administrator is the general reason why the employee left the organization [11, 37]. It has also been observed that the negative assessment of the current employment can lead to complaints about efforts, and that it is possible to lead the cost of the resignation, thinking and profit on the resignation and the cost of profit. The [12], as suggested based on the organization, greeting management program, organizational communication, and other benefits, suggested four factors provided with employment opportunities. In addition, it reported stress for work, complex work, and job satisfaction and affected the rotation rate of employee and absence. Likewise, it discovered that difficult efforts have been influenced by their intentions or leaving the organization. In addition, it has studied factors that employees have motivated to leave effort in many fields. The results suggest that there are nine elements that resign or stay in the organization. This includes deprived working environments, useless partners, fruit management, overflow, home pressure, low payment, managers, administrators’ safety and solid, and non-ethical behaviors. According to, many factors affect the intention of the workers in any of the field. The results have revealed seven elements of labor, either salary, wellness, work, security lack, weak administrative support, rigorous rules, and absence of individual motivation. We also studied factors that contribute to profitability of the Malaysian construction business workers. Dedication to work, salary, employment, organizations, and leaders are the main reason for the staff leaving his work. Other authors, such as [13], have pointed out that the reasons for changing staff can be divided into 3 categories: first, job-related factors (e.g., job pleasure, pay, efficiency, administrative commitment) and second, individual factors (e.g., age, education, gender, period), and third due to personal reasons. Conversely, there are other reasons for employers to move workers out of nonwork situations beyond their control. We are talking about the personal health of our employees, which affects their performance at work. An example of this is migration and family problem. As already mentioned [14], the maximum number of revenue due to the work reasons is 38.4%, and the turnover due to the non-work-related reasons is 28.4%. Job gratification is an employee’s attitude toward the work. Job dissatisfaction motivates employee to leave the association work, and they leave the company, which leads to employment in other organizations [15]. Also, regular numbers were handled differently. Another theory means that two factors affect the decision of their employees, the simplicity of the exercise, the purpose of the movement, but also the desire to affect job satisfaction [16]. Satisfaction with this operation has a negative impact on profitability in the state of the task, supervision, and wages [17]. In addition, the dissatisfaction of the work is reducing the performance of the employee, reducing the dedication of the organization, and leaving the intent of the employee. Also, found that job satisfaction is directly negative for the intent of employee-related staff with the true interests of employees. On the contrary, some researchers have a large group of social interactions between motivation and loss of interaction between social interactions and interactions between team members, low partnership trends, as well as low partnership trend workgroups of frameworks. In the same way, [18] can lead to the interests of employees among these groups because large collections could not cope up with the modest pressure of social and customer
696
M. Dhetarwal et al.
service. In addition, it considered to have the impact of group cohesion to the intent of employees being resigned. To satisfy and change the positive relationship with the employee. According to research results, stress at work has a positive effect and has a significant effect on profit motive, which can be attributed to a lack of job’s satisfaction, [19]. It was also found that stress at work had a very positive effect on employees for profit. Workers with very depressing jobs are more possible to leave the association. As a result, depressed employees at work have lower job satisfaction and are more likely to consider leaving. On the other hand, depression is not the only problem at work, but it can be stressful due to dissimilarity in the work life, conflict between work and family. Worker health equilibrium is a way to balance work and personal health. Therefore, employees may leave the association due to the severe stress of overwork and the incomplete personal time they can invest without effort. Workplace fatigue is also a chronic stressor that workers may experience at work. Grade fatigue is emotional fatigue, doubt, and success among employees. Fatigue is related with the work-related outcomes, such as job changes. Therefore, fatigue predicts workers’ intention to retire [20–22]. Otherwise, if there is no manual for authority management leading to the change of the employee. Employee decisions do not affect the acts of the leader, and appropriate management can reduce the complaints of the workforce. The behavior and dissatisfaction of management can affect the intention of shit restoring the idea of the development of the employees. The reason that a new employee leaves the organization is that they are not employees that they do not allows the employees to participate in the complex tasks. The management of the organization requires business ethics to achieve the dedication of the organization, the positive impact on labor conditions, and the intent of the employee achieves [24, 38].
2 Literature Review The design of the numerical researches conducted in this study was created and intended to lengthily measure the efficiency of different monitored ML algorithms. Details of experimental designs are shown here to explain the evaluation criteria that are used to implement the numerical experiments conducted in this study [25] Evaluation Matrix. Recruitment statistics require that the inequality between departure and retiring needs to be considered. As mentioned above, the interest rate gain amount remains less than 0.51 (bank data: 0.2844, IBM record: 0.1613), which realizes the accuracy of natural preload. Additional test metrics were introduced to offer comprehensive input and analysis of the results to address this issue. In this survey, repentant employees were assigned to the positive category, and LN employees were included in the
Employee Turnover Prediction Using Machine Learning
697
negative category. Five test metrics to test the ML algorithms are learned in this study: (1) Precision is defined as the percentage of data properly classified by models [26, 39]. (2) Accuracy (PRC) is defined as number of really good things divided by the amount of actual collisions and false positives. (3) ROC was selected as the main endpoint of this study because it produces detailed data on the classification functions associated with uneven samples [27]. Probabilistic and Statistical Analysis: Multi-group comparisons of classifier characteristics (e.g., data type, size, and model choice) were performed using nonparametric Kruskal–Walli tests and Dun’s post hoc test. As a result, the maximum information coefficient (MIC) was introduced for quantitative estimation of linear and nonlinear correlation between characteristics. MIC can measure MI between the continuous and the individual RVs from 0 (regardless) to 1 (uncomfortable). Function ML ¼ x, y is defined as a variety of empty sizes and locations. KNN: The algorithm of the separation of the size (KNN) is a simple and controlled machine learning algorithm. Fix the classification and regression issues. Because these data are used, it is easy to implement and understand, but there is a major disadvantage to make it substantially slow [28]. Since LR shows a linear relationship, we examine how value of the dependent variable changes depending on the value of the independent variable. Logistic Regression: Logistic regression is one of most common machine learning algorithms for supervised learning techniques. It is used to determine categorical dependent variables using a specific set of the independent variables. Logistic regression determines the output categorical dependent variables. Therefore, the result must be a categorical value or a discrete value like yes or no, 0 or 1, true or false, and so on. [29] Decision tree is a controlled learning technology that can be used for the classification and regression issues, but it is recommended to solve the classification problem. This is a tree, which indicates the functionality of the internal node dataset. The branch is a rule to make decisions, and each knot of the leaves indicates the result. Random Forest: Random forest is a popular ML algorithm belonging to a training technology. You can use both the classifications and regressions operation of ML. This combines multiple classifiers to solve complex problems and improve model’s performance [30].
3 Methodology This study describes and evaluates numerous supervised machine learning algorithms in terms of ability to forecast employee turnover. This is an established method first published in 1964 by Moran and Squish. Decision tree method is conceptually simple but powerful. It can be interpreted intuitively. It can handle missing values and mixed features and the ability to automatically select variables. Nevertheless, the predictive power is not that competitive. The solution tree is not usually stable with the high
698
M. Dhetarwal et al.
model variance, and a small modification in the input data causes a large effect in the tree. Random forest (RF)accepts the ensemble approach that offers improvements compared to the main structure of decisions by combining weak students to form weaker students (see Bierman’s Newspaper). The ensemble method uses approaches for separation of contacts to improve the performance of the algorithm. In arbitrary forests, there are several decision-based trees based on the bootable educational set and are selected as a random sample of the predicted factor M as a partition candidate for the full set of a P-predictor for each tree solution. Most predictors after P are not considered. In this case, it is unlikely that every individual tree will be dominated by small number of influential predictors. Support Vector Machine: The support vector machine was first proposed by Vapnik and Cortes in 1996. SVM is typically used in the discrimination classifier required to assign new data sample to one employee sales forecast using machine learning 741 in possible categories. The main idea of SVM is to determine the hyperplane that separates the dimensional data into 2 classes that maximize geometric distances with the nearest data points included in the support vector. Practical linear SVMs give similar results as logistics regression. In addition to perform linear classification, SVM introduces ideas for kernel methods to effectively perform nonlinear classifications. This is a methodology for mapping a function that sends a property to a new feature space (which is typically high) that is disconnected from the data. For more information, see Muller and co-researchers.
4 Results Data: (Figs. 1, 2, 3 and 4).
Fig. 1 Data for SVM
Employee Turnover Prediction Using Machine Learning
699
Fig. 2 Model accuracy
Fig. 3 ROC curve for logistic regression
5 Conclusion Employee turnover is recognized as a key impediment to organizational development. In this study, we evaluated the performance of 10 supervised ML methods on different parameters.
700
M. Dhetarwal et al.
Examples of functional importance rankings and classifier visualizations to improve the interpretability of the employee turnover model, and suggestions for their correct use (Fig. 5). This study presents a powerful approach to predict the employee turnover using machine learning. Data sampling methods allow you to evaluate the impact of organization size on the performance of supervised ML models. In addition, several methods of philosophy and statistical indicators are to study the results. To the best of the writer’s knowledge, this approach is the first to predict employee turnover. Work in this field generally focuses on a single dataset with a single approach for estimation, which limits the generalizability of the results.
Fig. 4 Logistic regression confusion matrix
Fig. 5 Result
Employee Turnover Prediction Using Machine Learning
701
References 1. Alao D, Adeyemo AB, Zhao Y et al (2013) Analyzing employee attrition using decision tree algorithms. Comput Inf Syst Dev Inform Allied Res J 4:756 2. Al-Radaideh QA, Al Nagi E (2012) Using data mining techniques to build a classification model for predicting employee’s performance. Int J Adv Comput Sci Appl 144–151 3. Chang HY (2009) Employee turnover: a novel prediction solution with effective feature selection. WSEAS Trans Inf Sci Appl 6:417–426 4. Chien CF, Chen LF (2008) Data mining to improve personnel selection and enhance human capital: a case study in high-technology industry. Expert Syst Appl 34:280–290 5. Li YM, Lai CY, Kao CP (2011) Building a qualitative recruitment system via SVM with MCDM approach. Appl Intell 35:75–88 6. Nagadevara V, Srinivasan V, Valk R (2008) Establishing a link between employee turnover and withdrawal behaviours: application of data mining techniques. Res Pract Hum Resour Manag 16:81–97 7. Quinn A, Rycraft JR, Schoech D (2002) Building a model to predict caseworker and supervisor turnover using a neural network and logistic regression. J Technol Hum Serv 19:65–85; Sexton RS, McMurtrey S, Michalopoulos JO, Smith AM (2005) Employee turnover: a neural network solution. Comput Oper Res 32:2635–2651 8. Suceendran K, Saravanan R, Divya Ananthram DS, Kumar RK, Sarukesi K, Applying classifier algorithms to organizational memory to build an attrition predictor model 9. Tzeng HM, Hsieh JG, Lin YL (2004) Predicting nurses’ intention to quit with a support vector machine: a new approach to set up an early warning mechanism in human resource management. CIN: Comput Inf Nurs 22:232–242 10. Valle MA, Varas S, Ruz GA (2012) Job performance prediction in a call center using a naive Bayes classifier. Expert Syst Appl 39:9939–9945 11. Haq NF, Onik AR, Shah FM (2015) An ensemble framework of anomaly detection using hybridized feature selection approach (HFSA). In: SAI intelligent systems conference (IntelliSys). IEEE, pp 989–995 12. Punnoose R, Ajit P (2016) Prediction of employee turnover in organizations using machine learning algorithms. Int J Adv Res Artif Intell 5:22–26 13. Sikaroudi E, Mohammad A, Ghousi R, Sikaroudi A (2015) A data mining approach to employee turnover prediction (case study: Arak automotive parts manufacturing). J Ind Syst Eng 8:106– 121 14. McKinley Stacker IV (2015) IBM waston analytics. Sample data: HR employee attrition and performance [Data file]. Retrieved from https://www.ibm.com/communities/analytics/watsonanalytics-blog/hr-employee-attrition/ 15. Shahshahani BM, Landgrebe DA (1994) The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Trans Geosci Remote Sens 32:1087–1095 16. Géron A (2017) Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems. O’Reilly Media 17. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830 18. Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Hum Genet 7:179–188 19. Murphy KP (2012) Machine learning: a probabilistic perspective. MIT press, Cambridge; Seddik AF, Shawky DM (2015) Logistic regression model for breast cancer automatic diagnosis. In: SAI intelligent systems conference (IntelliSys). IEEE, pp 150–154 20. Bakry U, Ayeldeen H, Ayeldeen G, Shaker O (2016) Classification of liver fibrosis patients by multi-dimensional analysis and SVM classifier: an Egyptian case study. In: Proceedings of SAI intelligent systems conference. Springer, Cham, pp 1085–1095. Employee Turnover Prediction with Machine Learning 757
702
M. Dhetarwal et al.
21. Mathias HD, Ragusa VR (2016) Micro aerial vehicle path planning and flight with a multiobjective genetic algorithm. In Proceedings of SAI intelligent systems conference. Springer, Cham, pp 107–124 22. Ye Q, Zhang Z, Law R (2009) Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Syst Appl 36:6527–6535 23. Durant KT, Smith MD (2006) Predicting the political sentiment of web log posts using supervised machine learning techniques coupled with feature selection. In: International workshop on knowledge discovery on the web. Springer, Berlin, Heidelberg, pp 187–206 24. Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 785–794 25. Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526; Breiman L (2001) Random forests. Mach Learn 45:5–32 26. Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31:249–268 27. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 1189–1232 28. Morgan JN, Sonquist JA (1963) Problems in the analysis of survey data, and a proposal. J Am Stat Assoc 58:415–434 29. Muller KR, Mika S, Ratsch G, Tsuda K, Scholkopf B (2001) An introduction to kernel based learning algorithms. IEEE T Neural Networ 12:181–201 30. Zhang H, The optimality of naive Bayes. AA 1:3 31. Dash S, Verma S et al (2022) Curvelet transform based on edge preserving filter for retinal blood vessel segmentation. Comput Mater Continua 71(2):2459–2476 32. Sharma R, Singh A et al (2022) Plant disease diagnosis and image classification using deep learning. Comput Mater Continua 71(2):2125–2140 33. Dash S, et al (2021) A hybrid method to enhance thick and thin vessels for blood vessel segmentation. Diagnostics (Basel, Switzerland) 11:11. https://doi.org/10.3390/diagnostics1 1112017 34. Ravi N et al (2021) Securing VANET using blockchain technology. J Phys: Conf Ser 1979 012035 35. Kaur N, Gupta D, Singla R, Bharadwaj A, et al (2021) Thermal aware routing protocols in WBAN. In: 2021 4th international conference on signal processing and information security (ICSPIS), pp 80–83. https://doi.org/10.1109/ICSPIS53734.2021.9652442 36. Vishnu NS, et al., PDF malware classifiers – a survey, future directions and recommended methodology. In: Security handbook. CRC Press, USA 37. Ramisetty S, et al, SC-MCHMP: score based cluster level hybrid multi- channel MAC protocol for wireless sensor network. In: Security handbook. CRC Press, USA 38. Kumar Y, et al (2021) Heart failure detection using quantum-enhanced machine learning and traditional machine learning techniques for internet of artificially intelligent medical things. Wireless Commun Mobile Comput Article ID 1616725, 16 pp 39. Kumar K, et al (2020) A survey of the design and security mechanisms of the wireless networks and mobile Ad-Hoc networks. IOP Conf Ser Mater Sci Eng 993:012063
Smart Card Security Model Based on Sensitive Information Reem M. Abdullah and Sundos A. Hameed Alazawi
Abstract Systems for important and sensitive data increasingly use smart cards, especially in large companies and institutions, such as bank cards, access authorization cards, and employee identification cards in human resources. The proposed method of authentication through the smart card helps to secure and legal data access a result of the information sensitivity of employees in important companies, ensuring the security and reliability of information and data, as well as the need for rapid retrieval of this data, and it has become necessary to build a system that meets these requirements. It was found that the use of smart cards in archiving and retrieving information indicates the importance and sensitivity of that data and information, and authentication methods using the smart card may differ from one system to another. Keywords Smart card · Sensitive · Authentication · Human resource · Personal information
1 Introduction The human resource department has taken a large space in organizations and companies in general as a result of the sensitivity of the archived information of employees and affiliates of those institutions. Because the advent of the data age and the rapid evolvement of economic globalization, many enterprises and companies have begun to realize that they must evolve human resource system and the digital information system it transforms; this goal is achieving, and various techniques must be used to ensure the protection of access to personnel data [1, 2]. In the event of a data leak, it can be difficult to rectify issues such as leaking of private information, such as telephone numbers or identification cards. Furthermore, centralized databases can present risks since anyone with access to the database can make changes to the information [3], even if a modification log is kept. This R. M. Abdullah (B) · S. A. Hameed Alazawi Computer Science Department, Al-Mustansiriyah University, Baghdad, Iraq e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_56
703
704
R. M. Abdullah and S. A. Hameed Alazawi
is a major issue for information systems such as credit reporting and academic systems that affect numerous people, making them highly susceptible to tamper. Therefore, safeguarding personnel data requires implementing various techniques to protect against unauthorized access [4]. It is important to prioritize data security when designing, developing, and implementing a human resource system and other information systems to protect the privacy of individuals and organizations [5]. The basic components of the smart card operating system in the security chain, as they provide protection for citizens and protect personal identity, and all smart cards contain (OS), these are firm wares and are specific to devices that provide functions, and an example of these functions is how to access the card storage unit, encryption, and authentication. It is safe to operate [6]. The Internet of Things covers a large number of multimedia services and products [7]. The Internet of Things provides great data security, an example of which is smart health care [8, 9], intelligent transportation, disaster response, and smart city [10, 11].
2 Related Work Smart cards play an important role in social life and economic. Using these cards securely is the main problem for many researchers and users alike. Authentication is one of the most important solutions to secure unauthorized access to personal data. In recent years, smart card authentication has gained the attention of researchers [12]. The following are some of the works related to the purpose of the manuscript, such as the purpose of designing the smart card, and these works are: • Magiera and Pawlak [13], worked on evolvement and validating security tools with IST E-COLLEG, EXTERNAL, PRODNET, and TeleCAREe projects, and the IST projects, TeleCAREe, EXTERNAL, PRODNET and E-COLLEG follow. Then, open R&D questions related to VOs security framework are addressed, are addressed framework, and are addressed. Via the authors’ results, we find, for example, that the industrial partners of the project are able to accept SSL and VPN to protect communication with external partners. Instead, E-COLLE’s ANTS technology had to be deployed and evolved [13]. • Chandramouli [14] proposed a design of the system that supports smart card deployment. IS-SCD requires knowledge in the areas of identity information, generation, storage, distribution, etc. the design basis is a business process known framework built using two credentialing specifications. The results are IS-SCD design and engineering methodology based on two credentialing specifications used for big-scale deployment [14]. • Li et al. [15], the user is entitled to dynamic management, and this is done using the Schnorr system and provides security. Three-factor MAKA protocols were proposed to resist the exploit of the password attack on MAKA [15].
Smart Card Security Model Based on Sensitive Information
705
• Bustillo et al. [16], to assess the validity of the results, the deductive approach was used. So that 300 copies of the questionnaire were distributed and 221 valid copies of the questionnaire were received, as the response rate was 74% [16]. • Xiang et al. [17], for identity management based on PBBIMUA, an approved scheme was presented, and this scheme was introduced for electronic health systems, where people’s biometrics were used [17]. On the other hand, the authors worked on the authentication side of smart card use, and these works are: • Chen and Zhang [18] suggest a biometric smart card-based authentication model (PSBA) that takes into account private health information. To save more security and potential attacks, the authors used an authentication and a security key simultaneously. • Pradhan et al. [19] proposed a remote authentication scheme using smart cards. The authors used a Lee scheme to demonstrate the strength of their proposal, providing safety features, in which the authors provided a chart for the stability of the strength of their system, and they attempt to resolve the issues discovered in Lee’s scheme by proposing their own scheme. • Kandar et al. [20] proposed a multi-server mechanism using biometrics where authentication is mutually done by message passing. The authors worked to create a session key for connections. The security analysis of the proposal proved the strength of the results with superiority. • Chiou1 et al. [21] proposed a secure and efficient authentication protocol with smart cards for wireless communications. There are two sections updating the password efficiently and contributions of confidentiality of the session key. But it turns out that the system is the vulnerable to guess the password via the internet using smart cards through attack. • Kumari et al. [22] suggest framework using smart card with the help of password. The proposed enhances elliptic curve cryptography (ECC)-based authentication framework for the same environment. The proposal is safe to make the password without the need to connect to the internet.
3 Smart Card Authentication In the world of technology, smart cards have an important and significant role in the economy and also have a strong role in social interactions. But there are security weaknesses for these cards, and these weaknesses are a source of fear for many researchers and users of these smart cards. It has authentication. It has been used as a security solution in order to protect data. In recent years, smart card-based password authentication has received a lot of attention from researchers. We note that the great and rapid development of technology has helped a lot in accessing any service at anytime and anywhere, and that smart cards have brought about a fundamental. In recent years, consumers have changed and smart cards use encryption methods to
706
R. M. Abdullah and S. A. Hameed Alazawi
validate consumers and store information [12]. Therefore, it is noted that smart cards are attacked and also obtain keys that help intruders to enter, and thus there is a great need to verify the security of information and the password-based authentication method. Therefore, the server must have a password and an ID to keep in order to access the server’s resources. It is possible to steal the ID and the password from an internal user, and he can impersonate a legal user, as the user authentication of the remote smart card is an effective method and is used in many areas to access the server remotely [12]. A two-factor authentication framework has been proposed, and this authentication is backed by a simple password using a smart card [22]. Smart card-based authentication protocols have an efficient protocol [23]. Weaknesses have appeared in the remote authentication system using smart cards, so a scheme has been used that is capable of resisting the existing weaknesses [19]. To protect a system from unauthorized access, it must be able to determine who can access it. This process involves two steps: identification and authentication, which are implemented through a two-step process [24]. During identification, the user is prompted to provide their identity, while authentication involves verifying that identity. Typically, this is done by entering a username and password or using a secure token. These credentials are then compared to the information stored in the directory services database. If the user’s credentials match the stored information, they are granted access to the system, otherwise access is denied [25].
3.1 Authentication Factors The four basic factors are used for authentication, and they are [19, 24]. • What you know: It is an information such as information that must be confidential and only known to the legitimate user, who is only the one who knows it (such as the PIN, security code, and password). • What you are: It includes a unique human feature that is specific to the legitimate user (such as fingerprint, retina scan, voice recognition, and face). • Where you are: It includes location information that only belongs to the legitimate user (such as the Global Positioning System (GPS), internet identification (IP) address, and cell phone tower). Image ID.
3.2 Authentication Criteria There are many metrics which are important because they consider the element as an identifier used to determine the occurrence [24]. • Uniqueness: In this case, the identifier must be unique and different for each subject. Global: In this case, each topic must have at least one identifier. • Simplicity: Here the identifier must be collected with ease.
Smart Card Security Model Based on Sensitive Information
707
• Permanence: In this case, the conformity criterion must be disregarded, so that the limiter must work in the same way with overtime. • Storage ability: Here, the process of storing the identifier must be possible.
3.3 Biometric Approaches of Authentication To address the security challenges, a number of authentication methods have been proposed, and these methods are incompatible with the usability of the system. In the end, the best solutions are obtained by modifying the techniques of traditional passwords, and the authentication of information is a method widely used by security experts before accessing the system in order to verify the identities of users [26, 27]. The growing need for stronger authentication measures to prevent hacking has led to the adoption of biometric authentication methods to protect against unauthorized access to systems [26]. There are different types of systems provided by the use of human characteristics for biometrics. In complex parts, the most commonly used biometric authentication methods are passwords and fingerprint [26, 28]. Fingerprint biometric authentication The fingerprint is used to prevent unauthorized access to the system and detecting crime [29]. Face Recognition Technology It is used to recognize faces, which is a computer application to determine what a person’s identity is or to verify through a video or digital image, and this is a natural way [30]. IRIS Technology The iris method is used as an identification method, as there are unique types of iris obtained through a system used to obtain images in which we observe the iris of the eye with a complex pattern [31]. Hand Geometry Technology As each person’s hand is shaped differently and does not change over time [32]. Retina Geometry Technology This technique depends on the pattern of the blood vessels within the retina of the eye, and these vessels are located in the back of the eye, and each eye is different from another eye, and also every person is different from the other person [33].
708
R. M. Abdullah and S. A. Hameed Alazawi
Password biometric authentication The password is used to access the user, and the user can use an easy and simple password through the use of a text, but if it is simple, these texts can be vulnerable to attacks [34].
4 Card Classification Cards can be classified into two categories: chip cards and without chip cards. Chip cards, also known as smart cards, are so called because of the chip, which is their distinguishing characteristic. The card is referred to as a memory card or a microcontroller chip if the chip is a memory chip. In this case, the card is known as the processor card. Processor cards are divided into processor cards with or without processors, like elliptic curve cryptosystems (ECC) or Rivest, Shamir and Adleman (RSA) [35]. A tree chart is similar to what is shown in Fig. 1. The top level includes all types of cards, which can have various formats. This category divides the types of cards that are commonly used and can also be extended to include the use of smart card machine technology. Devices are ‘super smart cards’, and tokens are the best-known. examples [6, 35].
4.1 Smart Card Features During the development of smart cards, specifications and a number of standards have been defined to ensure that the cards and applications card acceptance devices can work together [36]. Smart cards are devices designed to save and, in most of the condition, process data. They are very much durable, which makes them suitable for all applications involving identification, payment, and authorization [6].
Fig. 1 Classification of cards with and without chips [35]
Smart Card Security Model Based on Sensitive Information
709
Fig. 2 Contactless smart card [11]
The card technology has development. Since the 1970s, smart cards are a means of security and storage. The advanced cards contain microprocessors and memory that are used for secure storage and processing and are used in security applications that use algorithms of the common key or the public key. Smart cards are contactless, as shown in Fig. 2. Contactless smart cards communicate by means of a radio frequency ID (RFID ) in a system of less than 2 feet. Contactless smart cards work through physical contact between the card reader and the 8-pin contact [6, 11, 36].
4.2 Personal Information Systems (PIS) Personal Information Systems (PIS) are, sometimes, also mean human resource data systems (HRIS). The main different of the Personal Information Systems (PIS) and human resource data systems (HRIS) is the database, and PIS core offering consists of a database for storing employee information. HR professionals can save all personnel data into the system that can be accessed from any time, from anywhere [37, 38]. Personal data system has always been an indispensable part of human society’s life and work. It includes file management of company and government employees, school students and teachers, as well as registered members of hotels and airlines, even includes the national credit information system [2]. In this architecture, the person client can change the data saved in the central database at any time after obtaining the license. The central administrator can control the database, has high privileges, can authorize other users to access, or even modify the database [38].
710
R. M. Abdullah and S. A. Hameed Alazawi
5 Proposed Security Model From this proposal, there is a main goal to develop and design the smart card authentication, that is based on biometric information, and the authentication scheme consists of four stages: 1. The first phase is the user registration stage in which the user is subject to registration once (as a new user or a new employee). 2. The second phase is the stage of authentication and authorization to enter the system; at this phase, A. Dataset collection, for each user: • Enter the face image into the system database • Extracting facial features. B. Data preprocessing • Convert image to gray • Image reduction • Image Resize. C. Building a multi-class CNN model, which includes • The input layer • The hidden layers created with effective functions commensurate of the features of the input images. • The output layer, where the number of classes is the number of users (i.e., the number of users in the system). After the classification stage by CNN, user input is documented and authorized (classified in CNN output). 3. The third phase allows the user to enter the system using his smart card and obtain information about his service summary.
6 The Conclusion Important and sensitive information of smart cards is widely used by large companies and organizations, where huge and private information is stored securely. The cards are divided into cards with chips and cards without chips. There are some researchers who worked on developing security tools using specific projects, some of them designed a system that supports smart card deployment, and many researchers used the smart card because it stores very large information and with high security. New threats appear continuously, regardless of the sources of current threats, and these new threats need to be known in order for the correct treatment process to
Smart Card Security Model Based on Sensitive Information
711
take place, where many methods have been used by researchers despite the provision of security in them, they are considered old methods, so we note the emergence of new methods such as biometric security, which are biometric security programs to automatically identify people based on their behavior or behavioral biology. These methods are considered the best because they are safer and more accurate. This is what we suggest using in the phase of authorizing access to personal information through the smart card. Acknowledgements The authors are thankful to the Department of Computer Science, College of Science, Mustansiriyah University (https://uomustansiriyah.edu.iq/e-newsite.php), for supporting this work.
References 1. Kim H-S, Jeong H-Y, Joo H-J (2019) The big data visualization technology based ecosystem cycle on high speed network. Multimedia Tools Appl 78(20):28903–28916 2. Sun Z, Strang K, Firmin S (2017) Business analytics-based enterprise information systems. J Comput Inf Syst 57(2):169–178 3. Soomro ZA, Shah MH, Ahmed J (2016) Information security management needs more holistic approach: a literature review. Int J Inf Manage 36(2):215–225 4. Cheng L, Liu F, Yao D (2017) Enterprise data breach: causes, challenges, prevention, and future directions. Wiley Interdiscip Rev: Data Min Knowl Disc 7(5):e1211 5. Sokolova M, Matwin S (2016) Personal privacy protection in time of big data. Challenges in computational statistics and data mining. Springer, pp 365–380 6. Pelletier M-P, Trépanier M, Morency C (2011) Smart card data use in public transit: a literature review. Transp Res Part C: Emerg Technol 19(4):557–568 7. Baccelli E, et al (2014) Information centric networking in the IoT: Experiments with NDN in the wild. In: Proceedings of the 1st ACM conference on information-centric networking 8. Karthigaiveni M, Indrani B (2019) An efficient two-factor authentication scheme with key agreement for IoT based E-health care application using smart card. J Ambient Intell Human Comput 1–12 9. Chaudhary RRK, Chatterjee K (2020) An efficient lightweight cryptographic technique for IoT based E-healthcare system. In: 2020 7th international conference on signal processing and integrated networks (SPIN). IEEE 10. Park J-E, et al (2018) IoT based smart door lock. In: Proceedings of the Korean institute of information and communication sciences conference. The Korea Institute of Information and Communication Engineering 11. Shende D, Jagtap S, Kanade M (2019) IOT based energy meter billing and monitoring system. J Sci Res Devel 7:984–986 12. Dowlatshah K et al (2020) A secure and robust smart card-based remote user authentication scheme. Int J Internet Technol Secur Trans 10(3):255–267 13. Magiera J, Pawlak A (2005) Security frameworks for virtual organizations. Virtual organizations. Springer, pp 133–148 14. Chandramouli R (2008) Infrastructure system design methodology for smart ID cards deployment. In: IADIS international conference information systems. International Association for Development of the Information Society (IADIS), Algarve, Portugal 15. Li W et al (2019) Design of secure authenticated key management protocol for cloud computing environments. IEEE Trans Dependable Secure Comput 18(3):1276–1290
712
R. M. Abdullah and S. A. Hameed Alazawi
16. Cendana DI, Bustillo NV, Palaoag D (2018) Thelma. In: E-Purse transit pass: the potential of public transport smart card system in the Philippines. EasyChair 17. Xiang X, Wang M, Fan W (2020) A permissioned blockchain-based identity management and user authentication scheme for E-health systems. IEEE Access 8:171771–171783 18. Chen L, Zhang K (2021) Privacy-aware smart card based biometric authentication scheme for e-health. Peer-to-Peer Network Appl 14(3):1353–1365 19. Pradhan A et al (2018) Design and analysis of smart card-based authentication scheme for secure transactions. Int J Internet Technol Secur Trans 8(4):494–515 20. Kandar S, Pal S, Dhara BC (2021) A biometric based remote user authentication technique using smart card in multi-server environment. Wireless Pers Commun 120(2):1003–1026 21. Chiou S-F et al (2019) Cryptanalysis of the mutual authentication and key agreement protocol with smart cards for wireless communications. Int J Netw Secur 21(1):100–104 22. Kumari A et al (2020) ESEAP: ECC based secure and efficient mutual authentication protocol using smart card. J Inf Secur Appl 51:102443 23. Meshram C et al (2021) A robust smart card and remote user password-based authentication protocol using extended chaotic maps under smart cities environment. Soft Comput 25(15):10037–10051 24. Dasgupta D, Roy A, Nag A (2017) Advances in user authentication. Springer 25. Al-Naji FH, Zagrouba R (2020) A survey on continuous authentication methods in Internet of Things environment. Comput Commun 163:109–133 26. Yusuf N et al (2020) A survey of biometric approaches of authentication. Int J Adv Comput Res 10(47):96–104 27. Bharath M, Rao KR (2022) A novel multimodal hand database for biometric authentication. Int J Adv Technol Eng Explor 9(86):127 28. Kumari B, Gurjar P, Tiwari AK, A review study on biometric authentication 29. Liebers J, Schneegass S (2020) Introducing functional biometrics: using body-reflections as a novel class of biometric authentication systems. In: Extended abstracts of the 2020 chi conference on human factors in computing systems 30. Li L et al (2020) A review of face recognition technology. IEEE Access 8:139110–139120 31. Patel CD, Trivedi S, Patel S (2012) Biometrics in IRIS technology: a survey. Int J Sci Res Publ 2(1):1–5 32. Mohammed HH, Baker SA, Nori AS (2021) Biometric identity authentication system using hand geometry measurements. In: Journal of physics: conference series. IOP Publishing. 33. Alwahaishi S, Zdrálek J (2020) Biometric authentication security: an overview. In: 2020 IEEE international conference on cloud computing in emerging markets (CCEM). IEEE 34. Sarkar A, Singh BK (2020) A review on performance, security and various biometric template protection schemes for biometric authentication systems. Multimedia Tools Appl 79(37):27721–27776 35. Rankl W (2007) Smart card applications: design models for using and programming smart cards. John Wiley & Sons 36. Chen Z (2000) Java card technology for smart cards: architecture and programmer’s guide. Addison-Wesley Professional 37. Hwang Y, Kettinger WJ, Yi MY (2015) Personal information management effectiveness of knowledge workers: conceptual development and empirical validation. Eur J Inf Syst 24(6):588–606 38. Cui W, Zhang N (2017) Research and development of filing management system of school personnel information based on web. J Appl Sci Eng Innov 4(4):127–130
Brain Tumor Classification from MRI Scans Aman Bahuguna, Azhar Ashraf, Kavita, Sahil Verma, and Poonam Negi
Abstract Deep learning has recently been effectively utilized to learn complicated patterns in supervised classification tasks. The researchers want to utilize this machine learning technology to categorize photographs of brain tumors such as glioma, pituitary, and meningioma. This image data consists of 233 individuals and 3064 brain scans of patients with meningioma, glioma, or pituitary tumors. The transverse axial (plane), frontal (coronal plane), and lateral (sagittal plane) planes of T1-weighted contrast-enhanced MRI (CE-MRI) images are employed. Our study focuses on the axial pictures and extends on this dataset by incorporating axial photographs of brains that do not have tumors to enhance the number of images available to the neural network. Index Terms Brain MRI · Tumor classification · Machine learning · Convoluted neural network
1 Introduction A manual review of a patient and his or her test findings by a clinician is utilized to reach a diagnosis. With no automated tools to aid clinicians in treating and a small number of doctors present, there must be a greater danger of misdiagnosis and a A. Bahuguna · A. Ashraf Department of Computer Science and Engineering, Chandigarh University Mohali, Punjab 140413, India e-mail: [email protected] Kavita (B) · S. Verma · P. Negi Uttaranchal University, Dehradun, India e-mail: [email protected] S. Verma e-mail: [email protected] P. Negi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_57
713
714
A. Bahuguna et al.
longer wait time for patients to be seen. Instead of spending time with the patient, doctors should manually examine test results and images. Improved medical techniques mostly in manner of automatic equipment are required to raise doctor performance as well as shorten patient stay in health facilities and duration to rehabilitation in order to enhance patient care [1]. The project’s aim is developing computer-aided tools in assisting doctors in treating patients in such a way that misdiagnosis and patient waiting time are reduced. Particularly, our study accomplishes this goal by classifying brain tumor kinds based on patient brain pictures. Images demand a doctor’s evaluation of several image slices to establish health conditions, taking time from large complicated diagnosis. We aim to correctly classify brain MRI into cancer kinds so that doctors may decrease their workload by outsourcing the most challenging diagnosis to them [2]. Prior research has led in the creation of specific algorithms for the automated categorization of brain tumors. Cheng et al. [3] collected T1-weighted contrastenhanced pictures from 233 patients with three different forms of brain tumors: glioma, pituitary, and meningioma. A number of picture types are also included in the collection, including axial, coronal, and sagittal images. Figure 1 depicts some of these photographs. Cheng et al. utilized image dilation and ring-forming subregions on tumor locations to increase the accuracy for identifying brain tumors employing a Bag of Words (BoW) model up to 91.28%. They also used an intensity histogram and a gray-level co-occurrence matrix (GLCM) in addition to BoW, but the results were less consistent [3, 4]. There are three types of NNs studied: CNNs, RNNs, and DNNs. Given that the inputs are pictures, CNNs are largely employed in this work, while FCNNs have also been investigated. Despite previous attempts, such as the one mentioned above, to
Fig. 1 Example axial brain images
Brain Tumor Classification from MRI Scans
715
apply machine learning to medical data, there are few tools that make use of current breakthroughs in neural networks (NNs). While similar algorithms have been effectively utilized to find patterns in nonmedical images [5, 6], the proposed technique applies these to the medical images with limited datasets. Furthermore, using neural networks to medical imagery has the potential to result in faster and more accurate diagnosis. Neural networks may learn the shape of a brain and begin to discriminate between brains with and without tumors by incorporating pictures of brains without cancer [7]. Deep learning is commonly used to distinguish physiological structures in general. Applying neural networks to medical photographs has the potential to automatically deliver speed and better exact classification, and our study utilizes neural nets into the health profession, in which they are currently underutilized. The following are the primary contributions of this paper. • Using deep learning, develop a more generalized method for brain tumor classification. • Examine the use of tumorless brain images in the classification of brain tumors. • Evaluate neural networks empirically on supplied datasets in terms of image accuracy and patient accuracy.
2 Literature Review A public mind cyst database was generated from Nanfang Hospital, Guangzhou, China, and General Hospital, Tianjing Medical University, China, from 2005 to 2012 and was utilized in Cheng et al. [3] to categorize mind tumors in these concepts. This dataset was analyzed using three preliminary methods: a force graph with bars for values, a silver level co-occurrence mold (GLCM), and a bag-of-dispute (BoW). Rather of relying just on the cyst domain, Cheng et al. enhanced the lump region by distorting the figure to enhance the surrounding tissue, which is responsible for providing views into the carcinoma kind [8, 9]. In consideration of usage related to space monument matching (SPM) to identify local looks via estimating histograms, augmentation persisted by employing expanding ring formations helped for one countenance dilation and based by common normalized Euclidean distances. The local appearance is then evoked in BoW via language creation and histogram similarity, which are subsequently supplemented into a feature heading predicted on a classifier. BoW has the highest classification validity of all three approaches, with a score of 91.28%. This categorization process, however, is quite detailed and necessitates focusing on the tumor or the area of concern as well as lump life information. Animated nerve organ networks, on the other hand, are generalizable and may uncover local lineaments from representation suggestion unique [10]. Neural networks and their generalizability are relatively new concepts. After the rejection of allure in the 1990s, Hinton et al. [11] imported the approach of pre-preparation secret coatings independently using alone the knowledge of limited Boltzmann machines in 2006 (RBMs). This showed how selectively layering RBMs
716
A. Bahuguna et al.
Fig. 2 With 256 × 256 pictures, the loss and accuracy histories for Vanilla FCNN are shown
described how a persuasive design affected animate nerve organ networks. Ever since, the field of deep knowledge has grown, and more effective preparation approaches for live nerve organ networks have been established, resulting in a rapid improvement in the USAs competence. Figure 2 shows the examples of contemporary impacting living nerve organs networks. Convolutional affecting animate nerve organs networks rose to prominence in 2012, when Krizhevsky et al. [5] created a victorious convolutional interconnected system for the ImageNet competition that performed significantly higher than the rest most advanced-level model, despite being first introduced to the general populace in 1998 by LeCun et al. [6]. Understanding the potential for CNNs to demonstrate representation categorization, the calculating vision society chose affecting animate nerve organs networks as the state of the art from now on challenge. Since 2012, convolutional networks affecting live nerve organs have topped several classification tasks, notably the Galaxy Zoo Challenge, which ran from 2013 to 2014. Using renewals, rotations, and translations of photographs, Dieleman et al. [12] popularized how dossier augmentation may significantly boost dataset content. Preventing overfitting in animate nerve organ networks has been a key focus of research, and in 2014, Srivastava et al. [13] suggested quitter as an effective technique for avoiding neuronal co-familiarization. Dropout mistakenly dissolves neuron links, enabling units to improve more freely rather than depending on fresh units to learn appearance. Maxout layers, which are similar in kind, were created to introduce combination with truant. Maxout tiers are comparable to the typical feed-forward multilayer perceptron-linked system created by Goodfellow et al. [14], but they
Brain Tumor Classification from MRI Scans
717
employ a novel inciting function called the maxout component. This process needs the most continuous stimulation possible. Deep education research has come out with quicker practices for training neural networks, in addition to overfitting. Glorot et al. [15] reported that redressed uninterrupted wholes (ReLUs) worked as well as better than enhanced touching in the directed preparation of deep influencing animate nerve organs networks. This is due to ReLU’s nonlinear structure, which allows it to create sparse likenesses that perform well with often insufficient dossiers. Nesterov’s impetus [16–18] is a type of impetus refurbish that has been employed to alter animate nerve organs networks, while ReLUs indicate a change in nonlinearity desire to enhance understanding. Nesterov’s impulse follows the momentum from early amendments, which has supervised amendments in a future position. Nesterov’s impulse follows the momentum from premature amends that has monitored amends in a particular path to a future position. In contrast to normal push, where the gradient is recorded at the current region, this is not the case. Although still in the babyhood phases, neural network research has begun to interact with healing research. While previous studies have yielded encouraging findings [19, 20], it is only now that large amounts of healing dossier have begun to appear. Many of the concepts and plans from previous research on animate nerve organ networks have been directly applied to this study.
3 Methodology 3.1 Model CNNs have shown to be successful in supervised learning tasks. [6]. CNNs may be built with a variety of ways, but the usage and sequencing of Conv2D, MaxPooling2D, and fully connected layers are essentially the same.
3.2 Convolution Neural Networks (CNNs) CNN networks impacting live nerve organs are based on the assumption that neighboring inputs are strongly connected to one another. Figures use dots to represent values, and dots that are near together in representations have a stronger equivalence than dots that are farther away. Convolutional influencing animate nerve organs networks use local concept domains to infer local physiognomy from these subregions, with this acceptance in mind. In the convolutional levels, local feature prediction is a finished task. Convolutional and top-combining layers [21] are usually pushed through properly connected and thick stages in impacting animate nerve organ nets (i.e., every unit
718
A. Bahuguna et al.
inside a coating is attached with all neurons at all tier above). Densely connected layers act high-ranking interpretation in the neural network because they have brimming connections to all incitement in the premature coating. In the convolutional neural network, a nonlinearity function is used for each neuron in each layer except the combining tiers; otherwise, coatings may be crumpled into one cause requesting linear functions may be substituted accompanying administering only one uninterrupted function. The redressed linear parts, that have existed shown to correct interconnected system performance [22], are the nonlinear function secondhand in this place case. The model’s last layer is a dense layer with a three- or four-neuron softmax classifier that determines classification probabilities. The possibility of a picture falling into one of the three categories is represented by these neurons; three neurons represent the three types of brain cancers, while fourth additional neuron represents brains without tumors [23].
3.3 Data Collection There are 3064 T1-weighted contrast-enhanced pictures in the brain tumor collection arranged into three sets: coronal, sagittal, and axial images, as previously stated. These are the planes that are used to view brain images; they correspond to the lateral plane, transverse, and frontal planes, respectively. All of the photographs are 512 x 512 pixels in size, including 1025 sagittal images, 1045 coronal images, 994 axial shots, and 930 images of pituitary, and 1425 images of glioma and 708 slices of meningioma cancers were also included in the photographs. Many of the images in this collection relate to the same individual because they were taken from 233 patients. To prevent confusing the CNN with 3 separate brain surfaces with the similar class, the pictures were segmented into 3 planes, and this paper focuses on the axial images because tumorless brain scans are accessible in the axial plane. We have 191 patients and 989 images in our final brain tumor dataset. About 208 meningioma pictures, 492 glioma images, and 289 pituitary tumor images are among them. There are 3120 axial 256 × 256 pictures from 625 individuals in the tumorless brains MRI scans collection, with five shots from their brain scan splices picked at random to represent each patient.
3.4 Overcoming Overfitting CNNs contain a large number of training examples and train on a big dataset, but state-of-the-art networks have millions of them. Our neural nets have been at risk of overfitting due to the tiny dataset of brain tumor images. Overfitting [24] happens if neural network parameters memorize training information instead of generalizing
Brain Tumor Classification from MRI Scans
719
the input to detect correlations. Small datasets are frequently to blame. To reduce overfitting, we used a variety of approaches, comprising data augmentation, regularization using dropout, and parameter sharing, as seen in the image rotations and transformations below. Classifications of brain tumor pictures, like many other images, are insensitive to translations, transformations, scaling, and rotations. This facilitates the adoption of different data augmentation techniques. Data augmentation has been effective in enlarging limited datasets to minimize overfitting [12]. A variety of data augmentation techniques were applied in a series of experiments performed on the photographs. 1. Rotation: Images were rotated at random from a normal distribution with an angle ranging from 0° to 360°. 2. Shift: Images are randomly moved to the left or right by −4 to 4 pixels, as well as up and down by −4 to 4 pixels [12]. These little modifications, which were generated from a normal distribution, preserved the brains in the center of the image while altering their position enough to reduce visual memory. 3. Scaling: Images are randomly rescaled between 1.31 and 1.3 using Dieleman et al. [12]. 4. Mirror: The y-axis of each picture was reversed.
3.5 Model Creation Only photographs are accepted as input by this neural network. Several layer combinations were tried; however, the following proved to be the most beneficial for this neural network. • • • • • •
Conv2D layer with 64 filters of size 5 × 5 and stride of 1. MaxPool2D layer with pool and stride size 2 × 2. Conv2D layer with 64 filters of size 5 × 5 and stride of 1. MaxPool2D layer with pool and stride size 2 × 2. Fully connected layer with 1024 neurons. Dense layer having softmax activation and three or four neurons based on brain tumor including just in training or tumorless brain inclusion in training.
With the exception of MaxPool2D, each layer employed the ReLU with nonlinear characteristics, while the final three layers used dropout for helping in regularization and overfitting. This neural network will be referred to as CNN from here on. This neural network has neither convolutional or maxpooling layers and just represents images as input. The layers that made up this network were as follows. • 600 units in a dense layer. • 600 units in a dense layer. • Depending on whether the brain tumor is only in training or the tumor-free brain is also in training, the softmax layer has 3 or 4 neurons.
720
A. Bahuguna et al.
Each of these layers was likewise subjected to dropout and ReLUs. This neural network will be referred to as FCNN from here on. More data may be stored in this neural network than a single picture input. The neural network is available in two forms. Each version incorporates a CNN-like neural network. A second input layer, on other hand, shows the tumor’s position by displaying the same picture input or the highest and lowest x and y. Every one of them has its own neural network path that links to the preceding CNN. The following layers make up this second neural network path. • Dense layer with 600 units. • Dense layer with 600 units. The final layer of this route and the last densely connected layer from CNN were concatenated and linked to one last densely connected layer of 800 neurons prior to enter the softmax layer from CNN. In this research, we will designate to this neural network as ConcatNN.
3.6 Random Forests Breiman [25] developed random forests in 2001 as a blend of tree classifiers in which trees have been constructed on picked at random independent vectors. To integrate randomness into the dataset, each tree is given features with minor perturbations, and variation is even further introduced at the process level using attribute randomness. Only the brain tumor dataset was utilized in one prediction test, whereas both the tumorless brain dataset and the brain tumor dataset were used in the other [26].
3.7 Training Patients were randomly assigned to one of three sets for training, validation, and testing for every one of the resampled data, with 149, 21, and 21 patients, respectively. Because the structure of patient images is comparable, a patient represents all of the patient’s photographs. The requirement for both training and testing to aggregate patient data is eliminated, resulting in more accurate predictions [27]. To focus the data, the average image from training was eliminated from train, validate, and test. Figure 3 depicts an example of a mean picture. This resulted in higher accuracies than when the mean picture was left in place. Training data were utilized to update weights in the neural networks, while validation data provided information about how they operated. The number of hyperparameters can be changed. The hyperparameters with the highest accuracies are listed. • The regularization constant is 0.014.
Brain Tumor Classification from MRI Scans
721
Fig. 3 With 69 × 69 photographs, the loss and accuracy history for Vanilla FCNN is shown
• Rate of learning: 0.0001 • Batch size for non-enhanced datasets is 4 and for augmented datasets is 128. • Epochs: 100 epochs (plus one 500) to compensate for accuracy and training time Rather than maintaining a steady learning rate, a fading learning rate was used so increasing accuracy by gradually lowering it. However, under each scenario with the decaying learning rate, the accuracies were much lower than in the absence of them.
4 Experimental Results and Analysis The accuracies for the tests that were done are shown in Tables 1 and 2. The Vanilla CNN with an image size of 256 × 256 and a tumor brain dataset have the maximum accuracy of 91.43%, according to these results. Furthermore, the accuracies of perpatient and per-image were found to be consistent, implying that patient images predict similarly. Despite the increased compute time induced by the increased epochs in CO FCNN with picture size 45 × 45, the larger pictures taught neural networks more precisely, resulting in an 8% improvement in outcomes. The weights from the first convolutional layer of the top neural network are shown in Fig. 4. Minor structures in each of these five weight zones suggest low-level attributes. Fivefold cross-validation was performed to provide the loss and accuracy histories for each model’s training and validation sets in order to compare these models further (Figs. 11–20). Despite the fact that due to a shortage of instances, the 256 × 256
722
A. Bahuguna et al.
Fig. 4 Last models for per-picture accuracy have a mean precision of k, where k spans from 1 to 20
pictures overfitted over time, and their accuracies remained continuously higher than the smaller ones. When these models’ accuracy at k was investigated independently (Figs. 5–10), virtually all of them remained over 90%. Precision at k considers the top k most probable guesses in all pictures. It is a depiction of the photographs in which neural networks had the maximum confidence in classifying them. Having 90% accuracy means that for any neural network, the predictions with the highest probability were typically right. It is worth noting that from k = 1 to 20, any model that employed 256 × 256 pictures had a precision of 1.0. Larger pictures were fed into the neural network, resulting in improved classification success in terms of the algorithm’s best results. When comparing neural networks that employed tumorless brain datasets to those that did not, the findings were mixed. When tumor-free brains were added to smallersized images, they tied or increased accuracies by up to 2%. Tumor-free brains seemed to have somewhat lower accuracies in larger images. Looking for brain tumors with the Vanilla CNN 256 × 256, Tables 3–6 examine the confidence, precision, and sensitivity of the neural networks that performed best overall in terms of per-picture and per-patient accuracies. Figure 5 shows that for all k values from 1 to 20, the Vanilla CNN 256 × 256 obtained a perfect score for average accuracy at k. In Tables 3 and 4, we increase k for each cross-validation until the neural network forecasts per-image and per-patient accuracies incorrectly. Prior to wrongly identifying tumor type, the neural network averaged significantly more than half of the test pictures, with the best cross-validation reaching 90% of shots. In terms of per-patient accuracy, the neural network reaches more than half of the test patients on average, with the best cross-validation correctly predicting 100% of the patients. The precision and recall for the best model for per-picture accuracy but the last model for per-client error, which ranked highest in their individual accuracy metrics, were evaluated for meningioma, glioma, and pituitary cancer types. The essential
Brain Tumor Classification from MRI Scans
723
information is given in Tables 5 and 6. In terms of accuracy, these two models were the most accurate. The most difficult cancers to forecast were meningiomas, which had an average accuracy of 0.84 and recall of 0.74, whereas precision and recall rates for glioma and pituitary tumors were in the late 1997. Table 6 shows that average accuracy and recall for glioma, pituitary, and maligning tumors are nearly equal, with 93%, 93%, and 91%, respectively. Finally, the brain MRI scans data of tumor class along with the tumorless brain MRI scans were subjected to a random forest method. When compared to training neural networks, the former and latter consistently produced average improvements of close to 90%. The accuracy was unaffected by the use of tumor-free brain pictures.
5 Conclusion and Future Work CNNs are at the bleeding edge of computer vision, and their use in medicine might enhance present patient-diagnosis procedures dramatically. With training of CNNs, it has been attempted to recognize tumor types in brain scans, which improves performance metrics and hence paving the way for deep learning in medicine. Not only do neural networks provide similar or better results than Cheng et al. [3] original study, but they also use a more general methodology that simply takes one picture in understanding different categories of brain tumors. Moreover, the performance metrics per-patient metric matched the per-picture accuracy results, indicating that the deep dense network is continually producing predicted classes for patient photographs. Further study might build on this work by looking at neural algorithms that are trained on sagittal and coronal pictures. Furthermore, merging patient photographs from several planes not only increases dataset size but also offers extra information about tumor kind that is difficult to observe from a single plane. This has the potential to dramatically enhance meningioma tumor classification, which has shown to be the most challenging for neural networks to understand. Finally, reducing picture size significantly enhanced the efficiency of neural network training. Improving performance on tiny pictures can be immensely useful in educating and supporting clinicians with patient care. Handling the small-dimensional and noisy MRI scans can aid feed-forward networks in generalizing to more complicated MRI scans, which can aid clinicians to diagnose patients.
References 1. Kaur N, et al (2018) A survey of routing protocols in wireless sensor networks. IJET (UAE) 7(4.12):20–25. ISSN: 2227-524X 2. Gupta R, et al (2018) A comparative analysis of trust based applications in wireless sensor networks. Int J Eng Technol 7.4.12:73–77
724
A. Bahuguna et al.
3. Cheng J, Huang W, Cao S, Yang R, Yang W et al (2015) Correction: enhanced performance of brain tumor classification via tumor region augmentation and partition. PLoS ONE 10(12):e0144479. https://doi.org/10.1371/journal.pone.0144479 4. Shanker R, et al (2018) Analysis of information security service for internet application. Int J Eng Technol 7.4.12:58–62 5. Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks, proc. neural information and processing systems. In: Tai Y, Yang J, Liu X (ed) Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) 6. Al-Waisy AS, Al-Fahdawi S, Mohammed MA, Abdulkareem KH, Mostafa SA, Maashi MS, Arif M, Garcia-Zapirain B (2020) Covid-chexnet: hybrid deep learning framework for identifying covid-19 virus in chest x-rays images. Soft Comput 1–16 7. Kavita et al (2018) Implementation and performance evaluation of AODV-PSO with AODVACO. Int J Eng Technol 7.2.4:23–25 8. Chandini, et al (2020) A canvass of 5G network slicing: architecture and security concern. IOP Conf Ser: Mater Sci Eng 993 012060 9. Gaba S, et al, Clustering in wireless sensor networks using adaptive neuro fuzzy inference logic. In: Security handbook. CRC Press, USA 10. Shanker R, et al, Efficient feature grouping for IDS using clustering algorithms in detecting known/unknown attacks. In: Security handbook. CRC Press, USA 11. Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric frame work for learning from labeled and unlabeled examples. Mach Learn Res 7:2399–2434 12. Bassi PR, Attux R (2021) A deep convolutional neural network for covid-19 detection using chest x-rays. Res Biomed Eng 1–10 13. Apostolopoulos ID, Mpesiana TA (2020) Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks. Phys Eng Sci Med 43(2):635– 640 14. (2006) A survey of clustering data mining techniques. In: Grouping multidi mensional data. Springer, pp 25–71 15. Cheng H, Tan PN, Jin R (2010) Efficient algorithm for localized support vector machine. IEEE Trans Knowl Data Eng 22(4):537–549 16. Chollet F, et al (2015) Keras. https://keras.io 17. Dash S, Verma S, et el (2022) Guidance image-based enhanced matched filter with modified thresholding for blood vessel extraction. Symmetry 14(2):194. https://doi.org/10.3390/sym140 20194 18. Rani P, et al (2022) Robust and secure data transmission using artificial intelligence techniques in Ad-Hoc networks. Sensors 22(1):251. https://doi.org/10.3390/s22010251 19. Bernheim A, Mei X, Huang M, Yang Y, Fayad ZA, Zhang N, Diao K, Lin B, Zhu X, Li K, et al (2020) Chest ct findings in coronavirus disease-19 (covid-19): relationship to duration of infection. Radiology 200463 20. Cohen JP, Morrison P, Dao L (2020) Covid-19 image data collection. arXiv 2003.11597. https:/ /github.com/ieee8023/covid-chestxray-dataset 21. Hable R (2013) Universal consistency of localized versions of regularized kernel methods. J Mach Learn Res 14(1):153–186 22. Alber M, Lapuschkin S, Seegerer P, Hägele M, Schütt KT, Montavon G, Samek W, Müller KR, Dähne S, Kindermans PJ (2019) Innvestigate neural networks! J Mach Lear Res 20(93):1–8. http://jmlr.org/papers/v20/18-540.html 23. Das NN, Kumar N, Kaur M, Kumar V, Singh D (2020) Automated deep transfer learning-based approach for detection of covid-19 infection in chest x-rays. Irbm 24. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. 2nd edn. Springer 25. Brunese L, Mercaldo F, Reginelli A, Santone A (2020) Explainable deep learning for pulmonary disease and coronavirus covid-19 detection from x-rays. Comput Meth Prog Biomed 196:105608
Brain Tumor Classification from MRI Scans
725
26. Crammer K, Singer Y, Cristianini N, Shawe-Taylor J, Williamson B (2001) On the algorithm implementation of multiclass kernel-based vector machines. Mach Learn Res 265–292 27. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theor 13(1):21– 27
Recognition of Handwritten Digits Using Convolutional Neural Network in Python and Comparison of Performance for Various Hidden Layers Himansh Gupta, Amanpreet Kaur, Kavita, Sahil Verma, and Poonam Rawat
Abstract In recent times, with the increase of Artificial Neural Network (ANN), deep learning has brought a dramatic change in the field of machine learning by making it more artificially intelligent. Deep learning is used in various fields because of its wide range of applications such as surveillance, health, medicine, sports, robotics, and drones. In deep learning, Convolutional Neural Network (CNN) is at the center of spectacular advances that combine Artificial Neural Network (ANN) and keep updating deep learning strategies. It is used in pattern recognition, sentence classification, speech recognition, face recognition, text categorization, document analysis, scene, and handwritten digit recognition. The goal of this paper is to observe the change of accuracies of CNN to classify handwritten digits using various numbers of the hidden layers and epochs and to see the difference between the accuracies. For this performance evaluation of CNN, we performed our experiment using the Modified National Institute of Standards (MNIS) and Technology (MNIST) dataset. Further, the network is trained using the stochastic gradient descent and the backpropagation algorithm. Deep learning is a machine learning technique that teaches the computers to do what comes naturally to the humans. Keywords Handwritten digit recognition · Convolutional network (CNN) · Deep learning · MNIST dataset · Hidden layers · Stochastic gradient descent
H. Gupta · A. Kaur Department of CSE, Chandigarh University, Gharuan, India e-mail: [email protected] Kavita (B) · S. Verma · P. Rawat Uttaranchal University, Dehradun, India e-mail: [email protected] S. Verma e-mail: [email protected] P. Rawat e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_58
727
728
H. Gupta et al.
1 Introduction With time the numbers of fields are increasing rapidly in which deep learning can be applied. In deep learning, Convolutional Neural Networking (CNN) [1, 2] is being applied for visual imagery analyzing. Object detection, face recognition, robotics, video analysis, segmentation, pattern recognition, natural language processing, spam detection, topic categorization, regression analysis, speech recognition, and image classification are various examples that can be done using Convolutional Neural Networking. The accuracy in these fields involved handwritten digits recognition using Deep Convolutional Neural Networks (CNNs) has achieved human level perfection. Mammalian visual systems’ biological model is the one through which the architecture of the CNN is inspired. Cells in the cat’s visual cortex are sensitive to a tiny area of the visual field identified which is known as the receptive field [3, 4]. It was found by D. H. Hubel et al. in 1062. The neocognitron [5, 6], the pattern recognition model influenced by the work of Hubel et al. [7, 8] was the first computer vision. It was introduced by Fukushima in 1980. In 1998, the framework of CNNs is planned by LeCun et al. [9, 10] which has seven layers of the Convolutional Neural Networks. It was adapt in handwritten digits classification direct from pixel values of images [11, 12]. Gradient descent and backpropagation algorithm [13, 14] is applied for training the model. In handwritten recognition digits, characters are specified as input. The model can be perceive by the system. A simple Artificial Neural Network (ANN) has an input layer, an output layer, and some hidden layers in the middle of the input and output layer. CNN has a very alike architecture as ANN. There are various neurons in each layer in ANN. The weighted sum of all the neurons of the layer becomes the input of a neuron of the next layer added a biased value. In CNN, the layer has three dimensions. Here, all the neurons are not fully joined. Instead, every neuron in the layer is interconnected to the local receptive field. A cost function generates in order to instruct the network. It compares the output of the network with respect to the desired output. The signal propagates back to the system, repeatedly, for updating the shared weights and biases in all the receptive fields to minimize value of the cost function which increases the network’s performance [15, 16]. The goal of this article is to observe the influence of hidden layers of the CNN for handwritten digits. We have used a different type of Convolutional Neural Network algorithm on Modified National Institute of Standards and Technology (MNIST) dataset using the TensorFlow, a Neural Network library written in the python. The main purpose of this paper is to observe the variation of outcome results for using the different combination of hidden layers of Convolutional Neural Network. Stochastic gradient and backpropagation algorithm are used for instructing the network and the forward algorithm is used for the testing.
Recognition of Handwritten Digits Using Convolutional Neural …
729
2 Literature Review CNN is playing an essential role in many sectors like image processing. It has the powerful impact on many fields. Even, in the nano-technologies like manufacturing semiconductors, CNN is used for the fault detection and classification [16, 17]. Handwritten digit recognition has become issue of the interest in between researchers. There are a large number of papers and articles which are being published these days regarding this topic. In research, it is shown that deep learning algorithm like multilayer CNN using Keras with Theano and TensorFlow gives the highest accuracy in comparison with the most broadly used machine learning algorithms like SVM, KNN, and RFC. Because of its highest accuracy, Convolutional Neural Network (CNN) is being used on the large scale in image classification, video analysis, etc. Many researchers are trying to make the sentiment recognition in a sentence. CNN is being applied in natural language processing and sentiment recognition by varying the different parameters [18, 19]. It is pretty challenging to get the best performance as more number of parameters are needed for the large-scale neural network. Many researchers are trying to expand the accuracy with less error in CNN. In another research, they have shown that the deep nets performs better when they are trained by the simple backpropagation. There architecture results in lowest error rate on MNIST compare to the NORB and CIFAR10 [20, 21]. Researchers are working on this issue to reduce error rate as much as possible in the handwriting recognition. In one research, an error rate of 1.19% is achieved using the 3-NN trained and tested on the MNIST. Deep CNN can be adjustable with input image noise [22]. Coherence recurrent convolutional network (CRCN) is the multimodal neural architecture [23, 24]. It is being used in the recovering sentences in an image. Some researchers are trying to come up with the new techniques to avoid drawbacks of the traditional convolutional layer’s. Ncfm (No combination of feature maps) is the technique which can be used for better performance using MNIST datasets [25]. Its accuracy is 99.81%, and it can be used for large-scale data. New applications of CNN are developing day by day with various kinds of research. Researchers are trying hard to decrease the error rates. Using MNIST datasets and CIFAR, error rates are being analyzed [26]. To clean the blur images, CNN is being used. For this purpose, a new model was proposed using the MNIST dataset. This approach reached an accuracy of 98% and loss range from 0.1 to 8.5%. In Germany, a traffic sign of recognition model of CNN is suggested. It proposed the faster performance with 99.65% accuracy. Loss function was planned, which is applicable for the light-weighted 1D and 2D CNN. In this case, the accuracies are 93% and 91%, respectively.
730
H. Gupta et al.
3 Modeling of Convolutional Neural Network to Classify Handwritten Digits To recognize the handwritten digits, a seven-layered neural network with one input layer followed the five hidden layers and the one output layer. Small localized areas by convolving a filter with the previous layer. In addition, it consists of the multiple feature maps with the learnable kernels and rectified linear units (ReLU). The kernel size determines the locality of filters. ReLU is used as the activation function at end of each convolution layer as well as the fully connected layer to enhance performance of the model. The next hidden layer is pooling layer 1. It reduces the output information from convolution layer and reduces number of parameters and computational complexity of model. The types of pooling are max pooling, min pooling, average pooling, and L2 pooling. Here, max pooling is used to subsample dimension of each feature map. Convolution layer 2 and pooling layer 2 which has same function as the convolution layer 1 and pooling layer 1 and operates in same way except for their feature maps and kernel size varies. A flatten layer is used after pooling layer which converts 2D featured map matrix to the 1D feature vector and allows output to get handled by fully connected layers. The MNIST handwritten digits database is used for experiment. Out of 70,000 scanned images of handwritten digits from MNIST [27] database, 60,000 scanned images of the digits are used for training the network and 10,000 scanned images of digits are used for the purpose to test network. The images that are used for training and testing network all are the grayscale image with the size of 28 × 28 pixels. Character x is used to represent the training input, where x is a 784-dimensional vector as input of x is regarded as 28 × 28 pixels. The equivalent desired output is expressed by the y(x), where y is the ten-dimensional vector. The network aims is to find convenient weights and biases so that output of the network approximates y(x) for all the training inputs x as it completely depends on the weight values and bias values. To compute network performances, a cost function is defined, expressed by equation. C(w, b) =
2n
y(x) − a 2
2bn
(1)
x=1
The input layer consists of 28 by 28 pixel images which mean that network contains 784 neurons as the input data. The input pixels are grayscale with a value 0 for the white pixel and 1 for the black pixel. Here, this model of CNN has the five hidden layers. The first hidden layer is convolution layer 1 which is responsible for the feature extraction from an input data. This layer performs convolution operation to where w is cumulation of weights in the network, b is all the biases, n is total number of training inputs and a is actual output. The actual output a depends on the x, w, and b. C(w, b) is non-negative as all terms in the sum is non-negative. Moreover, C(w, b) = 0, precisely when desired output y(x) is comparatively equal
Recognition of Handwritten Digits Using Convolutional Neural …
731
to actual output, a, for all training inputs, n. To reduce cost C(w, b) to a smaller degree as a function of weight and biases, the training algorithm has to find the set of weight and biases which cause cost to become as small as possible. This is done using the algorithm known as the gradient descent. In other words, gradient descent is an optimization algorithm that twists its parameters iteratively to decrease the cost function to its local minimum. The gradient descent algorithm deploys following equations to set weight and biases. MNIST Dataset Modified National Institute of Standards and Technology (MNIST) is a large set of computer vision dataset which is extensively used for the training and testing different systems. It was created from two special datasets of National Institute of Standards and Technology (NIST) which holds the binary images of handwritten digits. The training set contains handwritten digits from 250 people, among them 50% training dataset was the employees from Census Bureau and rest of it was from the high school students. However, it is often attributed as first datasets among other datasets to prove effectiveness of neural networks (Fig. 1). w new = w old − η∂C∂w old
(2)
bnew = bold − η∂C∂bold
(3)
However, gradient descent algorithm may be unusable when training data size is large. Therefore, to enhance performance of network, a stochastic version of algorithm is used. In stochastic gradient descent (SDG), a small number of iteration will find effective solutions for optimization problems. Moreover, in SDG, a small number of the iteration will lead to the suitable solution. The stochastic gradient descent algorithm utilizes following equations: The output of the network can be expressed by Fig. 1 Graphical representation of cost versus weight
732
H. Gupta et al.
w new = w old − η∂C x j m∂w old bnew = bold − η∂C x j m∂w old To find amount of weight that contributes to the total error of network, backpropagation method is used. The backpropagation of network is illustrated by the following equations: The database contains 60,000 images used for training as well as few of them can be used for the cross validation purposes and 10,000 images used for the testing. All the digits are grayscale and positioned in the fixed size where intensity lies at the center of image with 28 × 28 pixels. Since all images are 28 × 28 pixels, it forms an array which can be flattened into the 28 * 28 = 784 dimensional vector. Each component of vector is a binary value which describes intensity of the pixel [28].
4 Results and Discussion 4.1 Discussion of the Obtained Simulated R In this section, CNN has been used on MNIST dataset in order to analyze the variation of accuracies for handwritten digits. The accuracies are obtained using the TensorFlow in Python. Training and validation accuracy for the 15 different epochs were observed by exchanging the hidden layers for the various combinations of convolution and hidden layers by taking the batch size 100 for all cases. Figures 2, 3, 4, 5, 6, and 7 show performance of CNN for the different combinations of the convolution, and Table 1 shows minimum and maximum training and validation accuracies of CNN found after simulation for six different cases by varying number of hidden layers for recognition of handwritten digit. In the first case shown in Fig. 4, the first hidden layer is convolutional layer 1 which is used for feature extraction. It consists of 32 filters with kernel size of 3 × 3 pixels, and rectified linear unit (ReLU) is used as activation function to enhance performance. The next hidden layer is convolutional layer 2 which consists of 64 Fig. 2 Observed accuracy for case 1
Recognition of Handwritten Digits Using Convolutional Neural …
733
Fig. 3 Observed accuracy for case 2
Fig. 4 Observed accuracy for case 3
Fig. 5 Observed accuracy for case 4
filters with the kernel size of 3 × 3 pixels and ReLU. Next, a pooling layer 1 is defined where max pooling is used with the pool size of 2 × 2 pixels to decrease the spatial size of output of a convolution layer. A regularization layer dropout is used next to pooling layer 1, where it randomly eliminates 25% of neurons in layer to reduce overfitting. A flatten layer is used after dropout which converts 2D filter matrix into 1D feature vector before entering into fully connected layers. The next hidden layer used after flatten layer is fully connected layer 1 which consists of 128 neurons and ReLU. A dropout with a probability of 50% is used after fully connected
734
H. Gupta et al.
Fig. 6 Observed accuracy for case 5
Fig. 7 Observed accuracy for case 6
layer 1. Finally, output layer which is used here as fully connected layer 2 contains 10 neurons for 10 classes and determines digits numbered from 0 to 9. A softmax activation function is incorporated with output layer to output digit from 0 to 9. The CNN is fit over 15 epochs with the batch size of 100. The overall validation accuracy in performance is found at the 99.11%. At epoch 1, the minimum training accuracy of the 91.94% is found, and the 97.73% of validation accuracy is found. At epoch 13, maximum training accuracy is found 98.99%, and at epoch 14, maximum validation accuracy is found 99.16%. The total test loss for this case is found approximately 0.037045. Figure 2 is defined for case 2, where the convolution 1, pooling 1 and convolution 2, pooling 2 are used one after the another. A dropout is used followed by flatten layer and the fully connected layer 1. Before fully connected layer 2, another dropout is used. The dimensions and parameters used here and for next cases are same which are used earlier for the case 1. The overall validation accuracy in performance is found 99.21%. At epoch 1, minimum training and validation accuracy are found. The minimum training accuracy is 90.11%, and minimum validation accuracy is 97.74%. The maximum training and validation accuracy is found at epoch 14. The maximum training and validation accuracies were 98.94% and 99.24%. The total test loss is found approx 0.026303.
4
3
4
6
3
4
3
2
5
3
4
1
Number of hidden layers
Case
100
100
100
100
100
100
Batch size
1
1
1
1
1
1
91.94
90.50
91.80
92.94
94.35
90.11
1
1
1
1
3
1
97.13
98.16
97.79
98.33
97.74
97.73
Accuracy (%)
Epoch
Epoch
Accuracy (%)
Minimum validation accuracy
Minimum training accuracy
15
13
15
15
14
13
Epoch
99.24
99.09
99.92
100
98.94
98.99
Accuracy (%)
Maximum training accuracy
Table 1 Performance of CNN for the six different cases for various hidden layers and epochs
13
12
13
15
14
14
Epoch
99.26
99.12
99.92
99.06
99.24
99.16
Accuracy (%)
Maximum validation accuracy
99.07
99.09
99.20
99.06
99.21
99.11
Overall performance validation accuracy (%)
Recognition of Handwritten Digits Using Convolutional Neural … 735
736
H. Gupta et al.
For case 3, shown in Fig. 2, where two convolutions are taken one after the another followed by the pooling layer. After the pooling layer, a flatten layer is used followed by two fully connected layers without any dropout. The overall validation accuracy in performance is found 99.06%. The minimum training accuracy is found 94.35% at epoch 1 and epoch 3, and minimum validation accuracy is found 98.33%. The maximum training and validation accuracies were 1% and 99.06% found at epoch 15. The total test loss is found approx 0.049449. Similarly, in case 4 shown in Fig. 4, convolution 1, pooling 1 and convolution 2, pooling 2 are used alternately followed by the flatten layer and the two fully connected layers without any dropout. The overall validation accuracy in performance is found around 99.20%. At epoch 1, the minimum training and validation accuracy were found. The minimum training accuracy is 92.94%, and minimum validation accuracy is 97.79%. Accuracy is found 99.92% at epoch 15 and 13, the maximum validation accuracy also found 99.92%. The total test loss is found approx. Again, for case 5 shown in Fig. 5, two convolutions are used one after the another followed by the pooling layer, flatten layer, and fully connected layer 1. A dropout is used before fully connected layer 2. The overall validation accuracy in performance is found 99.09%. The minimum training and validation accuracy is found at epoch 1. The minimum training accuracy is 91.80%, and minimum validation accuracy is 98.16%. At epoch 13, the maximum training accuracy is found 99.09%, and maximum validation accuracy is found 99.12% at epoch 12. The total test loss is found approx 0.034337. Finally, for case 6 shown in Fig. 6, the convolution 1, pooling 1 and convolution 2, pooling 2 are used alternately followed by the flatten layer and fully connected layer 1. A dropout is used before fully connected layer 2. The overall validation accuracy in performance is found 99.07%. At epoch 1, minimum training and validation accuracy is found. The minimum training accuracy is 90.5%, and minimum validation accuracy is 97.13%. The maximum training accuracy is found 99.24% at epoch 15, and maximum validation accuracy is found 99.26% at epoch 13. The total test loss is found approx 0.028596.
4.2 Comparison with Existing Research Work There are several methods of the digit recognition. The handwritten digit recognition can be improved by using some widely held methods of neural network like the Deep Neural Network (DNN), Deep Belief Network (DBF), and Convolutional Neural Network (CNN), etc. Tavanaei et al. proposed the multi-layered unsupervised learning in the spiking CNN model where they used MNIST dataset to clear the blur images and found overall accuracy of 98% and range of performance loss was 0.1– 8.5%. Rezoana et al. proposed a seven-layered Convolutional Neural Network for purpose of handwritten digit recognition where they used the MNIST dataset to evaluate impact of pattern of the hidden layers of CNN over the performance of overall
Recognition of Handwritten Digits Using Convolutional Neural …
737
network. They have plotted loss curves against the number of epochs and found that performance loss was below 0.1 for most of cases, and sometimes, in some cases, loss was less than 0.05. In another paper, Siddique et al. proposed an L-layered feed forward neural network for handwritten digit recognition where they have applied neural network with the different layers on MNIST dataset to observe variation of accuracies of ANN for the different combinations of the hidden layers and epochs. Their maximum accuracy in performance was found 97.32% for 4 hidden layers at the 50 epochs. Comparing with their above performances based on the MNIST dataset for purpose of digit recognition, we have achieved better performance for CNN. In our experiment, we have found maximum training accuracy 100% and maximum validation accuracy 99.92% both at the epoch 15. The overall performance of network is found around 99.21%. Moreover, overall loss is ranged from 0.026303 to 0.049449. Hence, this proposed method of the CNN is more efficient than other existing method for the digit recognition.
5 Conclusion In this paper, the variations of the accuracies for handwritten digit were observed for 15 epochs by varying hidden layers. The accuracy curves were generated for six cases for different parameter using the CNN MNIST digit dataset. The six cases perform differently because of various combinations of the hidden layers. The layers were taken randomly in the periodic sequence so that each case behaves differently during experiment. Among all observation, the maximum accuracy in performance was found 99.21% for 15 epochs in case 2 (Conv1, pool1, Conv2, pool2 with 2 dropouts). In digit recognition, this type of higher accuracy will cooperate to speed up performance of machine more adequately. However, minimum accuracy among all the observation in performance was found 97.07% in case 6 (Conv1, pool1, Conv2, pool2 with 1 dropout). Moreover, among all cases, the total highest test loss is approx. 0.049449 found in case 3 without dropout and total lowest test loss is approx. 0.026303 found in case 2 with the dropout. This low loss will provide CNN better performance to attain the better image resolution and the noise processing. In the future, we plan to analyze the variation in overall classification accuracy by varying number of hidden layers and the batch size.
References 1. LeCun Y et al (1989) Backpropagation applied to the zip code recognition. Neural Comput 1(4):541–551
738
H. Gupta et al.
2. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with the deep convolutional of neural networks. In: Advances in the neural information processing systems, pp 1097–1105 3. Hubel D, Wiesel T (1971) Aberrant visual projections in the Siamese cat. J Physiol 218(1):33– 62 4. Kaur N et al (2018) A survey of routing protocols in wireless sensor networks. IJET 7(4.12):20– 25. ISSN 2227-524X 5. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. The Nature 521(7553):436 6. Kavita et al (2018) Implementation and performance evaluation of AODV-PSO with AODVACO. Int J Eng Technol 7(2.4):23–25 7. Cire¸san D, Meier U, Schmidhuber J (2012) The multicolumn deep neural networks for the image classification. arXiv preprint arXiv:1202.2745 8. Fukushima K, Miyake S (1982) Neocognitron: a self-organizing neural network model for a mechanism of the visual pattern recognition. In: Competition and cooperation in neural nets. Springer, pp 267–285 9. LeCun Y et al (1990) Handwritten digit recognition with a back- propagation network. In: Advances in the neural information processing systems, pp 396–404 10. Shanker R et al (2018) Analysis of information security service for internet application. Int J Eng Technol 7(4.12):58–62 11. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to the recognition. Proc IEEE 86(11):2278–2324 12. Gupta R et al (2018) A comparative analysis of trust based applications in wireless sensor networks. Int J Eng Technol 7(4.12):73–77 13. Hecht-Nielsen R (1992) Theory of backpropagation neural network. In: Neural networks for the perception. Elsevier, pp 65–93 14. Chandin et al (2020) A canvass of 5G network slicing: architecture and security concern. IOP Conf Ser: Mater Sci Eng 993:012060 15. LeCun Y (2015) LeNet-5, convolutional neural networks, vol 20. http://yann.lecun.com/exdb/ lenet 16. Kuntz K, Sainfort F, Butler M et al (2013) Decision and simulation modeling in systematic reviews. Methods research report. (Prepared by University of Minnesota Evidencebased Practice Center under Contract No. 290-2007-10064-I.) AHRQ Publication No. 11(13)-EHC037-EF 17. Gaba S et al (2022) Clustering in wireless sensor networks using adaptive neuro fuzzy inference logic. In: Security handbook. CRC Press, USA 18. Daugman J (1988) Complete discrete 2D Gabor transforms by neural networks for the image analysis and the compression. IEEE Trans Acoust Speech Signal Proc 36(7):1169–1179 19. Shanker R et al (2022) Efficient feature grouping for IDS using clustering algorithms in detecting known/unknown attacks. In: Security handbook. CRC Press, USA 20. Haykin S (1999) The neural networks: a comprehensive foundation. Prentice Hall, Upper Saddle River, NJ 21. Dash S, Verma S et al (2022) Guidance image-based enhanced matched filter with modified thresholding for blood vessel extraction. Symmetry 14(2):194. https://doi.org/10.3390/sym140 20194 22. LeCun Y, Bengio Y (1995) Convolutional networks for the images, speech, and timeseries. In: Arbib MA (ed) The handbook of brain theory and the neural networks. MIT Press 23. Schapire RE (1990) Strength of weak learnability. Mach Learn 5:197–227 24. Ran P et al (2022) Robust and secure data transmission using artificial intelligence techniques in ad-hoc networks. Sensors 22(1):251. https://doi.org/10.3390/s22010251 25. Parkhi OM, Vedaldi A, Zisserman A (2015) Deep face recognition. In: British machine vision conference 26. Ciresan DC, Meier U, Gambardella LM, Schmidhuber J (2010) Deep Big simple neural nets excel on the handwritten digit recognition. Neural Comput 22(12)
Recognition of Handwritten Digits Using Convolutional Neural …
739
27. LeCun Y et al (1995) Comparison of learning algorithms for the handwritten digit recognition. In: International conference on the artificial neural networks, France, pp 53–60 28. Zeiler MD, Fergus R (2013) Visualizing and understanding the convolutional neural networks. arXiv:1311.2901
Medical Image Watermarking Using Slantlet Transform and Particle Swarm Optimization Eko Hari Rachmawanto, Lahib Nidhal Dawd, Christy Atika Sari, Rabei Raad Ali, Wisam Subhi Al-Dayyeni, and Mohammed Ahmed Jubair
Abstract Image quality can be increased to achieve the best imperceptibility without losing its robustness which is one of the benefits of using optimal schema as the process of embedding secret messages originating from PSO. The proposed technology uses improved SLT transient localization as an improved version of discrete wavelet transform (DWT). In addition, SLT is a multi-line component, and SLT uses zero moment, which is a better signal amplifier than DWT and discrete cosine transform (DCT). PSO is a popular optimization technique and is used to balance failure with success, because PSO increases the weight in the embedding process. As we know that SLT and PSO have been used by researchers for watermarks, this paper is used to examine the potential of SLT–PSO to be understood without compromising its power. In this paper, a technique designed to improve SLT performance during experiments on medical images is demonstrated.
E. H. Rachmawanto (B) · C. A. Sari Informatics Engineering, Computer Science Faculty, Dian Nuswantoro University, Semarang 50131, Indonesia e-mail: [email protected] C. A. Sari e-mail: [email protected] L. N. Dawd · W. S. Al-Dayyeni Department of Computer Techniques Engineering, Dijlah University College, Baghdad, Iraq e-mail: [email protected] W. S. Al-Dayyeni e-mail: [email protected] R. R. Ali National University of Science and Technology, Thi-Qar, Iraq e-mail: [email protected] M. A. Jubair Department of Computer Technical Engineering, College of Information Technology, Imam Ja’afar Al-Sadiq University, 66002 Al-Muthanna, Iraq e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_59
741
742
E. H. Rachmawanto et al.
Keywords Watermarking · Slantlet transformation · Particle swarm optimization · Discrete wavelet transformation · Discrete cosine transformation · Medical image
1 Introduction Telemedicine is one part of the broader field of telematics business, which involves the use of electronic information systems that can receive, store, and distribute information remotely using telecommunications technologies such as the World Wide Web (WWW), email, video conferencing, and data, as well as remote meetings, telemedicine, and tetrobotox can be carried out. Telemedicine is a combination of medical communication technology related to medical fields such as telesurgery, telemedicine, and teleradiology. Watermarking security is a very important issue in many applications. In particular, submitting diagnostic results online carries as much risk as copyright protection, so it is important to hide medical image information in broadcast images. In order to protect the patient’s medical information, the distribution of medical images is implemented in [1]. Watermark technology can be used to solve digital medical image problems such as copyright protection and data authentication. There are several techniques currently used in telemedicine and medical imaging systems, some of which are the use of the discrete wavelet transform (DWT) method [2] and discrete cosine transform (DCT) [3]. The ideal watermark image can be produced with a watermark image that balances opacity and robustness [4]. Image watermarks meet the requirements for durability, opacity, and security. That is high enough to achieve three reasons. According to [4–6], no digital technology satisfies all these requirements. The proposed method using Slantlet transform (hereinafter referred to as SLT) provides better time localization. SLT has better resolution characteristics in the octave and multi-resolution ranges than DWT [7]. This SLT causes a loss of flexibility and awareness. PSO’s job is to generate weighting factors to gain insight without losing robustness. In the research conducted by Y.R, the results of the comparison with the previous research conducted by Wang [6] were introduced, where the SLT function was proposed in [8], and the experiments obtained in [9] and [10] were successful in implementing SLT and yielded results, the good one. The proposed method is suitable for improving the results of using SLT–PSO for medical image watermarking.
2 Comparison Method This section provides a comparison of techniques that have been successfully applied in other fields such as stenography. SLT, DWT, and DWT–PSO are common threads in this study. DWT is a common method for the watermarking process. Due to
Medical Image Watermarking Using Slantlet Transform and Particle …
743
its excellent spatial and multi-resolution capabilities, digital images are often used as objects for watermarking. In addition, the performance of DWT and SLT has improved. Meanwhile, we will discuss the implementation comparison between SLT and PSO. In addition, this white paper describes the capabilities of SLT, DWT, and PSO technologies [11] and [6]. The comparison can be seen in Tables 1 and 2. In Table 1, the tested data are shown with a 128 × 128 image and a 256 × 256 Gy image. Image format is only *.tif. The first table shows that SLT has better performance than DWT. The four PSNR images reached > 50 dB, indicating that SLT can be selected indiscriminately. According to [4], the human optical system (HVS) allows a PSNR of more than 30 dB. SLT is basically used for steganography, but SLT can also be used for watermarks. The ability of hybrid SLT in the application of other medical image enhancement techniques has been demonstrated by several researchers. For example, the application of automated medical images [9] and using SLT and supporting vector machines (SVM) to improve classification. Map of the organization itself SVM, SOM and radial basis function (RBF) [12], on the other hand, used the SLT and fuzzy methods which were performed using medical images. This study found excellent accuracy in the MRI characterization of the human brain. Another study in [10] used SLT and a backward propagation neural network (BPNN) implemented in an electronic tongue Table 1 Comparison of PSNR result between SLT and DWT
Image
Name of image
PSNR (dB) DWT (secret bit = 154)
SLT (secret bit = 154)
new1.tif (airplane)
26.6451
51.9163
new2.tif (lena)
26.2758
59.4998
new5.tif (boat)
26.7111
59.9642
new7.tif (peppers)
26.9762
54.3398
744
E. H. Rachmawanto et al.
Table 2 Comparison of PSNR result between DWT and DWT–PSO [6] Method
Original image
Watermarked image
PSNR (dB)
DWT
42.87
DWT–PSO
47.35
system to authenticate water samples. This proposed method improves accuracy by more than 80% in most of the watermark-related studies. In addition, DWT–PSO has been used by Y.R. Wang as his research method and achieved good performance. On the other hand, DWT–PSO has improved PSNR better than just using DWT. PSO is usually used to achieve imperceptibility through the fitness function. Table 2 provides brief information on the capabilities of the DWT–PSO. The watermarking scheme using the general image Fishingboat.jpg has been successfully implemented using DWT–-PSO. The algorithm presented in [6] uses PSO to find the optimal power and is trained iteratively to find the optimal solution for PSO based on a unique trap function that represents the quality of the watermark image. According to previous researchers and the capabilities of SLT and PSO, this paper can use SLT and the use of PSO on medical image watermarks.
3 Proposed Method 3.1 Slantlet Transform (SLT) SLT uses two zero moments and has the ability to increase the time zone better than DCT and DW. This is because the filter length is short, so the selection frequency of the messy filter bank is lower than that of the conventional DW filter bank, and the frequency selection improves the clock environment [7]. On the other hand, parallel structures have parallel branches [13]. Although SLT has its own drawbacks, this method has been successfully applied to pressure and noise [14]. Slantlet filter banks
Medical Image Watermarking Using Slantlet Transform and Particle …
745
Fig. 1 Two-scale filter bank and an equivalent structure [7]
come in a variety of scales and ratios. The size of each filter bank is 2. Two-stage liquid bank can be explained as shown in Fig. 1. There is an expected amount error. The 2l-SLT filter bank can be represented by the following mathematical model: Suppose gi (n), fit i (n), and hi (n) are filters used for signal analysis at scale i, and each filter supports 2i + 1 . For success, SLT uses multiple channel pairs. This example represents a total of 2 channels. Therefore, the hi (n) low pass filter is coupled with the adjacent f i (n) filter, and each filter follows a sample of at least 2. Each channel pair (l − 1) filter and inverse time version (i = 1, 2, …, 1) followed by a subtraction of 2i+1 . The filter had been applied. gi (n), f i (n), and hi (n) are multidimensional linear forms and can be expressed as: ai (n) = h i (n) = f i (n) =
a0,0 + a0,1 n, for n = 0, . . . , 2i−1 a0,0 + a0,1 n, for n = 2i , . . . , 2i+1 − 1
(1)
b0,0 + b0,1 n, for n = 0, . . . , 2i−1 b0,0 + b0,1 n, for n = 2i , . . . , 2i+1 − 1
(2)
c0,0 + c0,1 n, for n = 0, . . . , 2i−1 c0,0 + c0,1 n, for n = 2i , . . . , 2i+1 − 1
(3)
LT, on the other hand, has been applied in many fields such as image fusion [15], feature extraction [9], image classification [16], power system reliability [17], and water sample verification [10]. It can be concluded that the overall SLT performance is good. On the other hand, in this paper, I would like to investigate the application of SLT on watermarks using medical images.
746
E. H. Rachmawanto et al.
3.2 Particle Swarm Optimization (PSO) The PSO algorithm is simulated with a simple social model developed by Kennedy and Eberhart [18]. PSO takes several important parameters which can be defined as the startup process. Some of the important parameters include particle, speed, efficiency, and frequency, where each process is carried out by random particles. During the iteration process, each particle is built with the best individual. This is calculated by calculating a new speed term based on the distance from the individual vests. These are the following criteria for herd intelligence: • The swarm intelligent will be executed and had qualified requirement as follows: • Assume that the swarm is of size S. • Each particle pi , while (i = 1, 2, 3, …, S) from the swarm, is characterized by the following: • A current position pi (t) ∈ Rd, which refers to a candidate solution of the optimization problem at iteration t. • A velocity vi (t) ∈ Rd. • A best position pbi (t) ∈ Rd, which is found during its past trajectory. • Let pg (t) ∈ Rd be the best global position found over all trajectories that were traveled by the particles of the swarm. During the search process, the particles move according to the following formula: vi (t + 1) = wvi (t) + c1 .r1 (t)( pbi (t) − pi (t)) + c2 .r2 (t) pg (t) − pi (t) pi (t + 1) = pi (t) + vi (t)
(4) (5)
PSO has been applied to watermark optimization techniques such as mixing PSO and DWT [6] and [19], mixing PSO and DCT [20], and as a result. From these studies, it can be proven that PSO is proven to be effective.
4 Discussion Researchers have used PSO to increase expected accuracy results and address safety concerns. PSO can be used as a way to balance resistance and sensitivity which has been demonstrated in studies [21]. PSO can also be used to find precise scaling parameters as demonstrated in studies [22] and can provide efficient stability when using PSO to classify hyperspectral images [23]. In another study, Dan in his study conducted a study whether PSO can also be used as an extension to protect patient privacy with medical images. The idea of implementing and optimizing PSO SLT is expected to provide efficient results from the application of the watermarking method on images of medical needs. There have been no studies regarding the use of SLT to tag telemedicine images. In a PSO implementation, rounding errors can be used to
Medical Image Watermarking Using Slantlet Transform and Particle …
747
Fig. 2 Embedding process using 2L-SLT
(i) determine the best location for embedding, (ii) determine alternative parameters, and (iii) determine weighting points. In this article, SLT and PSO can be applied well to telemedicine watermarking, but the image results between those that have not been watermarked and those that have not cannot be distinguished (which means they are good) and without loss of robustness. Another evidence shown in this study is that SLT is more beneficial than DCT and DWT. In Fig. 2, the embedding process is shown. The first step is the SLT process on the original image that is processed by the filter bank, which is then carried out once again before performing the second step. In the second step, the gray watermark image is reconstructed into 0s and 1s vectors. In the third step, two irrelevant pseudo-random sequences are generated or can be called PN: one sequence is inserted with a watermark bit 0 (PN_0), and the rest of the sequence is inserted a watermark 1 (PN_1). In this step, each element is required to have the same amount as the SLT filter bank. In the fourth step, two pseudorandom sequences PN_O and PN_1 are included in the host image SLT filter bank with their weighting factors. This step aims to provide a mid-band coefficient matrix to the SLT filter bank. In the fifth step, an inverted SLT (ISLT) is applied which is then modified to the selected filter bank, and the watermark bit is included as in the previous step. The last step is to inverse SLT (ISLT) on the SLT converted image so that a watermarked host image can be generated. Figure 3 shows the extraction process that is carried out. The first step is the SLT process on the original image which is processed by the filter bank. This first step is done twice before doing the next step. In the second step, two pseudo-random sequences (PN_O and PN_1) are regenerated using the same seeds
748
E. H. Rachmawanto et al.
Fig. 3 Extracting process using 2L-SLT
used in the watermark embedding process. In the third step, the correlation between the filter bank coefficients and the resulting two pseudo-random numbers (PN_O and PN_1) is calculated for each filter bank. If the PN_O correlation result is higher than the PN_1 correlation, then the output bit watermark is considered O, other than that the recorded watermark is considered 1. The fourth step is to rebuild the recorded watermark and calculate the similarity between the original watermark and the extracted watermark using bit values. Furthermore, training to generate PSO weights can be represented as shown in Fig. 4. In Fig. 4, the SOP training workflow is shown. The first step is to determine the population cluster size, particle size, cognitive acceleration, and social acceleration and create a cluster equation as a way to determine the inertia weights. In the second step, there is an initialization term, and the initialization term is the value of the initial cluster, initial velocity, and initial fitness function. In the third step, the terms obtained from the process in the second step are obtained. In the fourth step, the embedding process is carried out to embed the watermark into the original image. The fifth step is to calculate whether the watermark value has improved. If the results obtained are not suitable, the process is carried out again starting from step 1 by adding the use of the PSNR value to evaluate beauty. The sixth step is to run the extraction process on the watermarked image and apply several attacks to the image. Experiment in this step uses JPEG compression, salt and paper, and Gaussian filtering. The seventh step used PSNR and NC to evaluate and verify the performance of the watermarked image. In the eighth step, the results are obtained, and the PSNR and NC exercises are started with the following formula: fitness function = PSNR +
3
NC
(6)
h=1
Furthermore, in the last step, the value of each individual is calculated. Individuals are selected by taking into account the best fitness value and new individual values that are found. Based on these results, (a) the optimal weight is selected.
Medical Image Watermarking Using Slantlet Transform and Particle …
749
Fig. 4 PSO training process
5 Conclusion This is the ability that we can see when using SLT and PSO together. SLT uses better time localization, signal compression, and two zero moments. Then, as an optimization technique, PSO has several advantages, namely it can determine the best location for the embedding process by determining the selection parameters, as well as the rounding error used to determine the weights. The join method applies to sequential processes, including embedding and extracting processes. With SLT and PSO, which outperform DWT or DWT–PSO, namely in terms of maintaining the robustness of the image. This study suggests the application of the watermarking method with medical images. Only 47.35 dB normal image is compared with the previous study in [6].
750
E. H. Rachmawanto et al.
References 1. Ali RR, Mostafa SA, Mahdin H, Mustapha A, Gunasekaran SS (2020) Incorporating the Markov chain model in WBSN for improving patients’ remote monitoring systems. In: International conference on soft computing and data mining. Springer, Cham, pp 35–46 2. Ali RR, Mohamad KM (2021) RX_myKarve carving framework for reassembling complex fragmentations of JPEG images. J King Saud Univ-Comput Inf Sci 33(1):21–32 3. Shrestha S, Wahid K (2010) Hybrid DWT-DCT algorithm for biomedical image and video compression applications. In: Transform, no Isspa, pp 280–283 4. Zhao M, Dang Y (2008) Color image copyright protection digital watermarking algorithm based on DWT & DCT. In: 2008 4th international conference on wireless communications, networking and mobile computing, pp 1–4 5. Keyvanpour M-R, Merrikh-Bayat F (2011) Robust dynamic block- based image watermarking in DWT domain. Procedia Comput Sci 3:238–242 6. Wang Y-R, Lin W-H, Yang L (2011) A blind PSO watermarking using wavelet trees quantization. In: 2011 international conference on machine learning and cybernetics, pp 1612–1616 7. Selesnick IW (1999) The slantlet transform. IEEE Trans Signal Process 47(5):1304–1313 8. Patil NB, Viswanatha VM, Sanjay Pande MB (2011) Slant transformation as a tool for preprocessing in image processing. Int J Sci Eng Res 2(4):1–7 9. Maitra M, Chatterjee A (2008) A novel scheme for feature extraction and classification of magnetic resonance brain images based on Slantlet transform and support vector machine. In: 2008 SICE annual conference, pp 1130–1134 10. Kundu PK, Chatterjee A, Panchariya PC (2011) electronic tongue system for water sample authentication: a Slantlet-transform-based approach. IEEE Trans Instrum Meas 60(6):1959– 1966 11. Kumar S (2011) Steganography based on Contourlet Transform. J Comput Sci 9(6):215–220 12. Maitra M, Chatterjee A (2008) Hybrid multiresolution Slantlet transform and fuzzy c-means clustering approach for normal- pathological brain MR image segregation. Med Eng Phys 30(5):615–623 13. Hsieh C-T, Lin J-M, Huang S-J (2010) Slant transform applied to electric power quality detection with field programmable gate array design enhanced. Int J Electr Power Energy Syst 32(5):428–432 14. Radhika N, Antony T (2011) Image denoising techniques preserving edges. In: Image processing, pp 1–3 15. Al-helali AHM et al (2009) Slantlet transform for multispectral image fusion. J Comput Sci 5(4):263–269 16. Maitra M, Chatterjee A (2006) A Slantlet transform based intelligent system for magnetic resonance brain image classification. Biomed Signal Process Control 1(4):299–306 17. Chatterjee A, Maitra M, Goswami SK (2009) Classification of overcurrent and inrush current for power system reliability using Slantlet transform and artificial neural network. Expert Syst Appl 36(2):2391–2399 18. Kennedy J, Eberhart R (1995) Particle swarm optimization. Engineering and Technology, pp 1942–1948 19. Ali RR, Al-Dayyeni WS, Gunasekaran SS, Mostafa SA, Abdulkader AH, Rachmawanto EH (2022) Content-based feature extraction and extreme learning machine for optimizing file cluster types identification. In: Future of information and communication conference. Springer, Cham, pp 314–325 20. Aslantas V, Ozer S, Ozturk S (2008) A novel fragile watermarking based on Particle Swarm Optimization. In: 2008 IEEE international conference on multimedia and expo, no 1, pp 269– 272 21. Lin W-h, Horng S-j, Kao T-w, Fan P, Lee C-l (2008) An efficient watermarking method based on significant difference of wavelet coefficient quantization. IEEE Trans Multimed 10(5):746–757
Medical Image Watermarking Using Slantlet Transform and Particle …
751
22. Rao VS, Shekhawat RS, Srivastava VK (2012) A DWT-DCT- SVD based digital image watermarking scheme using particle swarm optimization. In: 2012 IEEE students’ conference on electrical, electronics and computer science, pp 1–4 23. Daamouche A, Melgani F (2009) Swarm intelligence approach to wavelet design for hyperspectral image classification. IEEE Geosci Remote Sens Lett 6(4):825–829
Voice Email for the Visually Disabled Randeep Thind, K. Divya, Sahil Verma, Kavita, Navneet Kaur, and Vaibhav Uniyal
Abstract The increased use of technology and its unlimited possibilities have made it impossible for current generations to take full advantage of Internet technology. As a basic need, email is one of the most extensively utilized aspects of the Internet. Aside from the regular users, the face of a visually impaired person has a hurdle when it comes to surfing the Internet, despite the availability of many screen readers. This paper is therefore intended to provide voice assistance to them. As well as email help for simple but essential apps for everyday use like Calculator, Music, etc. Keywords Speech recognition · Voice-based email · Speech to text · Text to speech · Visually impaired
1 Introduction Visually challenged persons are unable to operate the most widely used messaging platforms in our everyday lives. To make these systems more accessible to visually impaired persons, they include technology such as screen readers, automated voice recognition, text to speech, braille keyboards, and so on. These technologies, however, are not particularly effective for these people since, unlike a regular system, they may not provide the correct response [1, 2].
R. Thind · K. Divya · N. Kaur Department of Computer Science and Engineering, Chandigarh University, Gharuan, India S. Verma (B) · Kavita · V. Uniyal Uttaranchal University, Dehradun, India e-mail: [email protected] Kavita e-mail: [email protected] V. Uniyal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4_60
753
754
R. Thind et al.
The goal of voicemail for the visually impaired is to make it easier and faster to challenge your relationships via email. The system will issue voice orders to the user to conduct certain activities, to which the user will answer. As a result, text to speech and speech to text technologies based on the .net framework are employed here. Also known as auto voice recognition, speech to text transforms speech to text, making email writing easier. The text-to-speech module enables audio output of received mail, with the system reading the sender, topic, and body of the letter. Simultaneously, the goal is to aid with the usage of fundamental apps such as My Computer, Word, Notepad, and so on the visually impaired [3, 4].
2 Literature Review Dasgupta et al. [1] Voice Mail Architecture was developed in desktop and mobile devices for blind people, and the development of computer-based accessibility technologies has brought up a plethora of opportunities for the visually impaired [5]. The vast majority of the world’s population is disabled. He has aided in the development of a virtual environment. It is extremely beneficial for blind persons to have access to Internet applications. However, there is a sizable population of visually challenged persons in several countries; particularly, in the Indian subcontinent, such systems would be ineffective. In this paper, the author discusses and outlines the architecture of the Voice Mail system for blind person to simply and efficiently access emails. It is recommended that blind persons use a voice-based email infrastructure to access their email. The existing technology for blind individuals is not user-friendly since it does not give audio reinforcement while performing material for individuals. Voice recognition, interactive voice response, and mouse click events are all used in the proposed system. The recognizer is also exploited for user verification for possible threats [6]. The preliminary form in this system is enrollment. This form will harvest the user’s entire information and prompt the user to fill in any details that are compelled. The second form is the access list, which permits the user to include a name and password [7]. This is accomplished through the use of voice commands. Additional vocal samples should be taken for voice assurance. After signing in, the user is taken to the mailbox tab. Users can do typical messaging system functions once logged in. Write, Mailbox, Shipped Stuff, and Discard are the system alternatives [8]. Input parameters can be used to switch between them. In contrast to the present message system, the proposed system is based on voice commands. The entire system is predicated mostly on automatic translation commands [9]. Once the whole system has been used, the operation can further trigger an alert to converse specific commands to use the respective services, and the user must speak to this command to access the services. This application makes use of Internet Message Access Protocol (IMAP) [10]. It is a common Internet protocol that email clients use to get email messages from
Voice Email for the Visually Disabled
755
a server through a TCP/IP integration [11]. When you promote the application, the essential operational screen is also the first interface you see. This message is displayed till the client pulls the trigger, at which time the engine will receive additional commands size symbol, which allows users to press anywhere on the screen. Users may then send and view emails using voice commands. The segment primarily includes three techniques: • Speech to text. • Text to speech. • Interactive Voice Response. When a user eventually tries to access the site, he must register using voice commands and can be documented and kept in the database as well. In addition, the user will be assigned a credential. The user can use the message option after logging in. Within this system, Adobe Dreamweaver CS3 was used to create the user interface. The thorough website is mostly concerned with appreciating efficiency [12]. There is also a contact page where the user may get in touch. There is indeed an interaction page where even the users may contribute remarks or ask for assistance if necessary. Overwriting, researchers recommended an email service that is simple enough for blind individuals to use [13–16]. The use of speech-to-text converters, text-to-speech converters, and the Viterbi algorithm is designed. The optimization algorithm engages the methodology in which the system detects the much more feasible word afterward when the user has entered one that matches the guessed word with the particular spoken word. The user must register on the site for the first time. This system makes it possible to reduce certain disadvantages of the current system; the problem of this system is that when the number of mistakes rises, the effectiveness of the Viterbi algorithm diminishes and more space is required. It allows blind people to access email in the voice of the email architecture. The present structure is not blind since it does not give a solid viewpoint on how its material should be interpreted [17]. Speech recognition, interactive voice response, and a mouse click are all used in the proposed system. Additionally, the device is utilized to authenticate the user for added security. In the system, the very first module will be added to the list. This module will gather all of the data from a user and notify them of the information you will be submitting. The second module is the login one. You will need more tests and voices to perform the voice of the check. After completing the registration, the user is forwarded to the mailbox page. After you have logged in, you can use the email system normally. The user can switch between them by using voice commands. On the contrary with previous email system, the paper proposes a system based on voice commands. In essence, the entire system is based on converting numbers into words. Once activated, the system will prompt the user to voice commands to permission to enter the appropriate services. It is vital to declare that this command will work if the user wishes to access the appropriate services. This software takes advantage of the IMAP protocol. This is a typical Internet protocol that an email employs to send
756
R. Thind et al.
an email using TCP/IP. The principal type of activity, the first screen presented from the start of the year [18–22]. The device will begin to hear your voice commands after the user presses a single button on this screen. It is a single full-size button that may be tapped anywhere on the screen. The user can then use voice commands to send an email and read it. The system employs three primary technologies Text-to-speech or Interactive Voice Response is used to transform a number into text. When you visit the website for the first time, it will ask you for the registration using voice commands. Additionally, once you register, the database will retain the user’s audio data as well as a remark. In such a system, the user will be enter a login id and password then after log in we can access an email. The user interface was created using Adobe Dreamweaver CS3 software. The site’s emphasis is on quality and productivity. There is also a “Contacts” page where the user may send any recommendations or requests for assistance. At this moment, several conveniently accessible E-Systems are available for the blinds. You may take advantage to the Viterbi method, as well as the voice-to-text and text-to-speech converters. As soon as the user speaks, it is pronounced as your suggested word for a specific phrase. The customer creates an account on the site that they are visiting for the first time. A system for the blind and uneducated is being developed to offer a way to improve their email system interaction; this approach eliminates the need for IVR technology. Screen readers and Braille keyboards are used. We made advantage of the conversion of speech to text and text to speech. Sound other than that, instruction is to be utilized in variety of purposes. 1. Messages from the Gmail system, referring to the sender’s email address. 2. Really Simple Syndication (RSS) is a type of news syndication that is very easy to use. 3. Now let us enjoy some music together. 4. The system’s reader and red book. 5. Use the bridge’s browser to look for your discs and files. It is a program infrastructure built for the blind that allows simple access to the operating system’s email and MMS messaging functionalities. Voice instructions and a mouse can be used to create a graphical user interface design, but the keyboard is required. RSS feeds are also used in conjunction with an email to deliver a list of headlines, as well as updates on innovative products. In addition, specialists also created program to blind. Other applications, in addition to email, can be accessed with a voice command. To combat the attractiveness and simplicity of email-based activities, the authors offer Tetra-Entry, a blind-friendly email client.
Voice Email for the Visually Disabled
757
3 Methods and Materials 3.1 Usability Issues for Visually Impaired Users User experience is a larger concept that is connected to real usability. Accessibility refers to users with impairments who can theoretically access technology. This research study is concerned with usability. When integrating technology and software, visually challenged persons confront several usability issues. Most software requires them to use assistive technology tools such as screen readers. A screen reader (such as JAWS or Window-Eyes) is used to read the image content on the computer screen for visually impaired users 198 Wentz, Rochester, and Lazar. Floating letters and floating magnates are another way for blind persons to use the software. The issue with floating devices is that they are typically the cost and rate of floating among extremely low blind persons (approximately 10–20% in the USA). Computer dissatisfaction impacts the capacity to accomplish a task, which might have an impact on the environment of blind users. It is also known that they have more than just the ability to avoid data, doing so will lead them to cause accessibility concerns, such as those frequently posed by Web dynamics. Blind people are frequently compelled to find a solution to ignore to finish a certain activity. A well-illustrated web optical character recognition or user annoyance study indicates mislabeled links and forms, and alt text is missing as one instance of the accessibility obstacles users confront.
3.2 Potential Email Problems for Visually Impaired Users While visitors can visually check their inboxes and discard irritating or unnecessary emails, visually impaired users should listen to their emails. Spam may also be a security risk because it is a popular carrier of viruses and worms. The obvious first line of defense against spam is to utilize active spam filtering software. The biggest disadvantage of using a spam filter is that, by definition (email filtering), it is still feasible to recognize fake authentication messages and unwelcome emails. It has been discovered that visually challenged users utilize a high level of spam filtering, capable of screening out valid inbound emails sent with Bcc. In studies over 10 years while blind visitors can visually scan and discard irritating or unnecessary emails in their inboxes, visually impaired users should listen to their email via email. Spam may also be a security risk because it is one of the most prevalent vectors for viruses and worms. The obvious primary option for dealing with spam is to utilize active spam filtering software. The biggest disadvantage of using a spam filter is that, by definition (email filtering), it is still feasible to discover bogus authentication messages and the negatives of unwelcome emails. It has been shown that visually challenged users tend to utilize a high level of spam filtering, being able to filter out valid inbound emails sent via Bcc.
758
R. Thind et al.
3.3 Existing System The mail services available today are not used by blind visitors. This is because these devices are useless to them because they cannot give any audible feedback play content to them. Since they cannot visualize the things displayed on the screen, they find it difficult to perform specific actions such as clicking. Although screen readers are available, they cause difficulty for them. Screen readers are reading what is on the screen to them, and to react, they must use the keyboard. To accomplish this, the user must be aware of the location of the keys on the keyboard. As a result, someone who has never used a computer before will be unable to utilize such a system. Cons: Users must use a mouse to connect to the computer and must perform mouse clicks to send and receive emails. In the existing system, they chose the web user interface as the interface for the system that is not easy to use for the disabled.
3.4 Proposed System Another very critical element in building this system is ensuring that user privacy is not jeopardized by including speech recognition for user authentication. Another key characteristic is that users of this system do not require any advanced knowledge userfriendly. The system will also ask the user what action to do on a regular schedule, making it easy for the user because there will be no need to recall the processes. The proposed alternative uses voice recognition to authenticate the user who has already signed into the messaging system. The automated diagnostic procedure depending on Personal Information stored in the waves of voice is known as speech processing. It has two sessions scheduled. Proceed to the task transition stage or information sheet initially and then on to the second. Revert toward the period process or application form first, then to the administrative period or test phase. During the training phase, individuals said that their vocal frequencies must be offered so that the program may generate or establish a reference model for this person. A specific speaker level is also determined from training samples in the case of the speaker verification system. The input voice is compared against the stored reference model(s) during the testing phase, and a recognition decision is made. Following input from the user’s mic, speech analysis is conducted. The system is designed to manipulate the input audio stream using MFCC feature extraction. At several stages, various actions on the input signal are done, such as pre-emphasis, framing, window closure, Mel Cepstrum analysis, and spoken word identification. The speech algorithm is divided into two steps. In the starting training sessions, the second is known as the operation session or DTW algorithm-based testing phase based on
Voice Email for the Visually Disabled
759
dynamic programmers. This algorithm includes measuring similarity between two series of time that can be changed over time or at speeds, meaning speaking. For the purpose of feature communication. If a sequence of times may be “deformed” nonlinearly by expanding or narrowing its time axis, this approach has also employed the ideal alignment between twice. This distortion between the two chronic strings may then be applied to detect the similarity between strings twice or to locate the equivalent regions between such a string of twice (Fig. 1). Pros: • No mouse click event is required to send and receive emails. • Based only on voice commands given by users. • Chatbots are used to smooth the conversation and are closer to human response.
Fig. 1 Flowchart of the system
760
R. Thind et al.
3.5 Design Developing a User Interface. Our user interface is merely a stand-alone system (desktop-based application) that is much easier to access than any other application or website. Then, you will see the stand-alone application.
3.6 Implementing Databases Our system can keep user validation in a database. We have constructed a table with includes the following elements: name, email address, password, and personal keyword as well as qualities. These particulars are saved in a database. (PhpMyAdmin) is used to validate users.
3.7 Design of the System Our system is based on the use of voice. When the user initial opens, it will send a separate message for that application about to be use all of the options those are listed in the system. While sending the email, for example, there is one feature provided in this email for reading all mails, reading unseen mails etc. Using voice command, we must select an option from the available alternatives (Fig. 2). Fig. 2 Design flow of the system
Voice Email for the Visually Disabled
761
Fig. 3 Registration page
4 Implementation 4.1 Registration The system’s first module is registration. Anyone wishing to take advantage of this system must first register to get a login id and pass. By asking the user, it will collect all of the user’s information with what information they need to enter. The user can then log in to the system after completing the registration process. The user must recite the details, and these details will be kept in the registry for language modeling. If the information is wrong, the user can re-enter it; otherwise, the prompt will describe the action to be performed to validate the information (Fig. 3).
4.2 Login When a user requests access to his or her account, this module will run an authentication check. It will accept speech-based usernames and passwords and modify them to text. After that the text will be used to verify the identities. When a customer verified as genuine, he or she is forwarded to their main page. Speaker identification is accomplished through feature extraction, identification, and matching. The system can only be accessed by registered users (Fig. 4).
4.3 Compose Perhaps that is the case the most crucial aspect of the total operation. When visitors hit up the module, the visitors may compose email that will be sent. The key
762
R. Thind et al.
Fig. 4 Login page
Fig. 5 Compose page
problem with the current system and ours is that unlike most other systems, ours will demand recording and transduce the recorded mp3 message toward the other end as an attachment. The mouse clicks action necessary to operate the system regularly will be recalled to the user (Fig. 5).
4.4 Inbox This feature helps to display all of the emails sent to the account. Once the customer decides to choose the feature, the application will continually alert the visitor as to choose the activity required to traverse the website and manipulate the received message. The user might also choose to delete the emails he or she has received. The trash section will be used to store the deleted emails (Fig. 6).
Voice Email for the Visually Disabled
763
Fig. 6 Inbox page
Fig. 7 Sending mail
4.5 Send Mail As a command, the user must say the option/send an email. The user must say their keyword to log in. The chatbot then requests the recipient’s email address, subject, and message. After that, the chatbot speaks the phrase “Mail Sent Successfully.” (Fig. 7).
4.6 Reading Unseen Mails The option/reading must be stated by the user. A user must say their keyword to log in. The chatbot then reads all of the unread emails and asks the user whether or not they want to respond. As an input command, the user must type YES/NO.
764
R. Thind et al.
Fig. 8 Trash
4.7 Reading All Mails The option/reading must be stated by the user. A user must say their keyword to log in. The chatbot then reads all of the messages and asks the user whether they want to move on to the next one. As an input command, the user must type YES/NO.
4.8 Trash This area will save any emails that the user deletes. Emails can be deleted from both the inbox and sent email folders. This feature assists you to retrieve emails that were previously the operator erased, yet they are now necessary (Fig. 8).
5 Authentication Authorization provides consumers with account information such as pass and log in, providing that the consumer always has the accurate password and username when signing in to the app. As a consequence, this data is going to be stored in a database and compared in a future perspective. The control system will be used to identify the user for identification. Saving the traces of login detail might be risky: Retaining a login straight may be problematic, while also showing them how to build a table in the database. Whenever the client accesses authentication, the server is summoned to check the current load and save the username and password. After that, the data will be forwarded to the password saved in the database (Fig. 9).
Voice Email for the Visually Disabled
765
Fig. 9 a, b Authentication
6 Result To assess and understand the performance of alternative workflow models for object detection based on background subtraction technique, this research project set up clouds with 20 virtual machines, each with dual-core processing elements. Each virtual computer has 512 MB of RAM and is connected at 1 Mbps. Five tasks make up the backdrop for subtraction-based object detection. The first step is to take a picture of the scene and divide it into frames as input. The background model is created from the single initial acquired image by the second task. The final step is subtracting future frames from the movie from the background model. To increase the quality of the removed image, this project uses Sobel filtering as a next task. The image is finally recognized (Figs. 10, 11, 12, and 13).
7 Perspective on the Future Emailing is not a big deal for individuals who can see, but it is a big deal for those who cannot see since it interacts with so many professional responsibilities. This voicebased email system is beneficial to blind people since it helps them to comprehend their environment. If the pointer moves to the Register symbol on the page, for example, the sound will be “Register Button.” There are several screen readers to pick from. Individuals, on the other hand, must keep track of their mouse clicks. Instead, because the mouse cursor will tell you where you are, this project will solve
766 Fig. 10 Result
Fig. 11 Task and its time of execution
Fig. 12 Experimental result
R. Thind et al.
Voice Email for the Visually Disabled
767
Fig. 13 Turn around versus waiting time
the problem. This system is intended to be user-friendly for a wide variety of users, including the general public, people who are blind or illiterate. This system is capable of email access and spam email functionalities in any language. This method may also be enhanced to transmit an attachment, which is very beneficial for people who have poor vision. It may be availed to all inhabitants in the geographic area and widely spread and will remain to be offered in additional languages; this program is easy to use. Consequently, the system makes use of sign language, which might be expanded to make it more scalable and reliable. Pros: 1. Visually challenged persons are thrashed in their disabilities. 2. This system gives disabled persons the impression that they are regular users. 3. They may listen to the most recent emails in their inbox, and IVR technology shows to be quite useful in terms of assistance for them. 4. People who are blind or visually handicapped can progress from one level to the next.
8 Conclusion According to this document, the development of persons with disabilities on the fringes of the hamlet will benefit the community. This initiative allows people with vision problems to participate in the development of Digital India and to communicate more easily through the Internet. This strategy is demonstrated when how to send and receive an email removes many of the limitations that people have. The developers may be influenced by the project’s success, motivating them to produce helpful items that can assist persons with low vision or who are blind. Because blind persons are unable to use the Internet and its features, voice commands have been developed for them. We were successful in receiving unseen emails and providing the sender’s mail id, subject, and message as voice output. We
768
R. Thind et al.
successfully developed text-to-speech and speech-to-text modules, as well as implementing a chatbot to facilitate effective communication between the user and the system; it can not only establish email conversation but also respond to the user’s questions. We have also designed a registration module to make it easier for the system relies on voice abilities, and this mail system will assist visually impaired people in overcoming all small issues. This will minimize the customer’s cognitive burden associated with learning keyboard shortcuts, as well as the software load associated with using screen readers and automated speech recognizers, and the system will walk the user through the process IVR for what operation should be performed to attain the desired outcomes, making the system much easier to use the present system will only work with desktop computers. Because the use of mobile phones is becoming more popular, there is potential for this service to be included as a mobile phone application as well. To make the system safer, security elements that are deployed during the login phase can also be altered.
References 1. Dasgupta T et al (2012) VoiceMail architecture in desktop and mobile devices for the Blind people. In: 2012 4th international conference on intelligent human computer interaction (IHCI). IEEE 2. Kaur N et al (2018) A survey of routing protocols in wireless sensor networks. IJET 7(4.12):20– 25. ISSN 2227-524X 3. Gudadhe A, Parbat A et al (2018) Voice e-mail. Int J Recent Eng Res Dev (IJRERD) 03(04):37– 40 4. Gupta R et al (2018) A comparative analysis of trust based applications in wireless sensor networks. Int J Eng Technol 7(4.12):73–77 5. Shanker R et al (2018) Analysis of information security service for internet application. Int J Eng Technol 7(4.12):58–62 6. Kavita et al (2018) Implementation and performance evaluation of AODV-PSO with AODVACO. Int J Eng Technol 7(2.4):23–25 7. Chandini et al (2020) A canvass of 5G network slicing: architecture and security concern. IOP Conf Ser Mater Sci Eng 993:012060 8. Shabana T et al (2015) Voice based email system for blinds. Int J Adv Res Comput Commun Eng 4(1) 9. Gaba S et al (2022) Clustering in wireless sensor networks using adaptive neuro fuzzy inference logic. In: Security handbook. CRC Press, USA 10. Shanker R et al (2022) Efficient feature grouping for IDS using clustering algorithms in detecting known/unknown attacks. In: Security handbook. CRC Press, USA 11. Agrawal S, Mishra P (2012) Authentication of speakers using Mel frequency cepstral coefficient and vector quantization. Int J Sci Eng Res 3(8) [Online] 12. Dash S, Verma S et al (2022) Guidance image-based enhanced matched filter with modified thresholding for blood vessel extraction. Symmetry 14(2):194. https://doi.org/10.3390/sym140 20194 13. Harshasri M, Bhavani MD, Ravikanth M (2021) Voice based email for blind. Int J Innov Res Comput Sci Technol (IJIRCST). ISSN 2347-5552 14. Klensin J et al (1993) SMTP service extensions. No. rfc1425 [online] 15. Mamatha A et al (2020) Voice based e-mail system for visually impaired. Int J Res Eng Sci Manage 3(8):51–54
Voice Email for the Visually Disabled
769
16. Dudhbale P, Narawade PS, Wakhade JS (2018) Desktop and mobile devices with a voice-based system for people who are blind. Sci Res Sci Technol [Online] 17. Rani P et al (2022) Robust and secure data transmission using artificial intelligence techniques in ad-hoc networks. Sensors 22(1):251. https://doi.org/10.3390/s22010251 18. Khedekar R, Gupta S, Voice based email system for blinds 19. Shanmathi R, Shoba G, Anusha G, Jeevitha V, Shoba G (2014) Visually impaired interactive email. Int J Adv Res Comput Commun Eng (IJARCCE) 3(1):5089–5092 [Online] 20. Bajaj R, Sharma V (2018) Smart Education with artificial intelligence-based determination of learning styles. Procedia Comput Sci 132:834–842 21. Suresh A et al (2016) Voice based email for blind. Int J Sci Res Sci Eng Technol (IJSRSET) 2:93–97 22. Sawant S et al (2018) Speech based e-mail system for blind and illiterate people. Int Res J Eng Technol (IRJET). e-ISSN 2395-0056
Author Index
A Aasha Nandhini S., 99 Abhigya Verma, 215 Abhishek Dwivedi, 329 Abhishek Tandon, 397 Aditya Naidu Kolluru, 303 Adrita Chakraborty, 283 Ahmed Alkhayyat, 411, 423, 445, 455 Akshat Ajay, 455 Aman Bahuguna, 713 Amanpreet Kaur, 727 Amita Jain, 193 Amit Kumar, 205 Ananapareddy V. N. Reddy, 483 Anjali Rajak, 53 Anu G. Aggarwal, 397 Anuj Agrawal, 293 Anu Saini, 267 Apeksha Mittal, 13 Apratim Shrivastav, 149 Arghyadeep Ghosh, 237 Aritra Nandi, 237 Arjun Thakur, 383 Arpit Namdev, 123 Ashok Reddy, P., 587 Asmita Hobisyashi, 237 Astha Jain, 267 Ayush Chaurasia, 329 Azhar Ashraf, 693, 713
B Babita Rawat, 693 Babu Reddy, M., 587 Basha, S. Mahabub, 137
Bharathi Mohan, G., 99 Bharath, M. S., 317 Bhavana Vennam, 357 Bhavya Dua, 435
C Chandra Prakash, 1 Chandra Shekhar Rai, 177 Chereddy Spandana, 99 Chitra Shashidhar, 435 Christy Atika Sari, 741
D Debahuti Mishra, 511 Deepak Gupta, 411 Deepak Prakash, 157 Deepali Virmani, 329 Deepa V. Jose, 81 Devender Kumar, 157 Dimple Sethi, 1 Dinesh Kumar, G., 587 Divya, K., 753 Duggi Manikanta Reddy, 527
E Eko Hari Rachmawanto, 741
F Faisal Ahmad, 667 Faraz Hasan, 667
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 537, https://doi.org/10.1007/978-981-99-3010-4
771
772 G Ginjupalli Raviteja, 555 Goberu Likhitha, 527 Gudivada Sai Priya, 555 Gurija Akanksha, 571
H Harshada, L., 483 Harsh Vardhan Pant, 29 Himansh Gupta, 727 Himanshu Sharma, 471 Hsiu Chun Hsu, 317
I Indrakant Dana, 455 Ippatapu Venkata Srisurya, 99 Ishu Kumar, 227
J Jalluri Geetha Renuka, 527 Janani, S., 137 Jayapriya, J., 39 Jeyalakshmi, V. S., 39 Jinsi Jose, 81
K Karuna Middha, 13 Kaushiki Kriti, 435 Kavita, 693, 713, 727, 753 Kavita Sharma, 471 Kavya, A., 483 Krishnan, N., 39 Krithika, S., 99
L Lahib Nidhal Dawd, 741 Laila Almutairi, 637 Lakshmi Sai Srikar Vadlamani, 149
M Mahabaleshwar Kabbur, 63 Malvika Madan, 423 Mamta Madan, 677 Manasi Gyanchandani, 91 Manoj Chandra Lohani, 29 Manoj Kumar Mishra, 657 Marcello Carvalho Reis dos, 435 Meenu Dave, 677
Author Index Meenu Vijarania, 111 Minni Jain, 193 Mitanshi Rastogi, 111 Mohammad Imran, 667 Mohammad Shahid, 667 Mohammed Sadhak, 571 Mohammed Ahmed Jubair, 741 Mukesh Dhetarwal, 693 Mukesh Kumar, 169
N Naeem Th. Yousir, 543 Nandini Baliyan, 215 Narasimha Rao, B., 483 Narayan Kumar, 205 Navneet Kaur, 753 Neeraj Kumar Srivastava, 657 Neha Goel, 111 Norfar Ain Mohd Fuzi, 499 Nupur Sudhakar, 157
P Phaneendra Kanakamedala, 587 Polani Veenadhari, 555 Pooja Gera, 215 Pooja Manral, 253 Poonam Negi, 713 Poonam Rani, 157 Poonam Rawat, 727 Prabhat Kumar, 293, 471 Pradip Kumar Sahu, 511 Prafull Pandey, 657 Pragya Pranjal, 423 Prapti Patra, 245 Prasanjeet Singh, 293 Prasanna Kumar, R., 99 Pratima Chaudhury, 445 Priyadharshini, A. R., 99
R Rabei Raad Ali, 499, 741 Rajeswara Rao Duvvada, 357 Rajni Jindal, 149, 193 Rakesh Tripathi, 53 Randeep Thind, 753 Rebanta Chakraborty, 227 Reem M. Abdullah, 703 Ritu Bibyan, 397 Rohit Anand, 123, 137 Rohith, B., 483 Roopchand Reddy Vanga, 317
Author Index Rozaida Ghazali, 499
S Saahil Mallick, 423 Sahil Verma, 293, 693, 713, 727, 753 Sai Pavithra Nandyala, 357 Salama A. Mostafa, 499, 543 Saleh Al Sulaie, 617 Sameer Anand, 397 Sanjiv Kumar Jain, 123 Sankalp Chordia, 383 Sanyam Shukla, 91 Sara A. Althubiti, 599 Sarvani Anandarao, 555 Satuluri Naganjaneyulu, 571 Satyavir Singh, 371 Saurabh Rastogi, 455 Saurabh Shrivastava, 91 Seeja, K. R., 253 Sejal Sahu, 227 Shahanawaj Ahamad, 123, 137 Shaik Shaheeda, 571 Shashwath Suvarna, 303 Shashwat Sinha, 445 Shaymaa Mohammed Abdulameer, 543 Shipra Swati, 169 Shivam Yadav, 237 Shiv Ram Meena, 177 Shravani Nalbalwar, 383 Shreya, J. L., 267 Shubham, 157 Shubham Shah, 383 Shweta Singhal, 215 Sibasish Choudhury, 283 Siddharth Dubey, 293 Sindhu, C., 303, 317 Sivarathinabala, M., 137 Smaraki Bhaktisudha, 423 Sonakshi Vij, 329 Sourabh Bharti, 1 Sourav Chakraborty, 245 Sreekant Nair, 303
773 Srinivasa Sesha Sai, M., 587 Sundos A. Hameed Alazawi, 703 Sunita Kumari, 267 Supratik Dutta, 283 Suryansh Bhaskar Talukdar, 123 Sushruta Mishra, 227, 237, 245, 283, 411, 423, 435, 445 Swati Jadhav, 383 Swaty Dash, 511 Swayam Verma, 445 Sweana Vakkayil Seaban, 81
T Tridiv Swain, 411
U Udit Agarwal, 455 Umar Farooq Khattak, 499 Usmani, M. Ahmer, 137
V Vadivu, G., 317 Vaibhav Malpani, 91 Vaibhav Uniyal, 753 Vamsi Krishna Modala, 527 Vani Nijhawan, 677 Veera Talukdar, 123 Victor Hugo C. Albuquerque de, 435 Vidhi Karnwal, 329 Vijaya Kumari Majji, 357 Vikas Chaudhary, 227, 237, 245, 283 Vikas Mishra, 657 Vinayaka Murthy, M., 63 Vinesh Raj, 81 Vishal Kumar, 341 Vishisht Ved, 245
W Wisam Subhi Al-Dayyeni, 741