Lecture Notes in Networks and Systems 832
Om Prakash Verma Lipo Wang Rajesh Kumar Anupam Yadav Editors
Machine Intelligence for Research and Innovations Proceedings of MAiTRI 2023, Volume 1
Lecture Notes in Networks and Systems Volume 832
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Om Prakash Verma · Lipo Wang · Rajesh Kumar · Anupam Yadav Editors
Machine Intelligence for Research and Innovations Proceedings of MAiTRI 2023, Volume 1
Editors Om Prakash Verma Department of Instrumentation and Control Engineering Dr. B. R. Ambedkar National Institute of Technology Jalandhar, Punjab, India Rajesh Kumar Department of Electrical Engineering Malaviya National Institute of Technology Jaipur, Rajasthan, India
Lipo Wang School of Electrical and Electronic Engineering Nanyang Technological University Singapore, Singapore Anupam Yadav Department of Mathematics Dr. B. R. Ambedkar National Institute of Technology Jalandhar, Punjab, India
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-99-8128-1 ISBN 978-981-99-8129-8 (eBook) https://doi.org/10.1007/978-981-99-8129-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.
Preface
This book is the result of the discussions at the three-day International Conference on MAchine inTelligence for Research and Innovations (MAiTRI-2023), held at Dr. B. R. Ambedkar National Institute of Technology Jalandhar. It is intended as a learner-friendly text promoting research advancement in the theory and realization of MAchine inTelligence, covering topics such as Machine Learning, Deep Learning, Quantum Machine Learning, Real-Time Computer Vision, Pattern Recognition, Natural Language Processing, Statistical Modeling, Autonomous Vehicles, Human Interfaces, Computational Intelligence, and Robotics. Applications of MAchine inTelligence for Research and Innovations include the development and demonstration of Intelligent and Autonomous Robots (IAR), Unmanned Aerial Vehicles (UAV), Autonomous Underwater Vehicles (AUV), Autonomous Water Surface Vehicles (ASV), Autonomous Land Vehicles (ALV), Swarm Robots, Humanoid Robots, and Autonomous Household Robots. This book provides a synergistic environment and platform for young entrepreneurs, researchers, and people from industry and academia to utilize recent advancements in computational approaches, intelligent techniques, and hardware and Internet technologies, especially using MAchine inTelligence. Finally, this book is a step ahead, with a key focus on developing skilled manpower and an R&D thrust in the field of MAchine inTelligence.

Om Prakash Verma, Jalandhar, India
Lipo Wang, Singapore
Rajesh Kumar, Jaipur, India
Anupam Yadav, Jalandhar, India
Contents
A Simple Algorithm to Secure Data Dissemination in Wireless Sensor Network (Veeramani Sonai and Indira Bharathi), p. 1
Analysis of Pollard Rho Attacks Over ECDLP (Aayush Jindal, Sanjay Kumar, and Aman Jatain), p. 11
Modelling Networks with Attached Storage Using Perfect Italian Domination (Agnes Poovathingal and Joseph Varghese Kureethara), p. 23
Application of Varieties of Learning Rules in Intuitionistic Fuzzy Artificial Neural Network (P. John Robinson and A. Leonishiya), p. 35
Automated Tool for Toxic Comments Identification on Live Streaming YouTube (Tuhin Tarafder, Harsh Kumar Vashisth, and Mamta Arora), p. 47
Directional Edge Coding for Facial Expression Recognition System (Pagadala Sandya, K. Venkata Subbareddy, L. Nirmala Devi, and P. Srividya), p. 57
A Cascaded 2DOF-PID Control Technique for Drug Scheduling of Chemotherapy System (Bharti Panjwani, Vijay Mohan, Himanshu Gupta, and Om Prakash Verma), p. 71
Distinguishing the Symptoms of Depression and Associated Symptoms by Using Machine Learning Approach (Akash Nag, Atri Bandyopadhyay, Tathagata Nayak, Subhanjana Banerjee, Babita Panda, and Sanhita Mishra), p. 81
Harnessing the Power of Machine Learning Algorithms for Landslide Susceptibility Prediction (Shivam Krishana, Monika Khandelwal, Ranjeet Kumar Rout, and Saiyed Umer), p. 95
The Effectiveness of GPT-4 as Financial News Annotator Versus Human Annotator in Improving the Accuracy and Performance of Sentiment Analysis (Satyajeet Azad), p. 105
Machine Learning Method for Analyzing and Predicting Cardiovascular Disease (Yogendra Narayan, Mandeep Kaur Ghumman, and Charanjeet Gaba), p. 121
Rule-Based Learner Competencies Predictor System (Priyanka Gupta, Deepti Mehrotra, and Sunil Vadera), p. 133
Exploring the Relationship Between Digital Engagement and Cybersecurity Practices Among College Students: A Survey Study (Farha Khan, Shweta Arora, Saurabh Pargaien, Lata Pande, and Kavita Khati), p. 147
Secure and Energy Efficient Routing in VANETs Using Nature Inspired Hybrid Optimization (Gurjot Kaur and Deepti Kakkar), p. 161
Performance Evaluation of Machine Learning Models for Intrusion Detection in Wireless Sensor Networks: A Case Study Using the WSN DS Dataset (Aryan Rana, Sunil Prajapat, Pankaj Kumar, and Kranti Kumar), p. 173
Arduino Controlled 3D Object Scanner and Image Classification (Amoli Belsare, Sahishnu Wankhede, Nitin Satpute, Ganesh Dake, Vedang Kali, and Vishal Chawde), p. 181
Anomaly Detection for IoT-Enabled Kitchen Area Network Using Machine Learning (Mohd Ahsan Siddiqui, Mala Kalra, and C. Rama Krishna), p. 195
Character-Level Bidirectional Sign Language Translation Using Machine Learning Algorithms (K. Rajeswari, N. Vivekanandan, Sushma Vispute, Shreya Bengle, Anushka Babar, Muskan Bhatia, and Sanket Annamwar), p. 211
Enhancing Performance of Noise-Robust Gujarati Language ASR Utilizing the Hybrid Acoustic Model and Combined MFCC + GTCC Feature (Bhavesh Bhagat and Mohit Dua), p. 221
Random Forest (RF) Assisted and Support Vector Machine (SVM) Algorithms for Performance Evaluation of EDM Interpretation (Vivek John, Ashulekha Gupta, Saurabh Aggarwal, Kawerinder Singh Sidhu, Kapil Joshi, and Omdeep Gupta), p. 233
COVID-19 Classification of CT Lung Images Using Intelligent Wolf Optimization Based Deep Convolutional Neural Network (Om Ramakisan Varma and Mala Kalra), p. 245
Parallelization of Molecular Dynamics Simulations Using Verlet Algorithm and OpenMP (Preksha Mathur, Hiteshwar Kumar Azad, Sai Harsha Varma Sangaraju, and Ekansh Agrawal), p. 263
Prediction of HDFC Bank Stock Price Using Machine Learning Techniques (Yogesh Gupta), p. 275
Real-Time Applicability Analysis of Lightweight Models on Jetson Nano Using TensorFlow-Lite (Kamath Vidya, A. Renuka, and J. Vanajakshi), p. 285
An Efficient Fog Computing Platform Through Genetic Algorithm-Based Scheduling (Shivam Chauhan, Chinmaya Kumar Swain, and Lalatendu Behera), p. 295
Development of a Pixhawk-Based Quadcopter: A Bottom-Up Approach (Anuranjan Mishra, Varun Chitransh, Jitendra Kumar, and Navneet Tiwari), p. 309
Inattentive Driver Identification Smart System (IDISS) (Sushma Vispute, K. Rajeswari, Reena Kharat, Deepali Javriya, Aditi Naiknaware, Nikita Gaikwad, and Janhavi Pimplikar), p. 323
Convolutional Neural Network in Deep Learning for Object Tracking: A Review (Utkarsh Dubey and Raju Barskar), p. 343
Editors and Contributors
About the Editors

Dr. Om Prakash Verma is currently serving as Assistant Professor in the Department of Instrumentation and Control Engineering, Dr. B. R. Ambedkar NIT Jalandhar. His research interests include machine vision; machine, deep, and quantum learning; applied soft computing; and UAV autonomous systems. He has published more than 90 research outputs, including international peer-reviewed SCI journal papers, patent applications, edited books, conference papers, and book chapters. He has been associated with six research projects as PI and Co-PI funded by various agencies such as ISRO, MeitY, and CSIR. He has supervised two Ph.D. students and is currently supervising three. He is a Senior Member of IEEE; a Member of the IEEE Computational Intelligence Society, the IEEE Control Systems Society, and the Automatic Control and Dynamic Optimization Society; and a lifetime member of the Instrument Society of India and the STEM Research Society. He is an associate editor of the International Journal of Security and Privacy in Pervasive Computing.

Dr. Lipo Wang is presently on the faculty of the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His research interest is artificial intelligence with applications to image/video processing, biomedical engineering, communications, control, and power systems. He has 350+ publications, a US patent in neural networks, and a patent in systems. He has co-authored two monographs and (co-)edited 15 books. He is/was Associate Editor/Editorial Board Member of 30 international journals, including four IEEE Transactions, and guest editor for 15 journal special issues. He was a member of the Board of Governors of the International Neural Network Society, the IEEE Computational Intelligence Society (CIS), and the IEEE Biometrics Council. He served as CIS Vice President for Technical Activities and Chair of the Emergent Technologies Technical Committee, as well as Chair of the Education Committee of the IEEE Engineering in Medicine and Biology Society (EMBS). He was President of the Asia-Pacific Neural Network Assembly
(APNNA) and received the APNNA Excellent Service Award. He was founding Chair of both the EMBS Singapore Chapter and the CIS Singapore Chapter.

Dr. Rajesh Kumar is working as a Professor in the Department of Electrical Engineering, MNIT, Jaipur. His research interests focus on intelligent systems, machine intelligence, power management, smart grids, and robotics. He has published over 550 research articles and has supervised 25 Ph.D. and more than 35 M.Tech. theses. He has 14 patents to his name. He has received three academic awards, 12 best paper awards, six best thesis awards, four professional awards, and 25 student awards. He received the Career Award for Young Teachers in 2002 from the Government of India. He has been Associate Editor of IEEE Access, IEEE ITeN, Swarm and Evolutionary Computation (Elsevier), IET Renewable Power Generation, IET Power Electronics, the International Journal of Bio-Inspired Computing, and CAAI Transactions on Intelligence Technology (IET). He is a Senior Member of IEEE (USA), Fellow of IET (UK), Fellow of IE (India), Fellow of IETE, Life Member of CSI, Senior Member of IAENG, and Life Member of IST.

Dr. Anupam Yadav is an Associate Professor at the Department of Mathematics, Dr. B. R. Ambedkar National Institute of Technology Jalandhar, India. His research areas include numerical optimization, soft computing, and artificial intelligence, and he has more than ten years of research experience in soft computing and optimization. Dr. Yadav earned his Ph.D. in soft computing from the Indian Institute of Technology Roorkee and worked as a research professor at Korea University. He has published several research articles in journals of international repute. Dr. Yadav has authored a textbook entitled An Introduction to Neural Network Methods for Differential Equations. He has edited several books published in the AISC and LNDECT Springer series. Dr. Yadav was the General Chair, Convener, and a member of the steering committee of several international conferences. He is a member of various research societies.
Contributors Saurabh Aggarwal Department of Mechanical Engineering, Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, India Ekansh Agrawal School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India Sanket Annamwar Computer Engineering Department, Pimpri Chinchwad College of Engineering, SPPU, Pune, Maharashtra, India Mamta Arora Manav Rachna University, Faridabad, Haryana, India Shweta Arora Graphic Era, Hill University Bhimtal Campus, Nainital, Uttarakhand, India
Hiteshwar Kumar Azad School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India Satyajeet Azad AI Consultant, Excelinnova Consultancy Services Pvt. Ltd., New Delhi, India Anushka Babar Computer Engineering Department, Pimpri Chinchwad College of Engineering, SPPU, Pune, Maharashtra, India Atri Bandyopadhyay School of Computer Engineering, KIIT, Bhubaneswar, Odisha, India Subhanjana Banerjee School of Computer Engineering, KIIT, Bhubaneswar, Odisha, India Raju Barskar Department of Computer Science and Engineering, University Institute of Technology, Bhopal, Madhya Pradesh, India Lalatendu Behera Department of Computer Science and Engineering, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India Amoli Belsare Department of Electronics and Telecommunication Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India Shreya Bengle Computer Engineering Department, Pimpri Chinchwad College of Engineering, SPPU, Pune, Maharashtra, India Bhavesh Bhagat National Institute of Technology, Kurukshetra, India Indira Bharathi School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamilnadu, India Muskan Bhatia Computer Engineering Department, Pimpri Chinchwad College of Engineering, SPPU, Pune, Maharashtra, India Shivam Chauhan Department of Computer Science and Engineering, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India Vishal Chawde Department of Electronics and Telecommunication Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India Varun Chitransh IIT(BHU), Varanasi, India Ganesh Dake Department of Electronics and Telecommunication Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India Mohit Dua National Institute of Technology, Kurukshetra, India Utkarsh Dubey Department of Computer Science and Engineering, University Institute of Technology, Bhopal, Madhya Pradesh, India Charanjeet Gaba Department of CSE, Chandigarh University, Mohali, Punjab, India
Nikita Gaikwad Department of Computer Engineering, Pimpri Chinchwad College of Engineering, SPPU, Pune, India Ashulekha Gupta Department of Management Studies, Graphic Era (Deemed to be University), Dehradun, India Himanshu Gupta Department of Computer Science, ABES Institute of Technology, Ghaziabad, India Omdeep Gupta School of Management, Graphic Era Hill University, Dehradun, India Priyanka Gupta AIIT, Amity University, Noida, Uttar Pradesh, India Yogesh Gupta School of Engineering and Technology, BML Munjal University, Gurugram, India Aman Jatain Amity University Haryana, Gurugram, India Deepali Javriya Department of Computer Engineering, Pimpri Chinchwad College of Engineering, SPPU, Pune, India Aayush Jindal Amity University Haryana, Gurugram, India Vivek John Department of Mechanical Engineering, Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, India Kapil Joshi Department of Computer Science and Engineering, Uttaranchal University, Dehradun, India Deepti Kakkar Dr. B.R. Ambedkar National Institute of Technology, Jalandhar, India Vedang Kali Department of Electronics and Telecommunication Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India Mala Kalra Computer Science and Engineering Department, National Institute of Technical Teachers Training and Research, Chandigarh, India Gurjot Kaur Dr. B.R. Ambedkar National Institute of Technology, Jalandhar, India Mandeep Kaur Ghumman Department of ECE, Chandigarh University, Mohali, Punjab, India Farha Khan Graphic Era, Hill University Bhimtal Campus, Nainital, Uttarakhand, India Monika Khandelwal Department of CSE, NIT Srinagar, Hazratbal, Srinagar, J &K, India Reena Kharat Professor, Dept. of Computer Engineering, Pimpri Chinchwad College of Engineering, SPPU, Pune, India
Kavita Khati Graphic Era, Hill University Bhimtal Campus, Nainital, Uttarakhand, India Shivam Krishana Department of CSE, NIT Srinagar, Hazratbal, Srinagar, J &K, India Jitendra Kumar Center for Advance Studies, Lucknow, India Kranti Kumar Srinivasa Ramanujan Department of Mathematics, Central University of Himachal Pradesh, Dharamsala, India Pankaj Kumar Srinivasa Ramanujan Department of Mathematics, Central University of Himachal Pradesh, Dharamsala, India Sanjay Kumar SAG DRDO Ministry of Defense, Delhi, India Joseph Varghese Kureethara Department of Mathematics, Christ University, Bangalore, India A. Leonishiya Department of Mathematics, Bishop Heber College, (Affiliated to Bharathidasan University), Tiruchirappalli, India Preksha Mathur School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India Deepti Mehrotra ASET, Amity University, Noida, Uttar Pradesh, India Anuranjan Mishra Center for Advance Studies, Lucknow, India Sanhita Mishra School of Electrical Engineering, KIIT, Bhubaneswar, Odisha, India Vijay Mohan Department of Mechatronics, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India Akash Nag School of Computer Engineering, KIIT, Bhubaneswar, Odisha, India Aditi Naiknaware Department of Computer Engineering, Pimpri Chinchwad College of Engineering, SPPU, Pune, India Yogendra Narayan Department of ECE, Chandigarh University, Mohali, Punjab, India Tathagata Nayak School of Computer Engineering, KIIT, Bhubaneswar, Odisha, India L. Nirmala Devi ECE Department, Osmania University, Hyderabad, India Babita Panda School of Electrical Engineering, KIIT, Bhubaneswar, Odisha, India Lata Pande Kumaun University Nainital, Nainital, Uttarakhand, India Bharti Panjwani Shri Madhwa Vadiraja Institute of Technology and Management, Udupi, Karnataka, India
Saurabh Pargaien Graphic Era, Hill University Bhimtal Campus, Nainital, Uttarakhand, India Janhavi Pimplikar Department of Computer Engineering, Pimpri Chinchwad College of Engineering, SPPU, Pune, India Agnes Poovathingal Department of Mathematics, Christ University, Bangalore, India Sunil Prajapat Srinivasa Ramanujan Department of Mathematics, Central University of Himachal Pradesh, Dharamsala, India K. Rajeswari Computer Engineering Department, Pimpri Chinchwad College of Engineering, SPPU, Pune, Maharashtra, India C. Rama Krishna Department of CSE, NITTTR, Chandigarh, India Aryan Rana Srinivasa Ramanujan Department of Mathematics, Central University of Himachal Pradesh, Dharamsala, India A. Renuka Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India P. John Robinson Department of Mathematics, Bishop Heber College, (Affiliated to Bharathidasan University), Tiruchirappalli, India Ranjeet Kumar Rout Department of CSE, NIT Srinagar, Hazratbal, Srinagar, J &K, India Pagadala Sandya ECE Department, Osmania University, Hyderabad, India Sai Harsha Varma Sangaraju School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India Nitin Satpute Technology Innovation Institute (TII), Abu Dhabi, UAE Mohd Ahsan Siddiqui Department of CSE, NITTTR, Chandigarh, India Kawerinder Singh Sidhu Department of Mechanical Engineering, Uttaranchal Institute of Technology, Dehradun, India Veeramani Sonai Department of Computer Science and Engineering, School of Engineering, Shiv Nadar University Chennai, Chennai, Tamilnadu, India P. Srividya ECE Department, Osmania University, Hyderabad, India K. Venkata Subbareddy ECE Department, Osmania University, Hyderabad, India Chinmaya Kumar Swain Department of Computer Science and Engineering, SRM University, Amaravati, Andhra Pradesh, India Tuhin Tarafder Manav Rachna University, Faridabad, Haryana, India Navneet Tiwari Center for Advance Studies, Lucknow, India
Saiyed Umer Department of CSE, Aliah University, Kolkata, WB, India Sunil Vadera University of Salford, Salford, UK J. Vanajakshi Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India Om Ramakisan Varma Computer Science and Engineering Department, National Institute of Technical Teachers Training and Research, Chandigarh, India Harsh Kumar Vashisth Manav Rachna University, Faridabad, Haryana, India Om Prakash Verma Dr. BR Ambedkar NIT Jalandhar, Jalandhar, Punjab, India Kamath Vidya Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India Sushma Vispute Computer Engineering Department, Pimpri Chinchwad College of Engineering, SPPU, Pune, Maharashtra, India N. Vivekanandan Mechanical Engineering Department, Pimpri Chinchwad College of Engineering, SPPU, Pune, Maharashtra, India Sahishnu Wankhede Department of Electronics and Telecommunication Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India
A Simple Algorithm to Secure Data Dissemination in Wireless Sensor Network Veeramani Sonai and Indira Bharathi
Abstract Wireless sensor networks are heavily utilized in numerous applications in both industry and science. As the original data is transferred from the sensor node to the sink node, unauthorized individuals attempt to access it, breaching confidentiality. The first step in enhancing the security of data transmission in a wireless sensor network (WSN) is figuring out the best method for encrypting data as it moves from the sensors to the central node. In order to move encrypted data from different sensor nodes to sink nodes efficiently in wireless sensor networks, this paper introduces a simple data encryption technique. Keywords Sensor nodes · Coverage · Tree · Encryption
1 Introduction

Wireless sensor networks are regarded as essential for improving service quality, along with network longevity, coverage, and safe data transmission. Coverage is defined as the behavior of sensor detection in the deployment area, and a region is said to be fully covered if every point is covered by at least one sensor node [1]. Since it enables communication between sensors and other equipment, sensor connectivity, like coverage, is essential for wireless sensor networks. In order for the sensors to communicate with one another, the network connection must provide at least one link. Applications that demand a high level of accuracy always favor complete coverage. A wireless sensor network (WSN) has open characteristics and limited resources, which make it simple for eavesdroppers to intercept data being transferred. Because of this, it is critical to implement very effective security measures in WSN to ensure data confidentiality [2]. When transmitting data over WSNs, security is crucial to preventing attacks and unwanted access. When there is no suitable means to protect data, there is no point in transmitting data across a fully connected network. Therefore, secure data transmission needs to be considered in conjunction with the deployment of wireless sensor networks. Additionally, it can be difficult to send data gathered from several sensor nodes to the chosen sink node. This study helps shed light on this motivation. The remainder of the paper is structured as follows: related works are discussed in Sect. 2, a brief description of the proposed solution is given in Sect. 3, results in Sect. 4, and conclusions in Sect. 5.
2 Related Works

This section serves as an illustration of the different security-related research that has been undertaken. In contrast to conventional public key encryption, the approach in [3] does not require a general key certificate and does away with the necessity to handle certificates. The plan deals with the issues of key escrow and revocation, as opposed to identity-based public key encryption [3]. Compared to traditional symmetric encryption, the encryption speed is substantially faster. Data aggregation is a productive method of network data transmission. In the past, public-key-based homomorphic encryption was used to construct the majority of encrypted data aggregations. Public-key encryption requires laborious computation, hence these protocols are not appropriate for use in WSNs. The majority of data in WSNs is sent along a path that includes several sensor nodes from the source node to the receiver node [4]. More security is needed to ensure the integrity, validity, and secrecy of the data flowing via these networks in order to increase their efficiency. Encryption is one of the most often used strategies for providing security services to WSNs. The survey in [2] examines the most important methods that have been suggested to offer encryption-based security services for WSNs. Proxy re-encryption is extended to WSN during the data encryption stage to defend against attacks from outside adversaries. At the source node [5], random-number-based re-encryption key fragments are generated using the threshold secret-sharing technique. Data security is an absolute necessity when sending data from sensor nodes to the base station. Several data aggregation techniques have been examined that allow WSNs to conserve power and protect user privacy. The lightweight Secure Aggregation and Transmission Scheme (SATS) is described as a means of enabling safe and quick computation and data transmission. Instead of the pricey multiplication process, SATS offers a light XOR approach for obtaining batch keys. Additionally, the AN offers the AN Receiving Message Algorithm (ARMA) to aggregate the data produced by sensor nodes, and the Receiving Message Extractor (RME) approach explains how to decode the message and carry out batch verification at the fog server [6]. A privacy-preserving lightweight data aggregation protocol is recommended to increase security in HWSNs. With the suggested protocol, the medical service provider can quickly acquire the patient's personal health information from the data collected by the client while maintaining their privacy [7]. In order to address the security challenges in WSNs, a safe sinkhole detection and transmission model integrating homomorphic encryption and watermarking techniques has been presented. The BS created and made the TEEN protocol available, and two schemes rely on it and its communication forms. Each data packet is watermarked to ensure data authenticity using a pseudo-random number generator and a message authentication technique. To protect the identity of sensor nodes in communications between distinct clusters, homomorphic encryption is combined with encrypted sensor node IDs. Simulation results have shown that this method has successfully protected the network [8]. Wireless networks are more susceptible to various security threats than directed data transmission because the unguided data transmission medium offers more opportunities for security attacks. As a result, secure data transport through erratic channels is becoming increasingly necessary. For wireless sensor networks to function, four security requirements (integrity, confidentiality, availability, and authenticity) must be met [9–12].
3 Proposed Encryption Algorithm

As seen in Fig. 1, numerous sensor nodes provide data in real time, which is then communicated to the sink node. When an adversary reads data, the secrecy property is broken. To get around this issue, the data sent between the sensor nodes and the sink node is encrypted. Every message generated from each sensor is encrypted as

$$C_i = E(W_i) \quad (1)$$

where $C_i$ is the encrypted message, $E$ is the encryption function, and $i$ indexes the sensor nodes, $1 \le i \le n$.
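The paper does not fix a concrete cipher for $E$ here, so the following is only a minimal sketch of the idea in Eq. (1), assuming a pre-shared symmetric key per sensor and using the Fernet recipe from the Python `cryptography` package as a stand-in for $E$; the key table and function names are illustrative, not the authors' implementation.

```python
# Sketch only: per-sensor symmetric encryption of readings before forwarding
# to the sink, standing in for C_i = E(W_i). Fernet is an AES-based recipe.
from cryptography.fernet import Fernet

# Hypothetical pre-shared keys, one per sensor node i = 1..n (here n = 3).
keys = {i: Fernet.generate_key() for i in range(1, 4)}

def encrypt_reading(sensor_id: int, reading: bytes) -> bytes:
    """C_i = E(W_i): encrypt sensor i's message W_i."""
    return Fernet(keys[sensor_id]).encrypt(reading)

def decrypt_at_sink(sensor_id: int, ciphertext: bytes) -> bytes:
    """The sink holds the same keys and recovers the original reading."""
    return Fernet(keys[sensor_id]).decrypt(ciphertext)

c1 = encrypt_reading(1, b"temperature=23.5")
assert decrypt_at_sink(1, c1) == b"temperature=23.5"
```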
For a collection of intuitionistic fuzzy values $\tilde{a}_j = (\mu_j, \gamma_j)$, $j = 1, 2, \ldots, n$, with weights $w_j > 0$ and $\sum_{j=1}^{n} w_j = 1$, the probabilistic intuitionistic fuzzy weighted geometric (P-IFWG) operator is defined as

$$P\text{-}IFWG_w(\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n) = \bigotimes_{j=1}^{n} \tilde{a}_j^{\,v_j} = \left( \prod_{j=1}^{n} (\mu_j)^{v_j},\; 1 - \prod_{j=1}^{n} (1 - \gamma_j)^{v_j} \right)$$

where $\omega = (\omega_1, \omega_2, \ldots, \omega_n)^T$ is the weight vector of $\tilde{a}_j$ $(j = 1, 2, \ldots, n)$ with $\omega_j > 0$ and $\sum_{j=1}^{n} \omega_j = 1$, $p$ is a probabilistic weight with $p_j > 0$ and $\sum_{j=1}^{n} p_j = 1$, and $v_j = \beta p_j + (1 - \beta) w_j$, with $\beta \in [0, 1]$, is the weight that unifies probabilities and IFWGs in the same formulation. In particular, if $\beta = 0$, then the P-IFWG operator reduces to the IFWG operator. In this work, aggregation computations using the P-IFWG, IFWG, P-IFWA, and IFWA operators are performed in Python, and the results are recorded for comparison.
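Since the paper states that the aggregation computations were done in Python, a minimal sketch of the P-IFWG operator is given below. The function names are illustrative; the weights $w$, $p$, and $\beta$ are taken from the numerical illustration in Sect. 4 below, and the call reproduces the value $\tilde r_{11} = (0.20907, 0.46232)$ reported there for the first row of decision matrix $\tilde R_1$.

```python
# Minimal sketch of the P-IFWG operator defined above (not the authors' code).
import math

def unified_weights(w, p, beta):
    """v_j = beta * p_j + (1 - beta) * w_j  (beta = 0 recovers plain IFWG)."""
    return [beta * pj + (1 - beta) * wj for wj, pj in zip(w, p)]

def p_ifwg(pairs, v):
    """Aggregate intuitionistic fuzzy values (mu_j, gamma_j) with weights v_j."""
    mu = math.prod(m ** vj for (m, _), vj in zip(pairs, v))
    gamma = 1 - math.prod((1 - g) ** vj for (_, g), vj in zip(pairs, v))
    return mu, gamma

w = [0.2, 0.25, 0.15, 0.4]            # attribute weights (Sect. 4)
p = [0.1, 0.2, 0.3, 0.4]              # probabilistic weights (Sect. 4)
v = unified_weights(w, p, beta=0.40)  # -> [0.16, 0.23, 0.21, 0.40]

row1_R1 = [(0.4, 0.3), (0.5, 0.2), (0.2, 0.5), (0.1, 0.6)]
print(p_ifwg(row1_R1, v))             # approx. (0.20907, 0.46232)
```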
3 ANN Approach to Group Decision-Making with Intuitionistic Fuzzy Information

The steps proposed for the new ANN are as follows:

Step 1: Use the P-IFWG/IFWG/P-IFWA/IFWA operators to aggregate the individual decision matrices into individual column matrices.
Step 2: Defuzzify the intuitionistic fuzzy values of the individual column matrices.
Step 3: To acquire the weight vector for the defuzzified individual column matrices, apply the learning rule (Delta/Perceptron/Hebb learning rule).
Step 4: Compute the mean of the resulting weight vector to determine the threshold value.
Step 5: Make use of the activation function to find the values that meet the threshold value.
Step 6: Utilizing the results of Step 5, rank the best alternative to the MAGDM problem.
4 Numerical Illustration: Decision-Making of a Customer with Online Shopping

Consider a company that offers online shopping and wants to influence its clients to select the best choice from the range of available alternatives. Five choices are presented to the customer on a panel for their consideration: the first product is A1, the second is A2, the third is A3, the fourth is A4, and the fifth is A5. The four factors that the buyer must consider while selecting the best alternative are G1 (the risk of the purchase), G2 (the market price), G3 (the product's quality), and G4 (the product's availability). The five possible alternatives Ai (i = 1, 2, ..., 5) are to be evaluated using intuitionistic fuzzy numbers by the three decision makers (whose weighting vector is w = (0.45, 0.20, 0.35)^T) under the above four attributes (whose weighting vector is ω = (0.2, 0.1, 0.3, 0.4)^T), and the decision matrices R_k = (r̃_ij^(k))_{m×n}, k = 1, 2, 3, are constructed as follows:

R̃1:
(0.4, 0.3) (0.5, 0.2) (0.2, 0.5) (0.1, 0.6)
(0.6, 0.2) (0.6, 0.1) (0.6, 0.1) (0.3, 0.4)
(0.5, 0.3) (0.4, 0.3) (0.4, 0.2) (0.5, 0.2)
(0.7, 0.1) (0.5, 0.2) (0.2, 0.3) (0.1, 0.5)
(0.5, 0.1) (0.3, 0.2) (0.6, 0.2) (0.4, 0.2)

R̃2:
(0.5, 0.4) (0.6, 0.3) (0.3, 0.6) (0.2, 0.7)
(0.7, 0.3) (0.7, 0.2) (0.7, 0.2) (0.4, 0.5)
(0.6, 0.4) (0.5, 0.4) (0.5, 0.3) (0.6, 0.3)
(0.8, 0.1) (0.6, 0.3) (0.3, 0.4) (0.2, 0.6)
(0.6, 0.2) (0.4, 0.3) (0.7, 0.1) (0.5, 0.3)

R̃3:
(0.4, 0.5) (0.5, 0.4) (0.2, 0.7) (0.1, 0.8)
(0.6, 0.4) (0.6, 0.3) (0.6, 0.3) (0.3, 0.6)
(0.5, 0.5) (0.4, 0.5) (0.4, 0.4) (0.5, 0.4)
(0.7, 0.2) (0.5, 0.4) (0.2, 0.5) (0.1, 0.7)
(0.5, 0.3) (0.3, 0.4) (0.6, 0.2) (0.4, 0.4)
Then, utilize the new ANN approach developed to get the most desirable alternative(s).

Step 1: Aggregate the decision matrices into column matrices.

Let the weights of the attributes be w_j = (0.2, 0.25, 0.15, 0.4)^T and let the probability of occurrence of the attribute weights be p_j = (0.1, 0.2, 0.3, 0.4)^T. The unified weights of the P-IFWG operator are computed as v_j = βp_j + (1 − β)w_j with β = 0.40, which gives v_j = (0.16, 0.23, 0.21, 0.40)^T. The P-IFWG computations are then given by

$$\tilde r_{ij} = \left( \prod_{j=1}^{n} \mu_j^{v_j},\; 1 - \prod_{j=1}^{n} (1-\gamma_j)^{v_j} \right) = \left[ (\mu_1)^{v_1} (\mu_2)^{v_2} (\mu_3)^{v_3} (\mu_4)^{v_4},\; 1 - (1-\gamma_1)^{v_1} (1-\gamma_2)^{v_2} (1-\gamma_3)^{v_3} (1-\gamma_4)^{v_4} \right]$$

For example,

$$\tilde r_{11} = \left[ (0.4)^{0.16} (0.5)^{0.23} (0.2)^{0.21} (0.1)^{0.40},\; 1 - (1-0.3)^{0.16} (1-0.2)^{0.23} (1-0.5)^{0.21} (1-0.6)^{0.40} \right]$$

Hence, r̃11 = [0.20907, 0.46232]. Following the computations of P-IFWG, the collective overall preference values are

R̃1 = ((0.20907, 0.46232), (0.45470, 0.24905), (0.45322, 0.24061), (0.22865, 0.34319), (0.42247, 0.18478))^T
R̃2 = ((0.32464, 0.56734), (0.55958, 0.35113), (0.55373, 0.34085), (0.34999, 0.43600), (0.52483, 0.24614))^T
R̃3 = ((0.20907, 0.67535), (0.45470, 0.45404), (0.45322, 0.44119), (0.22865, 0.54176), (0.42247, 0.34673))^T

Step 2: Defuzzify the intuitionistic fuzzy values of the individual column matrices.

By the defuzzification method (r_ij = 1 − μ − γ), we get the defuzzified values of the collective overall preference values, and the defuzzified collective overall preference values are treated as the input training vectors, denoted as follows:
X1 = (0.32861, 0.29625, 0.30616, 0.42816, 0.39274)^T
X2 = (0.10802, 0.08927, 0.10541, 0.21400, 0.22902)^T
X3 = (0.11557, 0.09126, 0.10558, 0.22959, 0.23080)^T

Step 3: Apply the learning rule (delta learning rule) to the defuzzified individual column matrices to obtain the weight vector.

Assume the initial weight vector to be W^1 = (0.2, 0.2, 0.2, 0.2, 0.2)^T. By applying the delta learning rule to these weights, we get:

Case (i): When d1 = −1, d2 = 1, d3 = −1:

$$net^1 = (W^1)^T X_1 = (0.2, 0.2, 0.2, 0.2, 0.2)(0.32861, 0.29625, 0.30616, 0.42816, 0.39274)^T = 0.35038$$

$$o^1 = \frac{2}{1 + \exp(-0.35038)} - 1 = 0.17341, \qquad f'(net^1) = \frac{1}{2}\left(1 - (o^1)^2\right) = 0.48496$$

W^2 = (0.18131, 0.18315, 0.18258, 0.17564, 0.17766)^T.
Proceeding similarly through X2 to obtain W^3, and then calculating $W^4 = W^3 + c\,(d_3 - o^3)\, f'(net^3)\, X_3$, we get W^4 = (0.18016, 0.18243, 0.18185, 0.17334, 0.17600)^T.

Case (ii): When d1 = 1, d2 = −1, d3 = −1: W^4 = (0.20115, 0.20217, 0.20093, 0.19332, 0.19105)^T.

Step 4: Compute the mean of the weight vector to fix the threshold.

The threshold values are obtained by averaging the two weight vectors resulting from Cases (i) and (ii), respectively. These threshold values are 0.17875 and 0.19772.

Step 5: Apply the activation function.

Case (i): Using the binary step function f(x) = 1 for x > 0.17875 and f(x) = 0 for x ≤ 0.17875, the decision variable of the matrix is (1, 1, 0, 0, 0)^T.

Case (ii): Using the binary step function f(x) = 1 for x > 0.19772 and f(x) = 0 for x ≤ 0.19772, the decision variable of the matrix is (1, 1, 0, 0, 0)^T.

Step 6: Decide the final best alternative.

From Cases (i) and (ii), the best alternatives are A1 and A2.
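The Case (i) arithmetic of Steps 3 and 4 can be checked with a short script. This is a sketch, not the authors' code; in particular, the learning rate c = 0.1 is an assumption (it is not stated in the text), but it is the value that reproduces the reported quantities net^1 = 0.35038, o^1 = 0.17341, f'(net^1) = 0.48496, W^4 and the threshold 0.17875.

```python
# Sketch of the Case (i) delta-rule updates (Step 3) and threshold (Step 4).
# Learning rate c = 0.1 is an assumption that reproduces the reported values.
import math

X = [
    [0.32861, 0.29625, 0.30616, 0.42816, 0.39274],  # X1
    [0.10802, 0.08927, 0.10541, 0.21400, 0.22902],  # X2
    [0.11557, 0.09126, 0.10558, 0.22959, 0.23080],  # X3
]
d = [-1, 1, -1]       # desired outputs for Case (i)
c = 0.1               # assumed learning rate
W = [0.2] * 5         # initial weight vector W^1

for x, dk in zip(X, d):
    net = sum(wi * xi for wi, xi in zip(W, x))   # net^k = (W^k)^T X_k
    o = 2.0 / (1.0 + math.exp(-net)) - 1.0       # bipolar sigmoid output
    fprime = 0.5 * (1.0 - o * o)                 # its derivative
    W = [wi + c * (dk - o) * fprime * xi for wi, xi in zip(W, x)]

print([round(wi, 5) for wi in W])  # ~ [0.18016, 0.18243, 0.18185, 0.17334, 0.176]
print(round(sum(W) / len(W), 5))   # Step 4 threshold ~ 0.17875
```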
5 Numerical Illustration of Solving MAGDM Problem Using ANN with Delta Method and Hidden Layer

This uses the same ANN method described above with a hidden layer; the computation for the hidden layer entails normalizing each individual intuitionistic decision matrix before applying the aggregation operators to the normalized decision matrix. The computations described above are used in all other steps of the proposed ANN algorithm. Take into account the same three decision matrices from the previous numerical example. The hidden layer's normalized individual intuitionistic decision matrices are obtained as follows:

R̃1:
0.25 0.25 0.25 0.25
0.28 0.24 0.24 0.24
0.29 0.25 0.21 0.25
0.31 0.27 0.19 0.23
0.24 0.20 0.32 0.24

R̃2:
0.25 0.25 0.25 0.25
0.28 0.24 0.24 0.24
0.28 0.25 0.22 0.25
0.27 0.27 0.22 0.24
0.26 0.22 0.26 0.26

R̃3:
0.25 0.24 0.23 0.25
0.27 0.24 0.23 0.24
0.28 0.25 0.21 0.20
0.27 0.28 0.20 0.23
0.24 0.22 0.25 0.26
Applying the P-IFWG operator to the above normalized matrices, the input training vectors for the ANN are as follows (Figs. 1, 2 and 3):

R̃1 = (0.71245, 0.71049, 0.71054, 0.70753, 0.70967)^T
R̃2 = (0.71245, 0.70932, 0.71116, 0.70914, 0.71408)^T
R̃3 = (0.71245, 0.70927, 0.71115, 0.70909, 0.71401)^T
Fig. 1 Data Comparison between decision makers 1 and 2
Fig. 2 Data Comparison between decision makers 2 and 3
Fig. 3 Data Comparison between decision makers 1 and 3
Using the same computation as above, we arrive at the following (Figs. 4, 5 and 6):

Case (i): Binary step function using the threshold from Case (i): f(x) = 1 for x > 0.13971 and f(x) = 0 for x ≤ 0.13971. The decision variable of the matrix is (0, 1, 0, 1, 0)^T; hence the best alternatives are A2 and A4.
Fig. 4 Training vectors comparison between 1 and 2
Fig. 5 Training vectors comparison between 2 and 3
Case (ii): Binary step function using the threshold from Case (ii): f(x) = 1 for x > 0.13656 and f(x) = 0 for x ≤ 0.13656. The decision variable of the matrix is (0, 1, 1, 1, 0)^T; hence the best alternatives are A2, A3, and A4 (Fig. 7).
Fig. 6 Training vectors comparison between 1 and 3
Fig. 7 Threshold value comparison between different aggregation operators
6 Discussion

The threshold utilized by the new intuitionistic fuzzy ANN removes some of the less significant choice variables from the decision-making system out of the total number of available alternatives, as can be seen from comparison Table 2. The decision problem initially had five alternatives, and it is clear which one should be selected based on its merits. Each of the five factors will be present in the final ranking procedure for the conventional MAGDM methods that are described in the literature [9–15].
Table 1 Selection of the best alternatives by intuitionistic fuzzy ANN and different aggregation operators with/without hidden layers (two rows per configuration correspond to Cases (i) and (ii))

| Sl. no. | Learning rule | Hidden layer | Threshold P-IFWG | Threshold P-IFWA | Threshold IFWG | Threshold IFWA | Ranking |
|---|---|---|---|---|---|---|---|
| 1 | Delta | No | 0.17875 | 0.19861 | 0.17885 | 0.18195 | A1, A2 |
| | | | 0.19772 | 0.21238 | 0.20166 | 0.22589 | A1, A2 |
| 2 | Delta | Yes | 0.13971 | 0.13971 | 0.14204 | 0.14204 | A2, A4 |
| | | | 0.13656 | 0.13656 | 0.13641 | 0.13641 | A2, A3, A4 |
| 3 | Perceptron | No | 0.12992 | 0.13326 | 0.13017 | 0.2005896 | A1, A2, A3 |
| | | | 0.17017 | 0.17175 | 0.17094 | 0.197562 | A1, A2, A3 |
| 4 | Perceptron | Yes | 0.05797 | 0.05797 | 0.05774 | 0.05774 | A4, A5 |
| | | | 0.05775 | 0.05775 | 0.057716 | 0.057696 | A2, A3, A4 |
| 5 | Hebb | No | 0.85408 | 0.81353 | 0.84579 | 1.39 | A4, A5 |
| 6 | Hebb | Yes | 2.33259 | 2.33259 | 2.33647 | 2.33647 | A1, A3, A5 |
Table 2 Selection of the best alternatives by intuitionistic fuzzy ANN

| MAGDM methods [9–15] | Ranking of alternatives |
|---|---|
| Method-1: Ranking with score and accuracy functions | A5 > A2 > A3 > A4 > A1; most desirable alternative is A5 |
| Method-2: Ranking with Hamming distance function excluding intuitionistic degree | A1 > A4 > A3 > A2 > A5; most desirable alternative is A5 |
| Method-3: Ranking with Hamming distance function including intuitionistic degree | A1 > A4 > A3 > A5 > A2; most desirable alternative is A2 |
The benefit of the suggested ANN (Table 1) is that only the crucial decision-making variables are output, eliminating the influence of the less crucial variables.
7 Conclusion

The arithmetic and geometric aggregation operators, which accept argument pairs, are presented in this work. One component of these operators is employed to create an ordering over the second components, which are intuitionistic fuzzy values, and these components are then aggregated. As a component of an ANN, the operators have been used in group decision-making with intuitionistic fuzzy information. Finally, a novel algorithm using several learning rules for the ANN proposed in this paper was applied and solved. The method suggested in this research was compared with the conventional MAGDM method for handling the same choice problem. Since it eliminates the unnecessary decision alternatives from the system and allows the inputs to be preserved in their intuitionistic fuzzy set nature, the new technique employing the intuitionistic fuzzy ANN proves to be more successful than the prior ways. Future studies will take into account ANNs built using advanced MAGDM techniques.
References

1. Atanassov K (1986) Intuitionistic fuzzy sets. Fuzzy Sets Syst 20:87–96
2. Atanassov K (1989) More on intuitionistic fuzzy sets. Fuzzy Sets Syst 33:37–46
3. Atanassov K, Sotirov S, Angelova N (2020) Intuitionistic fuzzy neural networks with interval valued intuitionistic fuzzy conditions. Stud Comput Intell 862:99–106. https://doi.org/10.1007/978-3-030-35445-9_9
4. Atanassov K, Sotirov S, Pencheva T (2023) Intuitionistic fuzzy deep neural network. Mathematics 11(716):1–14. https://doi.org/10.3390/math11030716
5. Hájek P, Olej V (2015) Intuitionistic fuzzy neural network: the case of credit scoring using text information. In: Iliadis L, Jayne C (eds) Engineering applications of neural networks (EANN 2015). Communications in Computer and Information Science, vol 517. Springer, Cham. https://doi.org/10.1007/978-3-319-23983-5_31
6. Kuo RJ, Cheng WC (2019) An intuitionistic fuzzy neural network with gaussian membership function. J Intell Fuzzy Syst 36(6):6731–6741. https://doi.org/10.3233/IJFS-18998
7. Kuo RJ, Cheng WC, Lien WC, Yang TJ (2019) Application of genetic algorithm-based intuitionistic fuzzy neural network to medical cost forecasting for acute hepatitis patients in emergency room. J Intell Fuzzy Syst 37(4):5455–5469. https://doi.org/10.3233/jifs-190554
8. Petkov T, Bureva V, Popov S (2021) Intuitionistic fuzzy evaluation of artificial neural network model. Notes Intuitionistic Fuzzy Sets 27(4):71–77. https://doi.org/10.7546/nifs.2021.27.4.71-77
9. Robinson JP, Jeeva S (2019) Intuitionistic trapezoidal fuzzy MAGDM problems with sumudu transform in numerical methods. Int J Fuzzy Syst Appl (IJFSA) 8(3):1–46. https://doi.org/10.4018/IJFSA.2019070101
10. Robinson JP, Jeeva S (2019) Application of integrodifferential equations using sumudu transform in intuitionistic trapezoidal fuzzy MAGDM problems. In: Rushi Kumar B et al (eds) Applied mathematics and scientific computing. Trends in mathematics. Birkhäuser, Cham. https://doi.org/10.1007/978-3-030-01123-9_2
11. Robinson JP, Amirtharaj ECH (2016) Multiple attribute group decision analysis for intuitionistic triangular and trapezoidal fuzzy numbers. Int J Fuzzy Syst Appl 5(3):42–76. https://doi.org/10.4018/IJFSA.2016070104
12. Robinson JP, Indhumathi M, Manjumari M (2019) Numerical solution to singularly perturbed differential equation of reaction-diffusion type in MAGDM problems. In: Rushi Kumar B et al (eds) Applied mathematics and scientific computing. Trends in mathematics, vol II. Springer Nature Switzerland AG, pp 3–12. https://doi.org/10.1007/978-3-030-01123-9_1
13. Verma OP, Manik G, Jain VK (2018) Simulation and control of a complex nonlinear dynamic behavior of multi-stage evaporator using PID and Fuzzy-PID controllers. J Comput Sci 25:238–251
14. Xu ZS, Yager RR (2006) Some geometric aggregation operators based on intuitionistic fuzzy sets. Int J Gen Syst 35(4):417–433
15. Yager RR, Filev DP (1999) Induced ordered weighted averaging operators. IEEE Trans Syst Man Cybern Part B 29:141–150
16. Zhao J, Lin LY, Lin CM (2016) A general fuzzy cerebellar model neural network multidimensional classifier using intuitionistic fuzzy sets for medical identification. Comput Intell Neurosci:1–9. https://doi.org/10.1155/2016/8073279
Automated Tool for Toxic Comments Identification on Live Streaming YouTube Tuhin Tarafder, Harsh Kumar Vashisth, and Mamta Arora
Abstract The necessity for content moderation on social media websites is increasing every day. The reason behind this is the anonymity of an individual which is provided by the internet and that they can exercise on any streaming website like YouTube. This project aids the creators/moderators in maintaining and stabilizing the toxicity of comments on their channel/page. A moderator has the authority to delete/hide any inappropriate comment posted by any user. During the pandemic of Covid-19, a significant, large user base immigrated to social media websites particularly streaming websites like YouTube. This surge in users resulted in an increased need for moderators. Any user can post any hateful/toxic or obscene comment that moderators can miss due to the huge volume of comments. Using NLP (Natural Language Processing), a model can be implemented directly onto any live-streaming chat session which can identify any toxic/obscene or threat comment and can flag them under the respective category. This model is real time and formulated using NLP techniques that are used in order to achieve the task. Keywords Deep learning · NLP · Text classification
1 Introduction

Live streaming is becoming more common, and more sites are implementing support. This also involves a live chat section where people can comment in real time. It is important to moderate these chat sections to keep discussions civil and inclusive. Comments can be toxic or positive, and either way it is essential for the creator to analyze the audience's reaction to the content. On most popular platforms, the large number of comments is beyond human moderators' capability to sort through.
Some platforms provide automated content moderation tools based on simple rules, but these often block legitimate use of words that are offensive only in other contexts, or comments written in another language. This paper aims to create an automated moderation tool to filter comments in live chats based on their context. An NLP model is used to identify whether a given comment is toxic, obscene, a threat, hate speech, etc. The novelty of this paper is the integration of the NLP model with the YouTube API for live streams, so that streams can be moderated automatically and toxic comments filtered out (see the sketch after the list below). The objectives formulated based on the aim of this project are:

1. To perform an extensive literature review on sentiment analysis of English text, and to identify the current state-of-the-art methods and any research gaps and limitations of these methods.
2. To build a model that performs sentiment analysis on English comments and evaluate it through relevant metrics.
3. To compare the developed model with existing models.
4. To integrate the developed model with live-streaming platforms to help moderators maintain a healthy conversation.
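The following is an illustrative sketch of the intended integration, not the authors' code: it polls a YouTube live chat through the YouTube Data API v3 and flags messages that a classifier marks as toxic. `API_KEY`, `VIDEO_ID`, and the `classify()` stub are placeholders; actual moderation (deleting a message) additionally requires OAuth credentials and `youtube.liveChatMessages().delete(id=...)`.

```python
# Sketch: poll a YouTube live chat and flag toxic messages with an NLP model.
import time
from googleapiclient.discovery import build

API_KEY, VIDEO_ID = "YOUR_API_KEY", "YOUR_VIDEO_ID"   # hypothetical values
youtube = build("youtube", "v3", developerKey=API_KEY)

def classify(text: str) -> bool:
    """Placeholder for the trained NLP model: True if the comment is toxic."""
    return False

# Look up the live chat attached to the stream.
video = youtube.videos().list(part="liveStreamingDetails", id=VIDEO_ID).execute()
chat_id = video["items"][0]["liveStreamingDetails"]["activeLiveChatId"]

page_token = None
while True:
    resp = youtube.liveChatMessages().list(
        liveChatId=chat_id, part="snippet,authorDetails", pageToken=page_token
    ).execute()
    for item in resp["items"]:
        text = item["snippet"]["displayMessage"]
        if classify(text):
            print("FLAGGED:", item["authorDetails"]["displayName"], text)
    page_token = resp["nextPageToken"]
    time.sleep(resp["pollingIntervalMillis"] / 1000)  # respect polling interval
```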
2 Related Research Prior to training a model, preprocessing and feature extraction are first carried out on the raw text. Tang et al. [1] fine-tuned the BERT Model to perform sentiment analysis in codeswitching text. They proposed BERT-MSAUC, a model for multi-label sentiment analysis to deal with unbalanced, code-switching text. The proposed model performs better than BERT-M with an average f1-score of 0.62. BERT-Base Chinese model performed better for code-switching Chinese text. Chakravarthi [2] curated a multilingual dataset and created baseline models on the same. The dataset contains 59,354 comments collected from YouTube. The comments are user-generated and are in three languages English, Tamil, and Malayalam, with each language containing 28,451, 20,198, and 10,705 comments, respectively. The comments are labeled as Hope Speech and Not Hope Speech. Hope speech is categorized as speech that contains inspiration, instills confidence and optimism, offers support, and encourages healthy practices, among other criteria. A limitation of the current dataset is that it is imbalanced. This caused the baseline models to perform poorly. SVM performed the worst F-Scores 0.32, 0.21, and 0.28 for English, Tamil, and Malayalam, respectively. Rupapara et al. [3] used SMOTE to address data imbalance and proposed an ensemble approach for classification, called RVVC (Regression Vector Voting Classifier). The dataset used is highly imbalanced in favor of non-toxic comments. To reduce bias in the trained model, SMOTE has been used to synthetically balance the dataset. For feature extraction, two methods: TF-IDF (term frequency-inverse document frequency) and BoW (Bag of Words) have been applied and compared. Different machine learning models are then applied and compared based on precision,
recall, accuracy, and F1-Score. The proposed ensemble method combines Logistic Regression, and Support Vector Classification, and achieves the best performance when applied with TF-IDF features, with an accuracy of 0.97. A limitation with the current implementation is the computational complexity of the ensemble method over the individual methods. Also, using SMOTE may have some influence on accuracy. Asif et al. [4] collected a multilingual dataset containing texts and comments from news agencies. The data has been scraped from the news agencies’ websites. The dataset contains texts in 3 different languages: English, Urdu, and RomanUrdu. A lexicon dictionary was then developed that assigns a weight to each lexicon ranging from −5 to +5, representing moderation and extremism, respectively. This lexicon dictionary was validated by domain experts. TF-IDF was used for feature selection. MNV (Multinomial Naïve Bayes) and Linear SVC (Support Vector Classifier) were then used for classifying the texts based on their degree of extremism: neutral, moderate, low-extreme, and high-extreme. Linear SVC performed the best with 82% accuracy. Kanfoud and Bouramoul [5] proposed SentiCode, a new mode of representation for multilingual sentiment analysis. It is very common to switch between languages when commenting on social media sites. Most current methods use machine translation to handle switching between different languages. Instead, it is a language-independent representation. This representation is generated using the SentiCoder algorithm. The vocabulary of SentiCode includes only 7 words- ADJ, ADV, NOUN, VERB, NOT, POS, and NEG. 4 state-of-the-art machine learning algorithms and an MLP were used to evaluate the efficacy of SentiCode. This representation performed consistently with 70% accuracy across the different languages that it was evaluated on. Some future works include improving the negation handling algorithm and applying the representation in other languages. Salminen et al. [6] curated a dataset for the task of classifying online hate speech. The curated dataset contains comments from four different platforms YouTube, Reddit, Wikipedia, and Twitter. Due to this, the dataset contains different types of comments as the 4 sites have different audiences. The authors then evaluated multiple different feature extraction techniques and classification techniques on the dataset. The feature extraction techniques include TF-IDF, Word2Vec, BoW, and BERT. The different classification techniques applied include Naïve Bayes, Support Vector Classification, XGBoost, Logistic Regression, and some deep learning models. The models were evaluated using F1-Score and ROC-AUC. The experimental results showed that XGBoost has the best performance, when used on BERT-extracted features. Kobs et al. [7] curated a dataset by querying the Twitch API periodically for current streams over 3 months. Over 3 billion comments were collected over the 3-month period. From this, a subset of 14.4 million comments were selected from the top-5 most commented English-speaking channels. The data was then labeled through crowdsourcing into 3 labels, positive, negative, and neutral, which represent the commenters’ sentiment. Word2Vec embeddings were trained on a dataset that contains words that had at least 100 occurrences. This word2vec representation is used as input to the model. A variant of CNN, Sentence CNN, was used for training, where the word embeddings were fine-tuned as well along with the weights. 
Sentence CNN performed the best, with a macro-F1 score of 62.6%. The current implementation struggles with
unknown words; exploring better generalization methods is left as future work. Nandakumar et al. [8] curated a dataset of student feedback gathered from their institute's educational portal, collecting a total of 2,848 comments from 2,200 students. After collection, the comments were manually labeled as positive, negative, or neutral. The dataset was then pre-processed using tokenization, stop-word removal, and POS tagging. Overall, 60% of the comments were positive and 40% were negative. One demerit of this work is that labeling was done manually, which is very hard to do at scale; another is that sentiments were ultimately grouped into only two categories, which limits the conclusions that can be drawn. Xu et al. [9] gathered a dataset of 15,000 hotel reviews, with an equal number of positive and negative comments, collected using a web crawler. To analyze the sentiment of these comments, the authors first used TF-IDF vectorization to obtain weights for the comments and then fed those weights into a BiLSTM model to better capture context. Using this approach, the authors achieved an F1 score of 92%, indicating a high level of accuracy. In comparison, they evaluated five other models, including RNN, CNN, LSTM, and Naive Bayes, whose maximum score was only 88%. However, BiLSTM-based sentiment analysis requires a lengthy training process, so the authors suggest that future research explore ways to accelerate it. Alhujaili and Yafooz [10] collected their dataset by manually annotating Arabic comments as positive or negative, while all neutral and unrelated comments, including spam and advertisements, were ignored; they also eliminated duplicated and meaningless comments. The dataset was then pre-processed using techniques such as stop-word removal, tokenization, lemmatization, and vectorization. The authors found that the SVC, RF, and DL models using oversampling and SMOTE techniques achieved the best accuracy, reaching 96% across all experiments. Hasan et al. [11], on the other hand, obtained their dataset by extracting tweets using the Twitter API. They collected 100 and 1,000 tweets about iPhone and Samsung devices, respectively, then cleaned the data and applied TF-IDF vectorization. The authors used a bag-of-words approach for text classification and proposed a generalized approach that could be applied to any website exposing an API. However, this approach lacks novelty, as the tweets were broadly classified into just two classes, positive and negative. In their study, Mehedi Shamrat et al. [12] conducted sentiment analysis on tweets related to COVID-19 vaccines using natural language processing (NLP) techniques and a supervised k-nearest neighbor (KNN) classification algorithm. The main objective of their research was to understand public sentiment regarding COVID-19 vaccines by analyzing tweets on the subject. To achieve this, the authors applied various NLP techniques, including text preprocessing and feature extraction, to prepare a dataset of vaccine-related tweets for analysis. They then employed the KNN algorithm to classify the tweets into positive, negative, or neutral categories.
The results of the study demonstrate the effectiveness of using NLP and KNN for sentiment analysis of tweets related to COVID-19 vaccines. By analyzing public sentiment in this
way, researchers can gain valuable insights into how people feel about vaccines and potentially use this information to inform public health policy and communication strategies. Basiri et al. [13] introduce a new deep learning model for sentiment analysis called ABCDM. The model combines the strengths of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to analyze the sentiment of a given text. The proposed Attention-based Bidirectional CNN-RNN Deep Model uses both forward and backward information from the text to improve the sentiment analysis results. In addition, an attention mechanism is implemented in the model to help it focus on the most important features in the input text. The authors conducted experiments on various datasets and found that the proposed ABCDM model outperformed several state-of-the-art models for sentiment analysis. They conclude that their model can effectively capture the sentiment of a text and has the potential to be a useful tool for sentiment analysis in the future. The ABCDM model’s combination of CNN and RNN techniques and attention mechanism makes it a promising approach for sentiment analysis tasks.
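Several of the surveyed pipelines share a common pattern: sparse text features, optional synthetic balancing, and a conventional classifier. The sketch below illustrates that pattern in the spirit of [3] — a soft-voting Logistic Regression + SVC ensemble over TF-IDF features, with SMOTE applied to the training split only. The toy corpus and every parameter choice are illustrative assumptions, not the settings of any surveyed paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn

# Toy corpus standing in for a real toxic-comment dataset (1 = toxic).
texts = ["great stream", "love this channel", "nice play", "well done",
         "you idiot", "shut up loser", "absolute trash gamer", "i hate you"]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.25,
                                          stratify=labels, random_state=0)

vec = TfidfVectorizer(ngram_range=(1, 2))
Xtr = vec.fit_transform(X_tr)          # fit the vocabulary on training text only
Xte = vec.transform(X_te)

# Balance only the training features; the test set stays untouched.
Xtr_bal, ytr_bal = SMOTE(k_neighbors=1, random_state=0).fit_resample(Xtr, y_tr)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("svc", SVC(probability=True))],
    voting="soft")                     # average the two predicted probabilities
ensemble.fit(Xtr_bal, ytr_bal)
print(classification_report(y_te, ensemble.predict(Xte)))
```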
3 Methodology This section outlines the different steps involved (Fig. 1). From reviewing related research, the methods used in the different stages of text classification were identified. Techniques used for feature extraction include Word2Vec [6, 7], bag of words [3, 6, 11], TF-IDF [3, 6, 9], and BERT [1, 6]; the choice of feature extractor depends on the use case. Salminen et al. [6] found BERT features to perform best, while Kobs et al. [7] used Word2Vec and Asif et al. [4] created a custom lexicon dictionary for their vocabulary. Nandakumar et al. used POS (parts
Fig. 1 Working methodology
of speech) tagging and then applied NLP to classify their collected dataset into two classes of positive and negative comments [8]. Various studies have also performed sentiment analysis on multilingual and code-switching text. Tang et al. [1] proposed BERT-MSAUC, a model based on BERT-Base Chinese, to perform sentiment analysis on code-switching text containing mixed Chinese and English. Kanfoud and Bouramoul [5] suggested a language-independent representation of text called SentiCode. The SentiCode vocabulary contains only 7 tokens, has to be trained only once, and can be used on text in multiple languages as well as code-switched text. Asif et al. [4] created a lexicon containing vocabulary in English, Urdu, and Roman-Urdu, with each word assigned a weight from −5 to +5, representing moderation and extremism, respectively. Both classic machine learning models and deep learning models are used for sentiment analysis. Datasets used in sentiment analysis, specifically those for toxic comments, are usually heavily unbalanced; Rupapara et al. [3] and Alhujaili and Yafooz [10] used SMOTE to synthetically balance their datasets before training. Classification techniques used include XGBoost [6], Naïve Bayes [2, 4, 6, 9], Logistic Regression [2, 3, 6], KNN (K-Nearest Neighbor) [2, 12], and Decision Tree [2]. Deep learning models used include BERT [5], Sentence CNN [7], BiLSTM (Bidirectional Long Short-Term Memory) [9], CNN [9, 13, 14], RNN [9, 13], and MLP [5]. Xu et al. [9] proposed using TF-IDF for vectorization and BiLSTM for sentiment analysis; this combination provided better results than previous methods.
4 Experiments 4.1 Data Collection The dataset used for training is from the Jigsaw Multilingual Toxic Comment Classification Challenge [15]. It consists of English comments collected from Wikipedia Talk pages and contains ~159,000 comments in the train set and ~153,000 comments in the test set. The comments are labeled into six classes: toxic, severe toxic, obscene, threat, insult, and identity hate.
4.2 Text Cleaning and Tokenization Text cleaning covers the removal of stop words and punctuation; tokenization is then followed by lemmatization and stemming to extract root words. First, all emojis and emoticons were replaced with their corresponding text descriptions. Then HTML tags, IP addresses, and email addresses were removed. The text was finally tokenized and lemmatized; a minimal sketch of these steps follows.
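The sketch below illustrates the cleaning steps just described; the regular expressions and the use of the `emoji` and NLTK packages are our assumptions, since the paper does not specify its exact implementation.

```python
import re
import emoji                      # pip install emoji
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

for pkg in ("stopwords", "wordnet", "punkt"):
    nltk.download(pkg, quiet=True)

STOP = set(stopwords.words("english"))
LEMMA = WordNetLemmatizer()

def clean(text):
    text = emoji.demojize(text)                               # emoji -> ':thumbs_up:'
    text = re.sub(r"<[^>]+>", " ", text)                      # HTML tags
    text = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", " ", text)  # IP addresses
    text = re.sub(r"\S+@\S+", " ", text)                      # e-mail addresses
    text = re.sub(r"[^a-z\s]", " ", text.lower())             # punctuation/digits
    tokens = nltk.word_tokenize(text)
    return [LEMMA.lemmatize(t) for t in tokens if t not in STOP]

print(clean("Contact [email protected] <b>NOW</b> at 192.168.0.1 :)"))
```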
4.3 Vectorization and Feature Extraction After cleaning, BOW (bag-of-words) and TF-IDF vectorization are applied, converting the text into numerical vectors based on term weights within each sentence.
4.4 Model Training The vectorized data was then used for model training. In this paper, we implemented Multinomial Naïve Bayes (MNB) and BiLSTM models; a sketch of the simpler branch is shown below.
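A sketch of the MNB branch under the stated assumptions (TF-IDF features over the cleaned text, a single binary label) follows; the BiLSTM branch would replace the classifier with a recurrent network over embedded token sequences. The placeholder data are illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder data; in practice these come from Sects. 4.1-4.3.
train_texts = ["you are great", "awful person", "lovely video", "go away idiot"]
train_labels = [0, 1, 0, 1]

mnb = make_pipeline(TfidfVectorizer(), MultinomialNB())
mnb.fit(train_texts, train_labels)
print(mnb.predict(["what an idiot", "great content"]))
```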
4.5 Model Evaluation As this is a classification task, common classification metrics including Accuracy, F1-Score, Precision, and Recall are used for evaluating the model.
4.6 Deploying Model After evaluation, the model is deployed to perform sentiment analysis on YouTube live-stream chat sessions. The integration is made using APIs provided by YouTube and automatically deletes comments that potentially contain toxic/hate speech in real time.
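A hedged sketch of such a moderation loop with the YouTube Data API v3 (google-api-python-client) is shown below. `get_credentials`, `is_toxic`, and `LIVE_CHAT_ID` are hypothetical placeholders; deleting messages requires OAuth authorization from the channel owner, and the authors' exact integration is not published.

```python
import time
from googleapiclient.discovery import build

youtube = build("youtube", "v3", credentials=get_credentials())  # hypothetical helper

def moderate(live_chat_id):
    page_token = None
    while True:
        resp = youtube.liveChatMessages().list(
            liveChatId=live_chat_id,
            part="snippet,authorDetails",
            pageToken=page_token).execute()
        for msg in resp.get("items", []):
            if is_toxic(msg["snippet"]["displayMessage"]):   # the trained model
                youtube.liveChatMessages().delete(id=msg["id"]).execute()
        page_token = resp.get("nextPageToken")
        # Respect the polling interval suggested by the API response.
        time.sleep(resp.get("pollingIntervalMillis", 5000) / 1000.0)

moderate(LIVE_CHAT_ID)   # hypothetical live-chat id of the stream
```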
5 Implementation The algorithm has been implemented so that a YouTube channel creator can delete toxic comments automatically and in real time, with the YouTube API used for seamless integration. This helps create a safe environment for users. The implementation focuses on smoothly integrating the developed machine learning model with the YouTube API. Toxic comments identified by the algorithm are automatically deleted from the live chat as shown in Fig. 2; the person who posted such a comment can still see it, but to all other viewers the comment is deleted, as seen in Fig. 3. The channel creator has to run the integration on a local machine for the system to work.
Fig. 2 Working of API
Fig. 3 YouTube comments deleted from live stream
6 Results The two models used in this paper achieved different accuracies: BiLSTM reached 97% while Multinomial Naive Bayes reached 87.44%. TF-IDF vectorization is used in both approaches. The precision, recall, and F1-score are 0.97, 0.95, and 0.96, respectively. The AUC score is 84%, and the corresponding curve is shown in Fig. 4. Furthermore, the results of this paper are compared with existing work in Table 1.
7 Conclusion and Future Work This paper revolves around creating a safe environment for users on live-streaming websites like YouTube. Deleting toxic comments allows both the channel creator and the viewers to communicate peacefully over the medium. The API integration has been implemented successfully, and the model achieves an accuracy of 97%.
Fig. 4 Area under the curve graph
Table 1 Current model used in this paper compared with previous work
S. no | Previous work   | F1 score (%)
1     | Ours            | 96
2     | Kobs et al. [7] | 62.6
3     | Xu et al. [9]   | 92
Currently, this tool works for the English language only; future work is to upgrade the model so that it can be applied in multilingual settings as well. Hopefully, with the implementation of such tools, toxicity over social media can be reduced.
References 1. Tang T, Tang X, Yuan T (2020) Fine-tuning BERT for multi-label sentiment analysis in unbalanced code-switching text. IEEE Access 8:193248–193256. https://doi.org/10.1109/ACCESS.2020.3030468 2. Chakravarthi BR (2020) HopeEDI: a multilingual hope speech detection dataset for equality, diversity, and inclusion. In: Proceedings of the third workshop on computational modeling of people's opinions, personality, and emotions in social media. Association for Computational Linguistics, Barcelona, Spain (Online), pp 41–53. https://aclanthology.org/2020.peoples-1.5 3. Rupapara V, Rustam F, Shahzad HF, Mehmood A, Ashraf I, Choi GS (2021) Impact of SMOTE on imbalanced text features for toxic comments classification using RVVC model. IEEE Access 9:78621–78634. https://doi.org/10.1109/ACCESS.2021.3083638 4. Asif M, Ishtiaq A, Ahmad H, Aljuaid H, Shah J (2020) Sentiment analysis of extremism in social media from textual information. Telemat Inform 48:101345. https://doi.org/10.1016/j.tele.2020.101345 5. Kanfoud MR, Bouramoul A (2022) SentiCode: a new paradigm for one-time training and global prediction in multilingual sentiment analysis. J Intell Inf Syst 59(2):501–522. https://doi.org/10.1007/s10844-022-00714-8
6. Salminen J, Hopf M, Chowdhury SA, Jung S, Almerekhi H, Jansen BJ (2020) Developing an online hate classifier for multiple social media platforms. Hum Centric Comput Inf Sci 10(1):1. https://doi.org/10.1186/s13673-019-0205-6 7. Kobs K, Zehe A, Bernstetter A, Chibane J, Pfister J, Tritscher J, Hotho A (2020) Emote-controlled: obtaining implicit viewer feedback through emote-based sentiment analysis on comments of popular twitch.tv channels. ACM Trans Soc Comput 3(2):7:1–7:34. https://doi.org/10.1145/3365523 8. Nandakumar R, Pallavi MS, Harithas PP, Hegde V (2022) Sentimental analysis on student feedback using NLP & POS tagging. In: 2022 International conference on edge computing and applications (ICECAA), pp 309–313. https://doi.org/10.1109/ICECAA55415.2022.9936569 9. Xu G, Meng Y, Qiu X, Yu Z, Wu X (2019) Sentiment analysis of comment texts based on BiLSTM. IEEE Access 7:51522–51532. https://doi.org/10.1109/ACCESS.2019.2909919 10. Alhujaili RF, Yafooz WMS (2022) Sentiment analysis for YouTube educational videos using machine and deep learning approaches. In: 2022 IEEE 2nd international conference on electronic technology, communication and information (ICETCI), pp 238–244. https://doi.org/10.1109/ICETCI55101.2022.9832284 11. Hasan MR, Maliha M, Arifuzzaman M (2019) Sentiment analysis with NLP on Twitter data. In: 2019 International conference on computer, communication, chemical, materials and electronic engineering (IC4ME2), pp 1–4. https://doi.org/10.1109/IC4ME247184.2019.9036670 12. Javed Mehedi Shamrat FM, Chakraborty S, Imran MM, Naeem Muna J, Billah M, Das P, Rahman O (2021) Sentiment analysis on Twitter tweets about COVID-19 vaccines using NLP and supervised KNN classification algorithm. Indones J Electr Eng Comput Sci 23(1):463. https://doi.org/10.11591/ijeecs.v23.i1.pp463-470 13. Basiri ME, Nemati S, Abdar M, Cambria E, Rajendra Acharya U (2021) ABCDM: an attention-based bidirectional CNN-RNN deep model for sentiment analysis. Future Gener Comput Syst 115:279–294. https://doi.org/10.1016/j.future.2020.08.005 14. Pavel MI, Razzak R, Sengupta K, Niloy DK, Muqith MB, Tan SY (2021) Toxic comment classification implementing CNN combining word embedding technique. In: Inventive computation and information technologies (Lecture Notes in Networks and Systems). Springer, Singapore, pp 897–909. https://doi.org/10.1007/978-981-33-4305-4_65 15. Toxic Comment Classification Challenge. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge. Accessed 9 May 2023 16. Mahara T, Josephine VLH, Srinivasan R, Prakash P, Algarni AD, Verma OP (2023) Deep vs. shallow: a comparative study of machine learning and deep learning approaches for fake health news detection. IEEE Access 11:79330–79340. https://doi.org/10.1109/ACCESS.2023.3298441
Directional Edge Coding for Facial Expression Recognition System Pagadala Sandya, K. Venkata Subbareddy, L. Nirmala Devi, and P. Srividya
Abstract The local appearance-based texture descriptors used in facial expression recognition are only partially accurate because they cannot encode the discriminative edges, mainly owing to noise-induced distortion and weak edges. To solve these issues, instead of the traditional local descriptors we suggest a new local texture descriptor called weighted Directional Edge Coding for Facial Expression Recognition Systems (DEC-FERS). DEC-FERS examines the support of neighboring pixels to determine facial expression attributes including edges, corners, lines, and curved edges. It extracts edge responses through edge detection masks, discards the weaker responses, and encodes only the more robust ones. Two edge detection masks, the Robinson Compass Mask and the Kirsch Compass Mask, are considered for the extraction of edge features. In addition, DEC-FERS reduces redundancy by removing redundant pixels that do not contribute to the center pixel. Keywords Face expression recognition · Edge detection · Gaussian weight · Compass mask · Directional encoding · Accuracy
1 Introduction Automatic human emotion recognition has become a key component of more natural Human–Computer Interaction (HCI) in recent years, mostly as a result of the universal applicability of HCI, and it has drawn the interest of numerous researchers from various domains [1, 2]. A person's emotion is what causes various signs to appear in one's head; it serves as a signal for a variety of goals and aids in understanding how the individual is feeling. Emotion supports HCI in a number of self-regulating activities, including marketing, automatic feedback therapies, and self-monitoring [3]. Additionally, a number of intelligent systems, such as those used P. Sandya (B) · K. V. Subbareddy · L. Nirmala Devi · P. Srividya ECE Department, Osmania University, Hyderabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_6
for online gaming, medical treatment, and quality assessment in educational settings, have the potential to benefit significantly from automatic emotion identification. For illustration, in healthcare an effective Face Expression Recognition (FER) system helps to improve people's mental health by examining their behavior [5, 6]. The face descriptor in FER depicts the subtle characteristics of the expression shown in the face image [7]. A face descriptor is deemed considerably more successful if it can guarantee larger variation between classes, such as different expressions, different ages, different persons, etc. Good facial descriptors have previously been developed through research, and prior methods can be grouped into appearance-based and geometric-based methods [8, 9]. In the first group, the researchers filter the entire facial image or a part of it using various filters. Such methods are, however, susceptible to facial misalignment for a variety of reasons, including poor capture and tilted faces, and they are ineffective under varying ambient factors including lighting, color, and texture. Appearance-based tactics are more successful than geometric ones because they focus on the qualities of the facial image. Among appearance-based forms, the Local Binary Pattern (LBP) has grown in favor [10]; it is an efficient way to describe the face image through changes in texture. However, this approach has limits when it comes to varying face image positions, ages, and illuminations, and LBP is not designed to encode the facial muscle movements that are made to communicate emotion. In this paper, we attempt to resolve these issues. The major contributions of this work are listed below. 1. To extract edge responses in various directions, we use two different edge extraction masks, namely the Robinson Compass Mask and the Kirsch Compass Mask. These two masks convolve the input face image in a total of eight directions and reveal the dominating edges in those directions. 2. To encode the center pixel, we assess the support of nearby pixels with an orientation similar to it. To achieve this, we calculate the Gaussian distribution weight for the surrounding eight pixels in eight directions. 3. To identify and eliminate the pixels that are part of flat zones, this study adopts an adaptive thresholding mechanism based on gradient magnitudes. 4. This approach suggests that each pixel be encoded with an edge-directional coding scheme.
2 Related Work The human face is an intricate structure that demands a lot of consideration; because of the way its muscles work together, the face is the body part most associated with emotion. It can be difficult to comprehend a person's innermost feelings without seeing their face, which plays a crucial role in non-verbal communication. According to the Universality Hypothesis, people of all ages and backgrounds perceive and express facial expressions of emotion in the same way. Charles Darwin wrote the first article
on facial expressions and their effects [11]. According to Darwin, facial emotions have an evolutionary purpose for survival and are intrinsic, meaning they cannot be learned. Paul Ekman conducted observational research to determine whether facial expressions are perceived similarly everywhere. Psychology differentiates between emotions and feelings: while feelings can persist for hours, emotions typically last only about 5 s. Expressions of emotion can further be divided into voluntary and spontaneous ones. Voluntary expressions are controlled by the person; they are simple to fake and typically last longer. Conversely, spontaneous expressions are brief. From a physiological perspective, facial expression results from facial muscle activation. These muscles are also known as mimetic or facial expression muscles, and they are a component of the group of head muscles, which also includes the tongue, the muscles of the scalp, and the muscles of mastication, which move the jaw. The facial nerve, which has branches in the face, innervates the facial muscles; when this nerve is activated, the facial muscles contract, producing a variety of discernible movements. The muscular activities that are typically evident are blocks of skin motion, such as wrinkles on the nose and forehead, between the eyebrows, and between the lips and cheeks. More specifically, the human face is made up of around 20 flat skeletal muscles. The oral group is in charge of intricate mouth movements and allows for sophisticated mouth shaping, such as encircling the mouth, controlling the angle of the mouth, elevating or lowering the lower and upper lips separately, lifting and lowering the left and right corners, or moving with the cheeks [1, 3, 12]. The compression and opening of the nostrils are controlled by the nasal group. Between the eyebrows, one of the muscles important for facial expressions pushes the eyebrows down and generates creases over the nose [1, 3, 12]. The movements of the eyelid and the protection of the eyes are principally controlled by the three muscles of the orbital group, each of which inserts into the skin between the brows, creating vertical lines. The seven expressions considered in this work are depicted in Fig. 1.
3 Methodology The suggested face expression recognition system is fully explained in this section; Fig. 2 depicts its general architecture. The Viola–Jones algorithm is first used to identify the face in an input facial expression image. Next, we employ two edge detection masks, the Kirsch Compass Mask and the Robinson Compass Mask. These two edge masks are distinct and have different filter coefficients for the various directions. Each mask is applied to the image in eight different directions, giving each pixel a gradient with eight magnitudes and directions; the masks are also used to recognize sharp edges before encoding. Our primary contribution in this paper is a new encoding method that accurately captures all aspects (edges, curves, corners, lines, etc.), a representation that makes it easier to distinguish between different expressions. The proposed encoding method, Directional Edge Coding for Facial Expression Recognition Systems (DEC-FERS), encodes each pixel with the aid of analogous neighboring pixels.
Fig. 1 The seven expressions from one subject (sad, surprise, fear, anger, happy, disgust, neutral)
Following the DEC-FERS image representation, our technique computes the histograms that are supplied to a classifier; a Support Vector Machine (SVM) was used here for categorization.
Fig. 2 Overall block schematic of the proposed recognition system
3.1 Compass Masks To represent the facial image with DEC-FERS, the edge responses must first be extracted from the image. A variety of filters are available for extracting edge responses, including Sobel, Prewitt, and Canny; however, they lack directionality. Since facial expressions vary directionally, we must identify the edge responses in many directions, and the collected edge responses must then be filtered to include only the dominant and significant responses that offer pertinent facial expression data. The two most popular compass masks with an array of eight directional filters are the Robinson and Kirsch compass masks; as a result, they are employed by the majority of computer vision researchers. The facial image was converted into gradient space prior to processing in order to ascertain the edge responses; the gradient space additionally reveals more detail regarding the structural contour of a facial image. We therefore changed the image's domain from image to gradient and applied two compass masks over the gradient image; their specifics are discussed in the following subsections. A. Kirsch Compass Mask (KCM) KCM is a derivative mask typically used to find edges in images. It is a non-linear edge detector that can identify the highest edge strength in a given direction, and it is also known as the Kirsch Compass Kernel (KCK). The kernels for eight directions—(1) North East (N.E.), (2) East (E), (3) South East (S.E.), (4) South (S), (5) South West (S.W.), (6) West (W), (7) North West (N.W.), and (8) North (N)—are derived by rotating one kernel in increments of 45°, as shown in Fig. 3a and b. The maximum magnitude across all directions is used to compute the edge magnitude: after applying KCM, each pixel receives eight responses, and the highest value among them is taken as the edge/gradient magnitude; a brief sketch of this procedure follows.
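The sketch below generates the eight Kirsch kernels by rotating the border ring of the North kernel in 45° steps and keeps the maximum response per pixel; it is a minimal illustration under these assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def kirsch_kernels():
    ring = [5, 5, 5, -3, -3, -3, -3, -3]   # clockwise border of the North kernel
    pos = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    kernels = []
    for r in range(8):                      # eight 45-degree rotations
        k = np.zeros((3, 3))
        for (i, j), v in zip(pos, np.roll(ring, r)):
            k[i, j] = v
        kernels.append(k)
    return kernels

img = np.random.rand(64, 64)                # stand-in for a grayscale face image
responses = np.stack([convolve(img, k) for k in kirsch_kernels()])  # (8, H, W)
edge_magnitude = responses.max(axis=0)      # max response over the 8 directions
```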
Fig. 3 a Different Kernels of KCM b Edge responses of KCM in eight directions
Fig. 4 a Different Kernels of RCM, b Edge responses of RCM in eight directions
B. Robinson Compass Mask (RCM) Like KCM, RCM is one of the most important and popular compass masks for identifying image edges. The RCM, also known as a directional mask, spins a single kernel in eight directions to produce eight distinct kernels for the directions N, N.W., W, S.W., S, S.E., E, and N.E., as shown in Fig. 4a and b. Any edge detection mask generally uses the same methodology, starting from a single seed mask and creating a variety of masks by varying the angle at which the seed mask is rotated; the rotation is carried out based on the direction of the zero columns. Since the only symmetrical values in the filter coefficients are 0, 1, and 2, RCM is similar to KCM but easier to apply. We applied both compass masks in the directional edge coding for the facial expression recognition system.
3.2 Directional Attributes We convolve the input facial image, denoted by X, with a total of eight kernels. Four of the eight kernels are orthogonal to the other four; therefore, if the first four kernels are regarded as horizontal kernels, the second four are regarded as vertical kernels. In KCM, for instance, if the kernel in the North direction is considered a horizontal kernel, then the kernel in the exactly orthogonal direction, i.e., the West orientation, is considered a vertical kernel. The edge magnitude $E_m^{ij}$ and edge direction $\theta_{ij}$ are viewed as directional attributes and are defined from the edge responses obtained in these two directions as follows:

$$E_m^{ij} = \sqrt{\left(E_m^i\right)^2 + \left(E_m^j\right)^2} \qquad (1)$$

and

$$\theta_{ij} = \tan^{-1}\left(\frac{E_m^j}{E_m^i}\right) \qquad (2)$$

where $E_m^i$ and $E_m^j$ are the magnitudes of the edge responses obtained by convolving the image with the $i$th and $j$th masks, respectively:

$$E_m^i = I * K_i \quad \text{and} \quad E_m^j = I * K_j \qquad (3)$$

where $K_i$ and $K_j$ are the kernels in the first ($i$th) and second ($j$th) directions, which are required to be orthogonal. In this scenario, each mask has two orthogonal counterparts, one facing upward/forward and the other facing downward/backward, so Eq. (1) is applied twice with two alternative values of $j$ to obtain the pre-final edge magnitude. Consider an edge magnitude in the East direction: it is orthogonal to both the North and South directions, and the pre-final edge magnitude is calculated as the average of the two resulting magnitudes, $E_m^{East,North} = \sqrt{(E_m^{East})^2 + (E_m^{North})^2}$ and $E_m^{East,South} = \sqrt{(E_m^{East})^2 + (E_m^{South})^2}$, giving $E_m^{East} = (E_m^{East,North} + E_m^{East,South})/2$. Similarly, the pre-final edge magnitude in the West direction is $E_m^{West} = (E_m^{West,North} + E_m^{West,South})/2$. This process continues for all directions, yielding eight pre-final edge magnitudes and eight directions, of which only one magnitude and one direction are retained:

$$G_m = \max\left(E_m^{E}, E_m^{W}, E_m^{N}, E_m^{S}, E_m^{NE}, E_m^{NW}, E_m^{SE}, E_m^{SW}\right) \qquad (4)$$

and

$$\theta_d = \max\left(\theta_m^{E}, \theta_m^{W}, \theta_m^{N}, \theta_m^{S}, \theta_m^{NE}, \theta_m^{NW}, \theta_m^{SE}, \theta_m^{SW}\right) \qquad (5)$$

where $G_m$ denotes the gradient's magnitude and $\theta_d$ its direction: Eq. (4) extracts the maximum gradient magnitude and Eq. (5) the corresponding direction. Each pixel in the facial image is thus assigned one gradient magnitude and one gradient direction, so the gradient image is the same size as the original. The proposed encoding mechanism is then applied to the gradient image; a brief sketch of this selection step follows.
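Continuing the sketch, and mirroring Eqs. (4)-(5): given the eight pre-final magnitudes per pixel, the dominant magnitude and the direction at which it occurs are retained.

```python
import numpy as np

pre_final = np.random.rand(8, 64, 64)        # stand-in for the 8 pre-final magnitudes
G_m = pre_final.max(axis=0)                  # Eq. (4): dominant gradient magnitude
theta_d = pre_final.argmax(axis=0) * 45.0    # direction of the winner, in degrees
```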
3.3 Encoding The suggested encoding method searches for the characteristics of an edge passing through the center pixel; for this we take into account the gradient's magnitude and orientation, which together identify the boundary passing through a pixel. To determine the DEC-FERS code of a pixel, we calculate the Directional Edge Score (DES) at each of its eight neighbors. An edge pixel exhibits a more pronounced gradient than a pixel in a flat area, so we apply a threshold to the gradient magnitude values to identify and remove weak magnitudes that mimic flat-pixel features; the DES is computed only at the pixels that meet the threshold, and flat pixels, which contribute little, are discarded. Further, we must encode only consistent, prominent edges and neglect pixels with local distortions or noisy textures. To assess the consistency of a boundary passing through a center pixel, we look for the support of neighbor pixels: DEC-FERS looks for edge segments that pass through the center and neighboring pixels (the neighbor edge pixels), and the determination of an edge pixel's support is done with the aid of the gradient orientation. DEC-FERS created eight template orientations to resolve these types of edges [13, 14]; these orientations were chosen based on the compass masks and are preferably orthogonal to the direction of the relevant edge. Darker pixel intensities make up one side of the border while brighter intensities lie on the other side; in this situation, the gradient magnitudes are the same but the orientations differ. A pixel is regarded as a supporting pixel by DEC-FERS if the gradient orientation at the nearby edge pixel is comparable to the template orientation. In addition, this study introduced a new weight mechanism based on the Gaussian distribution to give a more significant weight to neighbor pixels closer to the template orientation. The Gaussian weight for the gradient orientation at the $i$th neighbor pixel is measured as

$$D_{\theta_i} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(\theta_i - \mu)^2 / 2\sigma^2} \qquad (6)$$

where $\theta_i$ is the gradient orientation at the $i$th neighbor pixel and $\mu$ is the mean of the gradient orientations of the given template, obtained as $\mu = (\theta_1 + \theta_2 + \cdots + \theta_8)/8$, with $\theta_i$, $i = 1$ to $8$, the gradient orientations of the eight neighbor pixels; $\sigma$ is the standard deviation of the gradient orientations of the given $3 \times 3$ template. As previously discussed, the gradient magnitude is an important criterion for representing a pixel's edges, since a greater gradient magnitude indicates a sharper edge. As a result, the DES of a neighboring pixel is determined by combining the gradient magnitude with the Gaussian weight of the gradient direction:

$$DS_i = \begin{cases} G_m^i \times D_{\theta_i}, & \text{if } G_m^i \ge \varphi \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$
where $G_m^i$ is the gradient magnitude of the $i$th neighbor pixel and $D_{\theta_i}$ is the Gaussian weight of its gradient orientation. Here, a threshold $\varphi$ is used to identify the nearby edge pixels and to filter out edge responses from featureless, flat regions, which contribute much less to the detection of facial expressions. For a neighboring pixel with a gradient magnitude greater than the threshold $\varphi$ and a gradient orientation matching the template orientation, Eq. (7) yields a significant value; conversely, neighboring pixels with no edges receive insignificant weighted direction scores. In general, line-ends, corners, curved edges, and straight edges are among the textures used to represent facial expressions: for instance, the mouth and eyes appear as corners, whereas the brows, lips, and eyelids appear as curved or merely straight edges. Because different expressions have different shapes at certain textures, the proposed DEC-FERS aims to encode these properties. It encodes the facial expression as edge segments linking various pixels with a central pixel, using the previously calculated $DS_i$ as a reference. We require a minimum of two such segments to characterize an edge and a minimum of one segment to characterize a line-end; in light of this, two conspicuous edge segments can describe patterns like line-ends, corners, and curved edges. Using the acquired DES, we select

$$d_p = \operatorname{argmax2}(DS_i;\ i = 0 \text{ to } 7) \qquad (8)$$

Equation (8) returns the indices of the top two weights from the available set of eight; these two indices represent the directions of adjacent pixels within the given template. Note that $d_p$ actually comprises $d_p^1$ and $d_p^2$, where $d_p^1$ indicates the primary direction and $d_p^2$ the secondary direction. We need to apply a direction indicator because, as previously mentioned, the edges have darker pixels on one side and lighter pixels on the other; to accomplish this, we add sign information to the direction $d_p$. To do so, we categorize the top posing orientations as

$$S_\theta^j \leftarrow \begin{cases} d_p^1 = 0,\ d_p^2 = 4, & \text{if } \theta_j \in [0^\circ, 180^\circ] \\ d_p^1 = 1,\ d_p^2 = 5, & \text{if } \theta_j \in [45^\circ, 225^\circ] \\ d_p^1 = 3,\ d_p^2 = 7, & \text{if } \theta_j \in [135^\circ, 315^\circ] \\ d_p^1 = 2,\ d_p^2 = 6, & \text{if } \theta_j \in [90^\circ, 270^\circ] \end{cases} \qquad (9)$$

The brighter and darker sides of the edge correspond to the mapping of $d_p^1$ and $d_p^2$ to $S_\theta^j$. The sign information is added in front of $d_p$'s Most Significant Bit (MSB) as

$$d_p = d_p + \text{sign}; \quad \text{Sign} = \begin{cases} 1, & \text{if } S_\theta^j \text{ is positive} \\ 0, & \text{if } S_\theta^j \text{ is negative} \end{cases} \qquad (10)$$
Because we took into account eight neighboring pixels with eight directions, each direction $d_p$ is represented here with 3 bits, and the total code size increases by one bit to make 8 bits: the first bit denotes the sign, followed by three bits for the primary direction and three bits for the secondary direction. Since the sign of the secondary direction is frequently the same as that of the primary direction, code redundancy is reduced by adding a sign bit only for the primary direction. Finally, the central pixel's DEC-FERS code is determined as

$$DEC\text{-}FERS = 2^3 \times d_p \qquad (11)$$

This 8-bit directional edge coding (DEC) eliminates the impact of inconsequential false codes during the classification stage. Pixels in flat areas are given the default value $256 = 2^8$: to maintain uniformity with the remaining edge pixels, which are encoded using an 8-bit code pattern, the flat sections are also encoded with 8 bits, giving them a decimal value corresponding to 256. A sketch of the encoding of a single neighborhood follows.
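Below is a compact sketch of encoding one 3 × 3 neighborhood. The Gaussian weighting follows Eq. (6), the thresholded score Eq. (7), and the top-2 selection Eq. (8); the bit packing and the sign heuristic are our interpretation of Eqs. (9)-(11), whose exact form is ambiguous in the source.

```python
import numpy as np

def dec_code(mag, theta, phi):
    """mag, theta: 3x3 gradient magnitude / orientation (degrees) around a pixel."""
    pos = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    gm = np.array([mag[i, j] for i, j in pos])
    th = np.array([theta[i, j] for i, j in pos])
    mu, sigma = th.mean(), th.std() + 1e-9
    w = np.exp(-(th - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))  # Eq. (6)
    scores = np.where(gm >= phi, gm * w, 0.0)                                      # Eq. (7)
    if not scores.any():
        return 256                                   # default code for flat regions
    d1, d2 = np.argsort(scores)[-2:][::-1]           # Eq. (8): primary, secondary
    sign = 1 if theta[1, 1] <= 180 else 0            # brighter/darker side (assumption)
    return (sign << 7) | (int(d1) << 3) | int(d2)    # sign bit + two 3-bit directions

code = dec_code(np.random.rand(3, 3), np.random.rand(3, 3) * 360.0, phi=0.5)
```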
4 Simulation Experiments In this section, we examine the simulation experiments performed on the created FER model. For the experimental validation of the method we employed MATLAB and several widely used face expression databases. We first go over the specifics of the databases, then the performance indicators; after that, the simulation findings and the calculated performance indicators are explained. Finally, we present a thorough comparison, based on recognition accuracy, between the suggested approach and a number of existing approaches.
4.1 Database The Osmania University Facial Expression (OU-FER) and Japanese Female Facial Expression (JAFFE) databases are used for the experimental validation [14]. OU-FER, one of the well-known face expression databases, has 330 sequences and was created with the assistance of 55 participants, of whom 20 are male and 35 are female. OU-FER includes both spontaneous (non-posed) and staged facial expressions. Each sequence in this database begins with a neutral expression and concludes with a peak expression, and each image in a sequence is 512 × 512 in size. All of the images are in PNG format, and Fig. 1 displays some of the images from the OU-FER database. JAFFE, another well-known face expression database, was compiled with the aid of ten Japanese female subjects and contains a total of 180 pictures. To keep the emotive markers visible in every image, the subjects tied their hair back. Each image measures 256 × 256 pixels and is always taken in frontal view. All of the photographs are in TIFF format, and Fig. 5 displays some of the images from the JAFFE database.
4.2 Performance Metrics A number of indicators, including Detection Rate (DR), Positive Predictive Value (PPV), F1-measure, Miss Rate or False Negative Rate (FNR), False Discovery Rate (FDR), and Accuracy, are taken into account while evaluating the performance of the suggested approach. These performance metrics are defined as:

$$DR = \frac{TP}{TP + FN} \qquad (12)$$

$$PPV = \frac{TP}{TP + FP} \qquad (13)$$

$$\text{Miss Rate} = \frac{FN}{FN + TP} \qquad (14)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (15)$$

$$F1\text{-Measure} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \qquad (16)$$

$$FDR = \frac{FP}{FP + TP} \qquad (17)$$
These four quantities—True Positives (T.P.s), True Negatives (T.N.s), False Negatives (F.N.s), and False Positives (F.P.s)—are the foundation from which all the other metrics are formed; they are derived from the confusion matrix, and a small worked example follows.
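For illustration, the metrics of Eqs. (12)-(17) computed from one set of confusion-matrix counts (the counts are invented for the example):

```python
TP, TN, FP, FN = 8, 44, 4, 2                     # illustrative counts for one class
DR        = TP / (TP + FN)                       # Eq. (12), i.e. recall
PPV       = TP / (TP + FP)                       # Eq. (13), i.e. precision
miss_rate = FN / (FN + TP)                       # Eq. (14)
accuracy  = (TP + TN) / (TP + TN + FP + FN)      # Eq. (15)
f1        = 2 * DR * PPV / (DR + PPV)            # Eq. (16)
FDR       = FP / (FP + TP)                       # Eq. (17)
print(DR, PPV, miss_rate, accuracy, f1, FDR)
```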
4.3 Results For the experimental validation of the suggested strategy on the JAFFE database we consider photos from its 10 participants, with 120 photos used for training and the remaining 60 for testing. The 120 training images are broken down into 20 images each of anger, disgust, fear, happiness, sadness, and surprise.
Table 1 Confusion matrix of the proposed method on the JAFFE dataset

         | Surprise | Sad | Disgust | Anger | Happy | Fear | Total
Surprise | 8        | 0   | 1       | 0     | 0     | 1    | 10
Sad      | 1        | 6   | 1       | 0     | 1     | 1    | 10
Disgust  | 1        | 1   | 7       | 0     | 0     | 1    | 10
Anger    | 1        | 0   | 1       | 7     | 0     | 1    | 10
Happy    | 0        | 0   | 1       | 0     | 8     | 1    | 10
Fear     | 1        | 1   | 0       | 0     | 2     | 6    | 10
Total    | 12       | 8   | 11      | 7     | 11    | 11   | 60
Table 2 Average performance metrics for expression recognition in the OU dataset

Emotion/Metric | DR (%) | PPV (%) | F1-measure (%) | Miss rate (%) | FDR (%)
Surprise       | 80     | 60      | 68.57          | 50            | 33.33
Sad            | 60     | 75      | 66.66          | 40            | 25.25
Disgust        | 70     | 63      | 66.31          | 27            | 36.36
Anger          | 70     | 100     | 82.35          | 30            | 0
Happy          | 80     | 66      | 72.32          | 50            | 33.33
Fear           | 60     | 54      | 56.84          | 40            | 40
The full set of 60 test photos is divided into 10 images each of surprise, sad, disgust, anger, happy, and fear (Table 1). We conducted simulated trials using this configuration, and the outcomes are displayed in Tables 1 and 2. Since happy and surprise are composed of rigid muscle movements, which the compass masks capture less distinctly, we experienced a higher miss rate for these expressions.
5 Conclusion In this project, we created an automatic FER system that analyzes face images to detect a person's expression. We provide a new texture encoding scheme, DEC-FERS, that precisely and uniquely describes facial expressions. DEC-FERS retains only the stronger edge magnitudes, while the weaker ones are discarded: being less significant and less instructive, they would only add ambiguity in the classifier. For the extraction of edge responses, we used RCM and KCM in eight orientations with various kernels. The DEC-FERS-encoded image is finally described using histograms before being fed into an SVM for classification.
for the simulation study, we used two databases, including OU-FER and JAFFE, and we carried out different.
References 1. Bovik AC (2010) Handbook of image and video processing. Academic Press 2. Babu CR, Sreenivasa Reddy E, Prabhakara Rao B (2015) Age group classification of facial images using rank based edge texture unit (RETU) 3. Shan C, Gong S, McCowan PW (2009) Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis Comput 27(6):803–816 4. Darwin C, Ekman P, Prodger P (1998) The expression of the emotions in man and animals. Oxford University Press, USA, p 5 5. Chen M, Zhang Y, Qiu M, Guizani N, Hao Y (2018) SPHA: smart personal health advisor based on deep analytics. IEEE Commun Mag 56(3):164–169 6. Cowie R, Cowie ED, Tsapatsoulis N, Votsis G, Kollias S, Fellenz W, Taylor JG (2001) Emotion recognition in human-computer interaction. IEEE Signal Process Mag 18(2001):33–80 7. McClure EB, Pope K, Hoberman AJ, Pine DS, Leibenluft E (2003) Facial expression recognition in adolescents with mood and anxiety disorders. Am J Psychiatry 160:1172–1174 8. Fragopanagos N, Taylor JG (2005) Emotion recognition in human-computer interaction. Neural Netw 18:389–405 9. Goyani MM, Patel N Recognition of facial expressions using the local mean binary pattern. Electron Lett Comput Vision Image Anal 10. Mandal J, Satapathy S, Kumar Sanyal M, Sarkar P, Mukhopadhyay A (eds) Information systems design and intelligent applications. Advances in intelligent systems and computing, vol 340. Springer, New Delhi 11. FaridulHaqueSiddiqui M, Javaid AY (2020) A multimodal facial emotion recognition framework through the fusion of speech with visible and infrared images. Multimodal Technol Interact 4(46):1–21 12. Wallace S, Coleman M, Bailey A (2008) An investigation of basic facial expression recognition in autism spectrum disorders. Cogn Emot 22:1353–1380 13. Zhao W, Chellappa R, Phillips PJ, Rosenfeld A (2003) Face recognition: a literature survey. ACM Comput Surv 35(4):399–458 14. Tian YL, Kanade T, Cohn JF (2005) Facial expression analysis, Handbook of face recognition. Springer, pp 247–276 15. Zhang Z, Pantic M, Roisman GI, Huang TS (2009) A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans Pattern Anal Mach Intell 31(1):39–58
A Cascaded 2DOF-PID Control Technique for Drug Scheduling of Chemotherapy System Bharti Panjwani, Vijay Mohan, Himanshu Gupta, and Om Prakash Verma
Abstract This paper proposes a cascade control approach with a 2DOF-PID controller for finding the optimal drug schedule during chemotherapeutic treatment. The cell-cycle nonspecific model of tumor growth for breast cancer is considered, and drug concentration and toxicity are regulated by individual controllers. The main objective of the controller design is to reduce the tumor size with minimum harm to the patient. The problem of the diverse objectives in chemotherapy is solved by using the NSGA-II optimization technique. The responses of the 2DOF-PID and conventional PID approaches are compared for the designed model, and the controllers' performance under parametric uncertainty is also examined. The findings suggest that the proposed 2DOF-PID technique performs better. Keywords Chemotherapy · NSGA-II · Cascade control · 2DOF-PID
1 Introduction Chemotherapy is the cyclic course of treatment where some toxic agents are utilized to kill the cancer cells. A chemo-drug schedule defines drug combination, duration, period, and the amount of drugs given to the patient. The proliferation of malignant cells is exponential in nature [1]. A mathematical model shows the relation between
B. Panjwani (B) Shri Madhwa Vadiraja Institute of Technology and Management, Udupi, Karnataka, India e-mail: [email protected] V. Mohan Department of Mechatronics, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India H. Gupta Department of Computer Science, ABES Institute of Technology, Ghaziabad 201009, India O. P. Verma Dr. BR Ambedkar NIT Jalandhar, Jalandhar, Punjab 144011, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_7
the cancer cell population, the normal cell population, and toxicity. Various models of tumor growth under chemotherapeutic treatment are discussed in the literature [2, 3]. Chemotherapeutic treatment is based on data from drug development during clinical trials [4]. In the present scenario, finding the optimal dose that provides effective therapy while minimizing adverse effects is of primary importance; for the complete course of therapy, the drug concentration and overall toxicity must be kept below their threshold levels. Closed-loop control of chemotherapeutic treatment has also been implemented by several researchers [5] for regulating the drug dose. In this work, individual controllers are used in a cascade arrangement to regulate drug concentration and drug toxicity. A patient model with tumor growth under the effect of chemotherapy drugs is simulated in a MATLAB environment, and 2DOF-PID controllers in a cascaded structure determine the optimal drug dose. To obtain optimum controller performance, the non-dominated sorting genetic algorithm (NSGA-II) is used for tuning; a cascaded structure with conventional PID controllers is also designed for comparative analysis, and the efficacy of the designed schemes is tested against uncertainty in the model. The remainder of this paper is organized as follows: Sect. 2 presents a mathematical model of tumor growth and chemo-effects; controller design and optimization are discussed in Sect. 3; results are presented in Sect. 4; and Sect. 5 concludes the work.
2 Mathematical Modeling The tumor growth under the effects of chemotherapy over time is described by the differential equations given by Martin [1]. An exponential model is considered for tumor growth, as described below:

$$N(t) = 10^{12} \cdot e^{-Y(t)} \qquad (1)$$

$$\frac{dY(t)}{dt} = -\alpha Y(t) + k\,[D(t) - \beta] \cdot H(D(t) - \beta) \qquad (2)$$

$$H(D(t) - \beta) = \begin{cases} 1, & \text{if } D(t) \ge \beta \\ 0, & \text{if } D(t) < \beta \end{cases} \qquad (3)$$
Equation (1) shows the exponential growth of the tumor with time t, with an initial cancer cell population of 10^12 cells, and Eq. (2) describes the effect of the toxic drugs on the cancer cell population through a logistic equation with the intermediate variable Y(t). The threshold of the drug's effect is specified by a Heaviside function of the drug concentration in Eq. (3). The variation of drug concentration in the body after injection of a drug dose over the complete treatment period is specified in Eq. (4), and the rate of change of toxicity in the patient's body, which depends on the drug concentration, is given by Eq. (5).
Table 1 Parameters of tumor under chemotherapy

Symbol | Quantity                           | Value
N(t)   | Cancer cell population at time t   | –
D(t)   | Drug concentration in patient body | –
u(t)   | Input drug dose                    | –
T(t)   | Toxicity level in patient body     | –
α      | Growth rate of cancer cells        | 1.5 × 10⁻⁴ units per day
k      | Cell killing rate                  | 9.9 × 10⁻³ per day per drug unit
β      | Drug threshold level               | 10 drug units
γ      | Drug decay rate                    | 0.27 per day
η      | Toxicity elimination rate          | 0.4 per day
It is assumed that at the start of chemotherapy the drug concentration, toxicity, and drug dose present in the patient's body are all 0. The variables of the designed model are described in Table 1 [6].

$$\frac{dD(t)}{dt} = u(t) - \gamma D(t) \qquad (4)$$

$$\frac{dT(t)}{dt} = D(t) - \eta T(t) \qquad (5)$$
A model for chemotherapeutic treatment of cancer is simulated in MATLAB Simulink using the above dynamics. Different drug schedules and drug regulation methods can be tested and the optimal drug schedule is determined.
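The model of Eqs. (1)-(5) is simple enough to reproduce outside Simulink. The sketch below integrates it with forward Euler using the Table 1 parameters; the constant test dose u(t) and the step size are illustrative assumptions.

```python
import numpy as np

alpha, k, beta, gamma, eta = 1.5e-4, 9.9e-3, 10.0, 0.27, 0.4   # Table 1
dt, days = 0.01, 84                      # 0.01-day Euler step, 84-day treatment
Y = D = T = 0.0                          # zero initial conditions
u = 15.0                                 # illustrative constant drug dose

for _ in range(int(days / dt)):
    H = 1.0 if D >= beta else 0.0                     # Eq. (3)
    Y += dt * (-alpha * Y + k * (D - beta) * H)       # Eq. (2)
    D += dt * (u - gamma * D)                         # Eq. (4)
    T += dt * (D - eta * T)                           # Eq. (5)

N = 1e12 * np.exp(-Y)                                 # Eq. (1): surviving cells
print(f"N(84) = {N:.3e}, final toxicity = {T:.1f}")
```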
3 Controller Design Optimal chemotherapy requires maximum cancer cell killing while avoiding the adverse effects on normal cells. This can be achieved by the use of advanced automation and optimization techniques in drug administration. As the primary target is to obtain and maintain the desired toxicity of 100 units [7], a cascaded control structure is implemented, with an individual controller for toxicity and drug concentration regulation. Figure 1 shows the proposed control structure. The controller selected to manipulate the drug infusion rate is a Two Degree of Freedom-Proportional Integral Derivative Controller (2DOF-PID), which is an extension of a classical PID (Fig. 2). In the above cascade structure for the chemotherapy drug control system, the inner controller C1 regulates the drug dosage infused to the patient body. The outer loop is the primary loop which regulates the drug concentration according to the required toxicity and actual toxicity.
Fig. 1 Cascaded control scheme
Fig. 2 Structure of 2DOF-PID controller
3.1 Design of Two-Degree-of-Freedom Proportional-Integral-Derivative (2DOF-PID) Controller A 2DOF-PID control technique is employed as it can efficiently address the multiple constraints of the drug delivery system. The 2DOF-PID controller comprises two closed loops, for set-point tracking and disturbance rejection, and it can mitigate a disturbance before it affects the control signal, hence performing better than a conventional PID. The structure of the 2DOF-PID controller is defined as

$$U_{2DOFPID} = K_P (b R - Y) + \frac{K_I}{s}(R - Y) + K_D\, Q(s)\,(c R - Y) \qquad (6)$$
where $K_P$, $K_I$, and $K_D$ are the proportional, integral, and derivative gains, R is the set point, and Y is the system response. The proportional and derivative actions are weighted by the set-point weights b and c, respectively. Q(s) is a filter that impedes the derivative action for sudden changes in the set point, thus completely avoiding derivative kick [8]. The structure of the filter is

$$Q(s) = \frac{N}{1 + \frac{N}{s}} \qquad (7)$$
Equation (6) can be expanded as

$$U_{2DOFPID} = K_P (bR - bY + bY - Y) + \frac{K_I}{s}(R - Y) + K_D\, Q(s)\,(cR - cY + cY - Y) \qquad (8)$$

$$U_{2DOFPID} = K_P b (R - Y) + (K_P b Y - K_P Y) + \frac{K_I}{s}(R - Y) + K_D\, Q(s)\, c (R - Y) + K_D\, Q(s)\,(cY - Y) \qquad (9)$$

Putting $E = R - Y$ in the above equation and rewriting it in terms of error and output,

$$U_{2DOFPID} = K_P b E + \frac{K_I}{s} E + K_D\, c\, Q(s)\, E - K_P (1 - b) Y - K_D (1 - c)\, Q(s)\, Y \qquad (10)$$

$$U_{2DOFPID} = \left(K_P b + \frac{K_I}{s} + K_D\, c\, Q(s)\right) E - \left\{K_P (1 - b) + K_D (1 - c)\, Q(s)\right\} Y \qquad (11)$$
The final control action $U_{2DOFPID}$ is utilized to manipulate the infusion of drug dosages; the treatment period is taken as 84 days. The control structure of the 2DOF-PID is depicted in Fig. 2. A PID controller is also designed so that performance can be compared on the basis of the surviving cancer cell population after the treatment duration. A discrete-time sketch of Eq. (11) follows.
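In the sketch below, the error channel receives the gains K_P·b, K_I, and K_D·c, while the set-point weights b and c damp the reference in the proportional and derivative paths; a first-order low-pass state stands in for Q(s). The gains are taken from one column of Table 2, and clipping the dose at zero is our assumption.

```python
class TwoDOFPID:
    def __init__(self, kp, ki, kd, b, c, N, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.b, self.c, self.N, self.dt = b, c, N, dt
        self.integral = 0.0
        self.d_state = 0.0    # filtered derivative of (c*R - Y), i.e. Q(s) output
        self.prev = 0.0

    def update(self, r, y):
        self.integral += self.ki * (r - y) * self.dt
        weighted = self.c * r - y
        raw = (weighted - self.prev) / self.dt
        self.prev = weighted
        self.d_state += self.N * self.dt * (raw - self.d_state)  # crude Q(s) filter
        return self.kp * (self.b * r - y) + self.integral + self.kd * self.d_state

ctrl = TwoDOFPID(kp=0.1962, ki=0.0865, kd=0.55, b=0.502, c=0.739, N=100.0, dt=0.01)
dose = max(0.0, ctrl.update(r=100.0, y=0.0))   # track the toxicity set point of 100
```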
3.2 Optimization of Controller Parameters Controllers perform best when properly tuned; therefore the evolutionary algorithm NSGA-II is used to determine the optimal values of the controller gains. For patient safety, constraints are imposed on the drug concentration and the toxicity level in the patient body.
Fig. 3 Flow chart representing the NSGA-II Multi-objective genetic algorithm
The constraints are

$$10 \le D(t) \le 50, \qquad T(t) \le 100$$

The drug input dosage should kill the cancer cells with the least harm to normal cells, so the objective functions are defined in terms of the accumulated absolute loop errors as

$$Obj_1 = \sum |e_1| \quad \text{and} \quad Obj_2 = \sum |e_2| \qquad (12)$$
NSGA-II is employed to determine various satisfying solutions by trading off between various objectives while satisfying the constraints. NSGA-II is an evolutionary algorithm that produces better solutions from the parent solution and works on the principle of survival of the fittest. The steps to implement NSGA-II are detailed in the literature [9, 11]. The design steps for implementing NSGA-II are shown in Fig. 3 [10]. The algorithm is executed in MATLAB for chemotherapy drug schedule optimization.
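A hedged sketch of the tuning loop with the pymoo library (version ≥ 0.6 assumed) is given below. The decision vector of 10 gains/weights, its bounds, and `simulate_therapy` — which would run the closed loop of Fig. 1 and return the two objectives plus the peak drug concentration and toxicity — are placeholders standing in for the authors' MATLAB implementation.

```python
import numpy as np
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

class ChemoTuning(ElementwiseProblem):
    def __init__(self):
        super().__init__(n_var=10, n_obj=2, n_ieq_constr=2,
                         xl=np.zeros(10), xu=np.ones(10))   # placeholder bounds

    def _evaluate(self, x, out, *args, **kwargs):
        obj1, obj2, d_max, t_max = simulate_therapy(x)      # hypothetical simulator
        out["F"] = [obj1, obj2]
        out["G"] = [d_max - 50.0, t_max - 100.0]            # D(t) <= 50, T(t) <= 100

res = minimize(ChemoTuning(), NSGA2(pop_size=50), ("n_gen", 100), seed=1)
print(res.F)   # Pareto front of (Obj1, Obj2), cf. Fig. 4
```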
4 Simulation and Results A first-order Euler differential equation solver is used for simulation with a sampling time of 10 ms. There are 10 design variables that are optimized by NSGA-II with a population size of 50. Figure 4 shows the Pareto Front of optimized solutions
Fig. 4 Pareto front
Table 2 Optimized controller parameters

Parameter | 2DOF-PID (C2) | 2DOF-PID (C1) | PID (C2) | PID (C1)
P         | 0.1962        | 0.530824      | 0.328    | 0.3
I         | 0.0865382     | 0.473904      | 0.055    | 0.081
D         | 0.55          | 0             | 0.328    | 0
b         | 0.50171       | 0.9995        | –        | –
c         | 0.739292      | 0.9126        | –        | –
Obj 1     | 61,435        | –             | 72,367   | –
Obj 2     | 2776.95       | –             | 5293.61  | –
for the various controllers. The gains of all the controllers determined by the NSGA-II algorithm are listed in Table 2. The tracking response for the reference toxicity with the designed 2DOF-PID and conventional PID controllers is depicted in Fig. 5, and the corresponding performance indices are compared in Table 3. The controllers are also compared on the cancer cell population surviving at the end of treatment: after 84 days, the 2DOF-PID scheme leaves 0.579 cells, whereas PID leaves 1.21. Hence, the proposed 2DOF-PID controller produces better results than PID.
4.1 Uncertainty Analysis The robustness of the controllers is evaluated against variation in the system parameters: each parameter is perturbed from its nominal value by increasing and decreasing it by up to 15% in steps of 5%. The controllers' responses under this uncertainty are presented in Table 4.
Fig. 5 Response by 2DOF-PID and PID a Drug Dose b Drug Concentration c Toxicity d Cancer cell population
Table 3 Performance index comparison for tracking toxicity

Control technique | Rise time (days) | Settling time (days) | Overshoot (%)
2DOF-PID          | 12               | 22                   | 0.018
PID               | 17               | 33                   | 0.00
The results show that both controllers perform efficiently under parametric uncertainty, but 2DOF-PID exhibits better performance than the traditional PID controller, as it can reduce the influence of changes in the reference signal on the control signal. A sketch of the robustness sweep follows.
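The sweep itself is mechanical; a sketch is shown below, with `run_closed_loop` a placeholder for re-simulating the tuned loop of Sects. 2-3 and returning N(84).

```python
nominal = {"alpha": 1.5e-4, "k": 9.9e-3, "gamma": 0.27, "eta": 0.4}

for pct in (0.05, 0.10, 0.15):
    for sign in (+1, -1):
        perturbed = {n: v * (1 + sign * pct) for n, v in nominal.items()}
        n_final = run_closed_loop(perturbed)    # hypothetical closed-loop simulation
        print(f"{sign * pct:+.0%}: N(84) = {n_final:.2f}")
```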
Table 4 Uncertainty analysis

% Uncertainty in α, k, γ, η | PID N(84) | 2DOF-PID N(84)
Increase 5                  | 0.09      | 0.05
Increase 10                 | 0.07      | 0.05
Increase 15                 | 0.08      | 0.05
Decrease 5                  | 13        | 8.5
Decrease 10                 | 131       | 93
Decrease 15                 | 1180      | 890
5 Conclusion

A cascaded structure with a dedicated controller for toxicity and for drug concentration is proposed in this work. An NSGA-II-optimized 2DOF-PID controller is designed for the delivery of anti-cancer drugs to a patient model, and a comparative analysis is carried out against the existing PID control method. The results show that the proposed 2DOF-PID provides superior performance and is also more efficient in dealing with uncertainty in the system. In future work, a more detailed chemotherapy model incorporating the dynamics of the tumor, the drugs, and the body's immunological response can be developed for highly accurate and precise regulation of the drug dose and a more thorough analysis.
References

1. Martin RB (1992) Optimal control drug scheduling of cancer chemotherapy. Automatica 28(6):1113–1123
2. Gardner SN (2002) Cell cycle phase-specific chemotherapy: computation methods for guiding treatment. Cell Cycle 1(6):369–374
3. Shi J, Alagoz O, Erenay FS, Su Q (2014) A survey of optimization models on cancer chemotherapy treatment planning. Annal Operat Res 221(1):331–356
4. Panjwani B, Singh V, Rani A et al (2021) Optimum multi-drug regime for compartment model of tumour: cell-cycle-specific dynamics in the presence of resistance. J Pharmacokinet Pharmacodyn 48:543–562
5. Sofiane K et al (2016) A measurement-based control design approach for efficient cancer chemotherapy. Inf Sci 333:108–125
6. Tse S-M, Liang Y, Leung K-S, Lee K-H, Mok TS-K (2007) A memetic algorithm for multiple-drug cancer chemotherapy schedule optimization. IEEE Trans Syst Man Cybern Part B Cybern 37(1):84–91
7. Panjwani B, Mohan V, Rani A, Singh V (2019) Optimal drug scheduling for cancer chemotherapy using two degree of freedom fractional order PID scheme. J Intell Fuzzy Syst 36(3):2273–2284
8. Mohan V, Chhabra H, Rani A, Singh V (2019) An expert 2DOF fractional order fuzzy PID controller for nonlinear systems. Neural Comput Appl 31(8):4253–4270
9. Deb K et al (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evolut Comput 6(2):182–197
10. Mohan V, Chhabra H, Rani A, Singh V (2018) Robust self-tuning fractional order PID controller dedicated to non-linear dynamic system. J Intell Fuzzy Syst 34(3):1467–1478
11. Pachauri N, Panjwani B, Vigneysh T, Mohan V (2023) A cascaded NPID/PI scheme for the regulation of stack voltage in proton exchange membrane fuel cell. Int J Hydrogen Energy. ISSN 0360-3199
Distinguishing the Symptoms of Depression and Associated Symptoms by Using Machine Learning Approach Akash Nag, Atri Bandyopadhyay, Tathagata Nayak, Subhanjana Banerjee, Babita Panda, and Sanhita Mishra
Abstract Mental health has become an important aspect of people's daily lives. As days become more hectic, work pressure increases, and as a result people suffer from various mental health conditions. Depression is one such condition, and it affects young people at large. Our aim is to provide a machine learning approach to detect the signs of mental health conditions so that sufferers can be treated as soon as possible. Nine machine learning algorithms (k-nearest neighbor, logistic regression, decision tree, stacking, random forest, bagging, boosting, XGBoost, and SVM) are compared for their accuracy. A feature selection method is used to identify the eight most important and correlated attributes from the 27 attributes identified from the text. Classifiers were made to act on the eight attributes, and ROC curves were generated in order to assess their accuracy. This would go a long way in diagnosing and treating patients for much healthier well-being. Every life is meaningful: there is no shame in admitting that someone suffers from a mental health condition, and they should consult an expert to overcome their problems. Keywords Depression · Mental health · Machine learning · Decision tree
A. Nag · A. Bandyopadhyay · T. Nayak · S. Banerjee
School of Computer Engineering, KIIT, Bhubaneswar, Odisha, India
B. Panda (B) · S. Mishra
School of Electrical Engineering, KIIT, Bhubaneswar, Odisha, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_8

1 Introduction

Machine learning is a computational paradigm that aims to train a computer to make decisions of its own without human intervention. The machine learns from a training dataset what to do, and its performance is then tested; if significant accuracy is achieved, the machine can be said to make decisions on its own. It is a computational approach that enables machines
to observe and learn from the training dataset, identify trends, and then take measures to improve accuracy. Machine learning involves models or algorithms that can be trained on a dataset to recognize patterns or relationships. These models are designed to generalize from the training data and make predictions or take actions on new, unseen data. Machine learning algorithms can be broadly categorized into three main types.

Supervised learning: an approach that uses a labeled dataset. The training dataset carries output labels, on the basis of which the trained classifier predicts the labels of the test dataset.

Unsupervised learning: as opposed to supervised methods, the dataset is unlabeled. The machine looks for patterns and trends in order to relate data points, which helps in clustering or classifying the test data into classes. The main aim of the method is to reveal hidden trends and patterns present in the dataset.

Reinforcement learning: an agent interacts with the environment, taking actions and making decisions that maximize a reward signal. Learning proceeds mainly by trial and error, with the reward signal judging the effectiveness of each action performed by the agent [1].

Machine learning has carved itself a niche in computer science. Image recognition and speech detection are among its main applications; natural language processing helps identify the sentiment of a statement; recommendation systems have become part of daily life; and fraud detection and healthcare diagnostics are relatively new niches being worked on. The models can extract useful information from data and use it to predict future events. To achieve successful machine learning outcomes, the key steps include data collection and preprocessing, feature engineering, model selection, training, evaluation, and deployment. The performance of a model is often measured using metrics such as accuracy, precision, recall, and F1-score, depending on the specific problem and objectives. Overall, machine learning has become a powerful tool for solving complex problems, enabling computers to learn from data and make intelligent predictions or decisions, driving advancements in various fields and industries.

A person's current way of thinking and how they engage with the world are both indicators of their mental health. Mental health has been defined by the World Health Organization (WHO) as the condition of a person who can effectively manage the stresses of life while carrying on with a normal, productive life and contributing back to society. Anxiety and tension can alter personality and have a significant effect on people of all age groups and diverse backgrounds, from youngsters to elders. It is essential to identify health issues of various categories early in order to prevent major sickness.
An individual's way of living could be the root cause of factors that affect mental health, including employment stress, poor financial circumstances, family problems, interpersonal difficulties, violence, and environmental concerns. In the near future, healthcare professionals will increasingly have to take into account
the mental health of patients in order to provide appropriate medication and recovery processes [2, 3]. A number of serious mental health conditions, such as chronic illnesses, bipolar disorder, and schizophrenia, develop gradually over time and have early-stage signs that can be identified; these conditions could be prevented or regulated more successfully. A massive 450 million individuals globally suffer from mental illness, accounting for up to 13% of all diseases, and according to the WHO, one in four people will encounter mental problems at some stage in their lives. The WHO's recommendation on managing the physical conditions of individuals with severe mental health issues was published in 2018. Those with serious mental disorders such as schizophrenia, bipolar disorder (BD), psychotic disorder, and depression often pass away earlier than the rest of the population. Early detection and treatment of mental health issues are therefore essential: people with mental health issues can find relief through early detection, proper diagnosis, and efficient treatment, while the effects of untreated mental illness can be severe for the affected people, their families, and society at large. Traditional methods of mental health detection generally rely on interviews, self-reporting, or the distribution of surveys; such techniques, however, are typically time-consuming.
2 Literature Review

Several research papers have conducted thorough research on the prediction of mental health using machine learning algorithms. T. Nagar [4] studied the effect of eight machine learning models on a dataset aimed at identifying various mental health issues among individuals; the results showed clearly that the LAD tree and multiclass classification achieved higher accuracy than the other models on the dataset. Shatte et al. [5] also contributed a paper on this topic, focused on machine learning algorithms for specific mental health diseases such as Alzheimer's disease, depression, and schizophrenia; the models predicted the chances of patients having the diseases and could also offer projections relevant to their well-being and sustainable daily living. Srividya et al. [6] explain how mental health analysis may be presented in a way that is simple to comprehend for a variety of target audiences: the researchers created a framework for assessing a person's mental health and then used it to create prediction models. Clustering techniques were utilized to establish the ideal number of clusters prior to model creation, and the created class labels underwent MOS validation before being applied to classifier training. According to the experimental findings, KNN, SVM, and random forest all produced comparable performance, as described in [7, 8]. Additionally, it was shown that the inclusion of ensemble classifiers considerably improved mental health prediction accuracy, reaching a rate of 90%. The main aim of Iliou et al. [9] is to show how the ILIOU preprocessing method works and how it may improve the accuracy of classification and clustering algorithms. This technique may
also be used to foresee different types of depressive illness. The ability to accurately predict depression is crucial because it enables patients to quickly obtain the best care. Konda Vaishnavi et al. [10] discussed how artificial intelligence can be utilized to detect various mental disorders such as depression and schizophrenia. They inferred that machine learning models have enough capacity to predict the mental state of a patient with good accuracy, and that they can be used for clinical purposes if modeled properly using more diverse datasets and suitably complex algorithms. Machine learning, an emerging technology with great potential for boosting mental health research and practice, is progressively penetrating the healthcare industry. For the diverse community of experts involved in mental health research and care (researchers, healthcare professionals, patients receiving treatment, and regulatory bodies) to fully realize the potential of these technologies, effective communication channels must be established and collaborative interactions fostered. To acquire deeper insights into the patterns of representation within the field of human-centered machine learning (HCML), researchers undertook an evaluation using discourse analysis: they compiled a dataset comprising 55 transdisciplinary studies by analyzing social media data, especially with regard to predicting mental health conditions. Their findings indicated disparate discourses about interaction across the dataset, which influenced and assigned agency to human players; these findings demonstrate how the five discourses provide opposing viewpoints on the human as both object and subject, unintentionally creating a risk of dehumanization.
3 Methodology

3.1 Dataset Overview

The dataset considered here [5] consists of details of patients that may help in deciding the factors indicating that a person is suffering from depression. It has several columns, including:

1. Age of the patient
2. Gender of the patient
3. Whether the patient is self-employed
4. Family history of the patient
5. Treatment record
6. Work interference
Our target variable is the Treatment column. After plotting the heatmap of the 10 most correlated variables in Fig. 1, we conclude that work interference plays a major role, followed by a family history of depression. Several other interpretations needed to be made before predicting the results; most of them led to insightful conclusions regarding the patients. A minimal sketch of this correlation analysis is given below.
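A minimal sketch of such a correlation analysis, assuming the survey has been loaded into a pandas DataFrame df with numerically encoded columns and a treatment target column (the column names here are assumptions, not the dataset's exact schema):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

corr = df.corr(numeric_only=True)
# pick the ten variables most correlated with the target, as in Fig. 1
top10 = corr["treatment"].abs().sort_values(ascending=False).head(10).index
sns.heatmap(df[top10].corr(), annot=True, cmap="coolwarm")
plt.title("Heatmap of 'Treatment'")
plt.show()
```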
Fig. 1 Heatmap of 'Treatment'
1. Age: Most of the people who responded were between 14 and 16 years of age, as can easily be seen from Fig. 2.
Fig. 2 Distribution and density by age
Fig. 3 Probability by age
2. Gender: Transgender people showed a greater chance of mental health conditions, except in the age range of 66–100, where females showed the highest probability, as evident from Fig. 3.
3. Family history: People with a family history of mental health conditions tended to inherit the same most of the time; hence, their probability was higher than that of people without a family history, as evident from Fig. 4.
4. Work interference: Work interference also played an important role. People reporting frequent work interference tended to be more susceptible to mental health conditions.

Feature importance analysis was carried out, which suggested that the most important feature is the age of the person, followed by gender and family history.
Fig. 4 Probability by Family history
4 Methods Used

4.1 Decision Tree Classifier

In machine learning, a decision tree classifier is an algorithm that builds a tree-like model to make predictions. It learns from labeled data, where each data point has input features and a corresponding target label. The tree is constructed by recursively partitioning the data based on feature values, with the goal of creating subsets of data that are as pure as possible with respect to the target variable. At each internal node, a feature is selected and a splitting criterion determines how to divide the data based on that feature. This process is repeated until a stopping criterion is met, such as reaching a maximum tree depth. During construction of the tree, metrics such as Gini impurity, entropy, or information gain are used to evaluate the quality of splits [11]; they measure the homogeneity or impurity of the subsets created by each split, and the algorithm seeks the splits that maximize homogeneity or information gain. Once the decision tree is constructed, it can be used to predict the target label for new instances: the instance is traversed through the tree from the root to a leaf node based on its feature values, and the label at the leaf node becomes the output of the classifier. One advantage of decision tree classifiers is their interpretability: the resulting tree structure can be easily understood and visualized, making it valuable for gaining insight into the decision-making process. However, decision trees are prone to overfitting, especially if they become very complex or if the dataset contains noisy or irrelevant features. Techniques like pruning, which removes unnecessary nodes, or ensemble methods like random forests can help address overfitting and improve generalization. In summary, decision tree classifiers create a tree-like model by recursively partitioning data based on feature values; they are interpretable and handle both categorical and numerical features, but they can overfit and require mitigation techniques [12]. A hedged scikit-learn sketch is given below.
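A hedged scikit-learn sketch of such a classifier; X_train, y_train, X_test, and y_test are assumed to hold the encoded survey features and treatment labels, and the hyperparameter values are illustrative:

```python
from sklearn.tree import DecisionTreeClassifier

# criterion selects the split-quality metric discussed above ('gini' or
# 'entropy'); max_depth and ccp_alpha (cost-complexity pruning) curb overfitting
tree = DecisionTreeClassifier(criterion="entropy", max_depth=5, ccp_alpha=0.001)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```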
4.2 Random Forest

A random forest is a popular machine learning algorithm that leverages ensemble learning to improve predictive accuracy and reduce overfitting. It aggregates many decision trees into a "forest" and makes predictions based on the average or majority vote of the individual trees. Each decision tree in the random forest is trained on a random subset of the training data and a random subset of the input features. This randomness introduces diversity among the trees, helping to reduce variance and enhance the overall performance of the model [13]. During prediction, each tree in the forest independently makes its own prediction, and the final prediction is determined by aggregating the results of all the trees.
For classification tasks, the class with the majority vote is selected as the final prediction, while for regression tasks the mean of the predicted values is taken. Random forests have several advantages: they are robust against overfitting and can handle high-dimensional datasets with a large number of features; they can capture complex nonlinear relationships between features and target variables; and they provide importance scores for each feature, which can be used for feature selection and for understanding the relative importance of different variables [14]. Overall, random forests are a powerful and versatile algorithm widely used across domains, offering improved accuracy, robustness, and interpretability. A brief sketch follows.
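A brief scikit-learn sketch, under the same assumed X_train/y_train variables; n_estimators and the feature_names list are illustrative:

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
rf.fit(X_train, y_train)
# per-feature importance scores, as mentioned above
for name, score in zip(feature_names, rf.feature_importances_):
    print(f"{name}: {score:.3f}")
```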
4.3 Stacking

Stacking in machine learning is a method that combines predictions from several models through a meta-model. It improves quality and robustness by leveraging the capabilities of different models: base models are trained on subsets of the data, and a meta-model is trained to make predictions based on the outputs of the base models. Stacking enhances model performance by capturing higher-level insights and enabling ensemble learning; careful model selection and attention to overfitting are important [15]. A sketch is given below.
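A sketch using scikit-learn's StackingClassifier; the base learners chosen here are illustrative, not the paper's exact configuration:

```python
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100)),
                ("knn", KNeighborsClassifier(n_neighbors=11))],
    final_estimator=LogisticRegression(),  # the meta-model
    cv=5,  # meta-model trains on out-of-fold base predictions, limiting overfitting
)
stack.fit(X_train, y_train)
```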
4.4 Logistic Regression

Logistic regression is another machine learning algorithm used for classification tasks. As a supervised learning method, it predicts the probability of an event based on its occurrence in the training data. The algorithm models the relationship between the features and the probability of an event using the sigmoid (logistic) function, which maps its input to a value between 0 and 1 representing a probability; instances can then be classified into one of two classes based on a threshold value [16, 17]. The parameters of the logistic function are estimated by optimizing a cost function, typically using maximum likelihood estimation or gradient descent, with the aim of finding the parameters that best fit the training data and minimize the errors. Logistic regression can handle both numerical and categorical features, making it versatile for a wide range of datasets; it is relatively fast to train and can make predictions quickly, especially when the number of features is not excessively large. A short sketch follows.
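A short sketch, again with the assumed X_train/y_train split; predict_proba returns the logistic-function output in [0, 1], and the 0.5 threshold turns it into a class decision:

```python
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
probs = logreg.predict_proba(X_test)[:, 1]  # estimated P(treatment = 1)
preds = (probs >= 0.5).astype(int)          # threshold into two classes
```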
4.5 K-Nearest Neighbor

The KNN classifier is a powerful machine learning algorithm that can be used for both classification and regression problems. It is called a nonparametric algorithm because, unlike its counterparts, it does not learn explicit model parameters for prediction; instead, it makes predictions based on the similarity of the k closest training instances to the new, unseen data [18–20]. The "k" denotes the number of closest instances from the training set to consider when predicting a new instance. When a new data point is given to the model, it first determines the similarity between the training data and the new point, usually with a distance measure such as the Euclidean distance, and then selects the top k most similar instances. For classification problems, the label occurring most often among the k selected neighbors is assigned to the new data point; for regression problems, a weighted average of the k neighbors' values is used. KNN is a lazy learning algorithm, meaning it involves no separate training phase: it stores the entire training dataset and performs its computations at prediction time. This makes it efficient for small- to moderate-sized datasets but computationally expensive for large ones [21–23]. A sketch of the k-selection procedure is given below.
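A sketch of choosing k by scanning error rates on held-out data (scikit-learn; the k range is illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier

for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error_rate = 1.0 - knn.score(X_test, y_test)
    print(k, round(error_rate, 4))  # pick the k with the lowest error
```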
4.6 Support Vector Machine

An SVM is a popular machine learning classifier that categorizes data in the best possible way. It represents examples as points in space, mapped so that the points of different categories are separated by as wide a gap as possible [24, 25]. The confusion matrices of some of the methods used are shown in Figs. 5, 6, 7, and 8; an SVM sketch follows the figures.

Fig. 5 Bagging
Fig. 6 Boosting
Fig. 7 Xgboost
Fig. 8 SVM
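A hedged SVM sketch producing a confusion matrix like those in Figs. 5, 6, 7, and 8; the kernel and regularization settings are illustrative:

```python
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)
print(confusion_matrix(y_test, svm.predict(X_test)))  # rows: true, cols: predicted
```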
5 Results

The study to detect the mental state of a patient involved several steps that needed to be executed in the proper order. The last, and probably the most important, step was to find which algorithm was best suited to our needs. The study used nine algorithms. We used the feature selection method to identify the eight most important and correlated attributes from the 27 attributes identified from the text. Classifiers were made to act on the eight attributes, and ROC curves were generated in order to assess their accuracy. The accuracy of any classifier depended
Table 1 Accuracy measure

Methods             | Accuracy (in %)
Bagging             | 78.58
Logistic regression | 79.63
k-nearest neighbor  | 80.42
Decision tree       | 80.69
Random forest       | 81.22
Boosting            | 81.75
Stacking            | 82.01
Xgboost             | 82.54
SVM                 | 83
on how well they performed on the test set. Upon completion, we found the accuracy of the classifiers as in Table 1. From the accuracy table, it is clear that SVM performed better than any other algorithm on our test data, followed by XGBoost and stacking; bagging performed worst, with an accuracy of just 78.58%. We also used the ROC area (receiver operating characteristic area): a perfectly fitted algorithm gives an area under the curve of 1, whereas the least fitted gives an AUC of 0.5. Since all of the classifiers had an AUC between 0.8 and 0.9, we can conclude that these classifier models were well fitted to our dataset. The accuracy measures of the different classifiers are shown in Fig. 9. Here SVM performs best on the positive class, at the cost of a high FNR (false negative rate). The result is compared with Konda Vaishnavi
Fig. 9 Accuracy of different classifiers
et al. [10], who predicted mental health illness using machine learning algorithms; our accuracy and AUC measures are found to be better. A short AUC-computation sketch is given below.
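A minimal sketch of the AUC computation, assuming the fitted svm model from the previous section:

```python
from sklearn.metrics import roc_auc_score

scores = svm.decision_function(X_test)  # continuous scores for the ROC curve
print("AUC:", roc_auc_score(y_test, scores))  # 1.0 = perfect, 0.5 = chance
```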
6 Conclusion

With the evolution of machine learning and deep learning models, we have access to several algorithms that can predict the mental state of a patient and support a proper diagnosis. This is of utmost importance, since people often neglect their mental health amid their daily lives. More effective algorithms and the use of deep neural networks could provide much better accuracy on the dataset; one option is combining DCNN with DNN, and these algorithms can be explored in future work. Another way to obtain better accuracy is to expand the database, or to use a dataset focused on a specific age group rather than all ages mixed together. Preprocessing also plays a major role in accuracy: better processing of the raw dataset and extracting or generating new attributes from existing ones can alter the performance of the models. All of the algorithms used achieved an accuracy greater than 78%, which justifies using these models for preliminary testing procedures.
References

1. Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science 349(6245):255–260
2. Gao S, Calhoun VD, Sui J (2018) Machine learning in major depression: from classification to treatment outcome prediction. CNS Neurosci Ther 24(11):1037–1052
3. Priya A, Garg S, Tigga NP (2020) Predicting anxiety, depression and stress in modern life using machine learning algorithms. Proced Comput Sci 167:1258–1267
4. Nagar T. Prediction of mental health problems among children using machine learning techniques
5. https://github.com/Het21/Employee-Mental-Health-Treatment-Prediction/blob/main/Mental_Health_Tech_Survey.csv
6. Shatte ABR, Hutchinson DM, Teague SJ (2019) Machine learning in mental health: a scoping review of methods and applications. Psychol Med 49(9):1426–1448
7. Srividya M, Mohanavalli S, Bhalaji N (2018) Behavioral modeling for mental health using machine learning algorithms. J Med Syst 42(5):1–12
8. Chancellor S, Baumer EPS, Choudhury MD (2019) Who is the "human" in human-centered machine learning: the case of predicting mental health from social media. Proceed ACM Human Comput Interact 3(CSCW):1–32
9. Graham S, Depp C, Lee EE, Nebeker C, Tu X, Kim H-C, Jeste DV (2019) Artificial intelligence for mental health and mental illnesses: an overview. Curr Psychiatr Rep 21(11):1–18
10. Iliou T, Konstantopoulou G, Ntekouli M, Peropoulou CL, Assimakopoulos K, Galiatsatos D, Anastassopoulos G (2019) Iliou machine learning preprocessing method for depression type prediction. Evolv Syst 10(1):29–39
11. Vaishnavi K et al (2022) Predicting mental health illness using machine learning algorithms. J Phys Conf Ser 2161(1). IOP Publishing
12. Kumar R, Anil M, Parihar DS, Garhpale A, Panda S, Panda B (2022) A cross-sectional assessment of Gwalior residents' reports of adverse reactions to the COVID-19 immunization. Int J Sci Technol 10:2386–2392
13. Myles AJ et al (2004) An introduction to decision tree modeling. J Chemom 18(6):275–285
14. Song Y-Y, Ying LU (2015) Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatr 27(2):130
15. Cacheda F et al (2019) Early detection of depression: social network analysis and random forest techniques. J Med Internet Res 21(6):e12554
16. Xiao M et al (2020) Risk prediction for postpartum depression based on random forest. Zhong nan da xue xue bao. Yi xue ban = J Central South Univ Med Sci 45(10):1215–1222
17. Pramanik A, Bijoy MHI, Rahman MS (2022) Depression-level prediction during COVID-19 pandemic among the people of Bangladesh using ensemble technique: MIRF stacking and MIRF voting. In: Proceedings of international conference on fourth industrial revolution and beyond 2021. Springer Nature Singapore, Singapore
18. Vilaseca R, Ferrer F, Olmos JG (2014) Gender differences in positive perceptions, anxiety, and depression among mothers and fathers of children with intellectual disabilities: a logistic regression analysis. Quality Quantity 48:2241–2253
19. Kumar R, Anil M, Panda S, Panda B, Nanda L, Jena C (2022) A psychological study on accepting and rejecting Covid-19 vaccine by college students in India. In: 2022 IEEE international conference on distributed computing and electrical circuits and electronics (ICDCECE), pp 1–4. IEEE
20. Jiang H et al (2018) Detecting depression using an ensemble logistic regression model based on multiple speech features. Computational and Mathematical Methods in Medicine 2018
21. Islam MR et al (2018) Detecting depression using k-nearest neighbors (KNN) classification technique. In: 2018 International conference on computer, communication, chemical, material and electronic engineering (IC4ME2). IEEE
22. Tirtopangarsa AP, Maharani W (2021) Sentiment analysis of depression detection on Twitter social media users using the k-nearest neighbor method. Seminar Nasional Informatika (SEMNASIF), vol 1, no 1
23. Yang L et al (2017) DCNN and DNN based multi-modal depression recognition. In: 2017 Seventh international conference on affective computing and intelligent interaction (ACII). IEEE
24. Panda S, Dhaka RK, Panda B, Pradhan A, Jena C, Nanda L (2022) A review on application of machine learning in solar energy & photovoltaic generation prediction. In: 2022 International conference on electronics and renewable systems (ICEARS), Tuticorin, India, pp 1180–1184. https://doi.org/10.1109/ICEARS53579.2022.9752404
25. Wani S, Yadav D, Verma O (2020) Development of disaster management and awareness system using twitter analysis: a case study of 2018 Kerala floods. https://doi.org/10.1007/978-981-15-0751-9_107
Harnessing the Power of Machine Learning Algorithms for Landslide Susceptibility Prediction Shivam Krishana, Monika Khandelwal, Ranjeet Kumar Rout, and Saiyed Umer
Abstract In landslide-prone mountainous regions, accurate susceptibility prediction is crucial to mitigate dangers. Classical and probabilistic approaches have limited prediction capability. Hence, computational methods, specifically machine learning, are being employed to enhance accuracy. Our study aims to predict landslide susceptibility in the Kashmir Himalayas, specifically Muzaffarabad and the Azad Kashmir region. Initially, we established eleven distinctive features for prediction. We trained and tested seven machine learning models, comparing susceptibility predictions in two classes: susceptible and not susceptible. Evaluating classification performance, we achieved test accuracies ranging from 69.30 to 78.71%. The K-Nearest Neighbors (KNN) algorithm outperformed other models, yielding superior accuracy with an optimal k-value for the chosen dataset. Keywords Susceptibility prediction · Grid search · Extreme gradient boosting
S. Krishana · M. Khandelwal · R. K. Rout (B)
Department of CSE, NIT Srinagar, Hazratbal, Srinagar, J&K, India
e-mail: [email protected]
S. Krishana e-mail: [email protected]
M. Khandelwal e-mail: [email protected]
S. Umer
Department of CSE, Aliah University, Kolkata, WB, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_9

1 Introduction

The term "natural hazard" refers to a condition and/or process in nature that gives rise to economic losses or loss of human life. Among all natural hazards, landslides rank as one of the most destructive [1]. An erroneous assumption about landslides is that they are simply slides of land [2]. In other words, landslides are not limited
to the land or sliding as we know it today. The Glossary of Terms [3] defines a landslide as a "general term for mass-movement of rock material down a slope (includes rock falls, landslips, mudflows, etc.)". This phenomenon can be caused by a variety of external factors, including (but not limited to) intense rainfall, earthquake shaking, and changes in water levels. Further, as population and urbanization pressure develop in unstable hillslope areas, deforestation and hillslope excavation for developing infrastructure have become primary triggers for sliding events.

A large number of casualties and huge economic losses have been caused by landslides in mountainous areas across the globe [4], and studies have shown that damages caused by landslides outweigh damages caused by earthquakes, volcanic eruptions, and floods. One of the most severe instances of landslides in Vietnam took place on October 18, 2020, in Huong Phung, Huong Hoa, Quang Tri; it caused the greatest number of injuries resulting from rainfall during the 2020 typhoon season [5]. As a result of heavy downpours on August 6, 2020, another devastating landslide took place at 22:45 hrs in Kerala; with 66 victims, including four missing, it was the worst landslide event in the state. In Atami, Shizuoka Prefecture, Japan, torrential rainfall caused a landslide that swept the Izusan neighborhood: city residents were warned of life-threatening conditions due to 310 mm (12.4 in.) of rain in 48 h, and the landslide claimed the lives of 19 people and left nine missing. One of the most recent catastrophic landslides occurred in Petrópolis, Brazil, on February 15, 2022; within three hours, the city received unusually heavy rain totaling 258 mm (10.2 in.), and at least 231 people were reported dead, with 5 still unaccounted for.

Since landslide phenomena encompass an extraordinary breadth of occurrences, no single method is available for identifying and mapping landslide vulnerability. A wide range of research and extensive studies have been conducted on how deep learning and machine learning can be used to predict disasters; however, only a few studies have examined landslide susceptibility prediction, owing to the limited availability of geological data and the difficulty of organizing the data into a meaningful dataset for training models. Nguyen et al. 2019 [6] presented novel hybrid machine learning models for landslide spatial prediction, including an adaptive neuro-fuzzy inference system optimized by particle swarm optimization (PSOANFIS), artificial neural networks optimized by particle swarm optimization (PSOANN), and a best-first decision tree-based rotation forest (RFBFDT). The proposed models were evaluated using receiver operating characteristic (ROC) curves, mean square errors (MSEs), and root mean square errors (RMSEs); results indicated that the RFBFDT is the most effective of the compared hybrid architectures for landslide prediction. Kuradusenge et al. 2020 [7] proposed two prediction modeling approaches, namely random forest (RF) and logistic regression (LR), utilizing rainfall datasets along with various other internal and external parameters to predict landslides and improve prediction accuracy; the accuracy was further enhanced using antecedent cumulative rainfall statistics. In the approach proposed by Aslam et al.
2021 [4], machine learning (ML) classification techniques, such as support vector machines (SVM), LR models, and RF classifiers, were combined
with a convolutional neural network (CNN) to generate a hybrid classifier. The study proved that in comparable geo-environmental regions with similar geophysical conditions, hybrid models could be applied constructively for landslide susceptibility predictions. Other artificial intelligence-based approaches have been proposed to predict landslide susceptibility [8–18]. Our study examines the application of commonly known machine learning methods (SVM, LR, KNN, decision tree (DT), RF, naive Bayes (NB), and extreme gradient boosting (XGB)) for landslide susceptibility prediction. The problem is approached as a binary classification problem. The second section of the paper describes research methods as well as the chosen region of interest for our study. In the third section, experimental results are analyzed. Finally, the paper concludes with an analysis of current challenges and future opportunities associated with the study.
2 Methodology

2.1 Geographical Area

This study focuses on the causes of landslides in mountainous regions of Pakistan and predictions of their occurrence. Muzaffarabad, which is part of the lesser Himalayas (Fig. 1), is the subject of the present study. Geologically (Fig. 2), the Kashmir Himalaya is the youngest and most dynamic system [19]. There is a high probability of landslides being triggered in the rainy season, particularly during the monsoon period (July–September). The study area selection is based on Muzaffarabad's vulnerability, as a hilly area, to landslide hazards and the consequent risk to lives and property. The majority of people living in Muzaffarabad live on the slopes of fragile mountains and are therefore highly susceptible to the risk of
Fig. 1 Location of Muzaffarabad, Pakistan, on map [20]
Fig. 2 Geology of Muzaffarabad and surroundings [19]
landslides. Around 08:50 in the morning on October 8, 2005, an earthquake of magnitude 7.6 struck the southwestern slopes of the Himalayas. Muzaffarabad and other regions of Pakistan-administered Kashmir were hit the hardest [4]. In the immediate aftermath of the main earthquake, many landslides occurred and continue to pose a threat to public health. To improve accuracy in predicting landslide susceptibility, data collected in the landslide-prone region of Muzaffarabad has been analyzed and used for training machine learning models. Therefore, Muzaffarabad is a perfectly justified case for our research.
2.2 Dataset and Preprocessing

The dataset for this research has been taken from Kaggle [21], a subsidiary of Google that enables users to find, publish, explore, and train models on large datasets. The chosen dataset concerns landslide susceptibility prediction in Muzaffarabad, Pakistan. It contains 1212 rows and 13 columns; the column containing class labels was separated from the remaining 12 columns (Aspect, Curvature, Earthquake, Elevation, Flow, Lithology, NDVI, NDWI, Plan, Precipitation, Slope, and Profile) that serve as features in the model training process. Figure 3 illustrates the interrelation among these features, where lower correlation is preferable for optimal results. A data split ratio of 2:1 was employed, with 66.67% of the total data used for training the model and the remaining 33.33% allocated for testing
Fig. 3 Heatmap showing a correlation in dataset
purposes. The dataset was also checked for null values and duplicates, and no abnormalities were found.
2.3 Model Description

In our study, landslide susceptibility prediction was based on seven different machine learning approaches:

1. SVM: As a supervised learning model, SVM analyzes data for classification and regression analysis using learning algorithms [22, 23]. The objective of this algorithm is to identify a hyperplane that effectively separates the data points within an N-dimensional space. SVM was used as one of the machine learning models and tuned using grid search.
2. LR: LR can be used to model binary response variables, which have a value of either 0 or 1. LR provides a useful means for modeling the effect of explanatory variables on a binary response variable [24, 25].
3. KNN: The KNN classifier categorizes unlabeled observations by assigning them to the class of the most similar labeled examples. It relies heavily on the parameter k [26]. KNN classifies by assigning a sample x to the majority class among its k nearest neighbors in the feature space, where similarity is measured by the Euclidean distance d(x, x_i):
$$d(x, x_i) = \sqrt{\sum_{j=1}^{n} (x_j - x_{ij})^2} \qquad (1)$$

where the x_i are the k neighbors' feature vectors. The optimal k is chosen via cross-validation for best accuracy. For the final decision, let C_i be the class of the i-th neighbor; the majority class among the C_i determines the predicted class for x:

$$\hat{y} = \arg\max_{c_j} \sum_{i=1}^{k} I(C_i = c_j) \qquad (2)$$
where ŷ is the predicted class and I is the indicator function. A direct NumPy rendering of Eqs. (1) and (2) is sketched after this list.
4. DT: Decision trees are algorithms that predict a target variable from other variables. When the tree is not too large, using a tree structure for decision-making is understandable and transparent. The basic theory is that future data can be predicted from past data, assuming the underlying system is stable enough to allow accurate predictions [27, 28].
5. RF: A random forest classifier comprises a large assembly of tree-structured classifiers, in which each individual tree contributes a unit vote toward the most prevalent class [29, 30].
6. NB: NB is a statistical classifier based on Bayes' theorem. It assumes class conditional independence, meaning an attribute value's effect on a class is independent of the values of the other attributes. The classifier is considered naive because this assumption simplifies the computation [31].
7. XGB: XGB is a decision tree implementation technique based on gradient boosting. Weights play a crucial role in this approach: all independent variables are assigned weights and later fed to the decision tree for result prediction, and individual classifiers or predictors are combined to form a stronger model with greater precision [32].
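A direct NumPy rendering of Eqs. (1) and (2), assuming X_train and y_train are arrays of feature vectors and labels:

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=11):
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Eq. (1): Euclidean distances
    nearest = y_train[np.argsort(d)[:k]]           # labels C_i of the k neighbors
    classes, votes = np.unique(nearest, return_counts=True)
    return classes[np.argmax(votes)]               # Eq. (2): majority vote
```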
2.4 Evaluation Metrics

A quantitative evaluation can be conducted using a variety of metrics, and selecting an appropriate evaluation metric is crucial for discriminating among classifiers and obtaining an optimal one [33]. Our models have been evaluated on the following metrics [34, 35]:

1. Accuracy: the proportion of correct predictions among all predictions:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$
2. Precision: precision focuses on a model's positive predictions. It is the ratio of true positives to the total number of positive predictions (true positives plus false positives):

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (4)$$
3. Recall: recall captures a model's ability to identify positive samples. It is the ratio of true positives to the sum of true positives and false negatives:

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (5)$$
4. F1-Score: the F1-score is the harmonic mean of precision and recall:

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (6)$$

A small helper implementing Eqs. (3)–(6) is sketched below.
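The four metrics of Eqs. (3)–(6) computed directly from confusion-matrix counts, as a small self-contained helper:

```python
def evaluate(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (3)
    precision = tp / (tp + fp)                          # Eq. (4)
    recall = tp / (tp + fn)                             # Eq. (5)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (6)
    return accuracy, precision, recall, f1
```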
3 Experimental Results and Analysis

This section presents the classification results in numerical and tabulated form. The chosen dataset was split into two parts for training and testing with a 2:1 train-to-test ratio. For SVM, we achieved test accuracies of 77.97% and 78.46% without and with tuning, respectively; grid search was used for tuning, and the optimal parameters obtained were C = 1, gamma = 0.1, and an RBF kernel (a sketch of this tuning search is given below). With LR, a test accuracy of 72.02% was achieved. In KNN classification, various k-values were assessed (Table 1), and the optimal k was determined by analyzing error rates and consulting the k-value curve (Fig. 4a), guaranteeing precise parameter selection to enhance classification accuracy. An accuracy of 69.30% was achieved using the decision tree algorithm. A curve of accuracy versus n_estimators (Fig. 4b) was used to select the optimal n_estimators value for the RF algorithm; compared to the decision tree, it achieved a better accuracy of 77.72% (n_estimators = 139). Additionally, NB (a statistical model) and XGB yielded test accuracies of 75.24% and 76.23%, respectively (Table 2).
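A hedged scikit-learn sketch of the grid search described above; the grid shown is illustrative and includes the reported optimum (C = 1, gamma = 0.1, RBF kernel):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

grid = GridSearchCV(
    SVC(),
    {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1], "kernel": ["rbf"]},
    cv=5,  # 5-fold cross-validation on the training split
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```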
Table 1 K-values and corresponding accuracy for KNN classifier

k-value  | 1      | 5      | 10     | 11     | 15
Accuracy | 69.55% | 75.00% | 78.46% | 78.71% | 76.23%
Fig. 4 Analysis of classification performance for different classifiers by variation of parameters: a error rate vs. k-value for the KNN classifier; b test accuracy vs. n_estimators value for the RF classifier

Table 2 Overall analysis of all seven models

Model | Precision | Recall | F1-score | Accuracy (%)
SVM   | 0.78      | 0.78   | 0.78     | 78.46
LR    | 0.72      | 0.72   | 0.72     | 72.02
KNN   | 0.79      | 0.79   | 0.79     | 78.71
DT    | 0.70      | 0.70   | 0.70     | 69.30
RF    | 0.78      | 0.78   | 0.78     | 77.72
NB    | 0.75      | 0.75   | 0.75     | 75.24
XGB   | 0.76      | 0.76   | 0.76     | 76.23
4 Conclusion

In our research, we evaluated the performance of different machine learning techniques for the prediction of landslide susceptibility, using a range of metrics including accuracy, precision, recall, and F1-score. The problem was approached from a classification point of view, and we tuned parameters wherever possible to increase the performance of our models. After analyzing the experimental results, class imbalance was identified as the primary reason for the significant difference in the F1-scores of the two classes for all models. Among all the machine learning models trained and tested, the highest accuracy was 78.71%. In real-time catastrophes, however, an accuracy close to 80% may not be enough, so we believe improvements can be made in several respects. Our future work will focus on (1) expanding the dataset to include landslide-prone areas around the world; (2) developing hybrid datasets covering natural disasters related to landslides; (3) adding more classes to the dataset to provide more precise predictions; (4) implementing contemporary data preprocessing techniques; (5) testing deep learning algorithms on the
given dataset, with the aim of significantly improving test accuracy, because good predictions can save lives.
References

1. Tariq S, Gomes C (2017) Landslide environment in Pakistan after the earthquake-2005: information revisited to develop safety guidelines for minimizing future impacts. J Geogr Nat Disasters 7(3). https://doi.org/10.4172/2167-0587.1000206
2. U.S. Geological Survey. What is a landslide and what causes one? https://www.usgs.gov/faqs/what-landslide-and-what-causes-one
3. Geological Society. Glossary of Terms. https://www.geolsoc.org.uk/ks3/gsl/education/resources/rockcycle/page3451.html. Accessed 21 Oct 2022
4. Aslam B, Zafar A, Khalil U (2021) Development of integrated deep learning and machine learning algorithm for the assessment of landslide hazard potential. Soft Comput 25(21):13493–13512. https://doi.org/10.1007/s00500-021-06105-5
5. van Tien P et al (2021) Rainfall-induced catastrophic landslide in Quang Tri Province: the deadliest single landslide event in Vietnam in 2020. Landslides. https://doi.org/10.1007/s10346-021-01664-y
6. Nguyen VV et al (2019) Hybrid machine learning approaches for landslide susceptibility modeling. Forests 10(2). https://doi.org/10.3390/f10020157
7. Kuradusenge M, Kumaran S, Zennaro M (2020) Rainfall-induced landslide prediction using machine learning models: the case of Ngororero district, Rwanda. Int J Environ Res Public Health 17(11):1–20. https://doi.org/10.3390/ijerph17114147
8. Hussain MA et al (2022) Landslide susceptibility mapping using machine learning algorithm. Civil Eng J (Iran) 8(2):209–224. https://doi.org/10.28991/CEJ-2022-08-02-02
9. Huang F et al (2020) Landslide susceptibility prediction considering regional soil erosion based on machine-learning models. ISPRS Int J Geoinf 9(6). https://doi.org/10.3390/ijgi9060377
10. Madawala CN, Kumara BTGS, Indrathilaka L (2019) Novel machine learning ensemble approach for landslide prediction. In: Proceedings of the IEEE international research conference on smart computing and systems engineering, SCSE 2019, pp 78–84. https://doi.org/10.23919/SCSE.2019.8842762
11. Zhu L et al (2020) Landslide susceptibility prediction modeling based on remote sensing and a novel deep learning algorithm of a cascade-parallel recurrent neural network. Sensors 20(6):1576. https://doi.org/10.3390/S20061576
12. Chen W et al (2017) A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China 8(2):1955–1977. https://doi.org/10.1080/19475705.2017.1401560
13. Rout RK, Hassan SS, Sheikh S, Umer S, Sahoo KS, Gandomi AH (2022) Feature-extraction and analysis based on spatial distribution of amino acids for SARS-CoV-2 protein sequences. Comput Biol Med 141:105024
14. Korup O, Stolle A (2014) Landslide prediction from machine learning. Geol Today 30(1):26–33. https://doi.org/10.1111/GTO.12034
15. Saha A, Saha S (2022) Integrating the artificial intelligence and hybrid machine learning algorithms for improving the accuracy of spatial prediction of landslide hazards in Kurseong Himalayan Region. Artif Intell Geosci 3:14–27. https://doi.org/10.1016/J.AIIG.2022.06.002
16. Al-Najjar HAH, Pradhan B, Beydoun G, Sarkar R, Park HJ, Alamri A (2022) A novel method using explainable artificial intelligence (XAI)-based Shapley Additive Explanations for spatial landslide prediction using Time-Series SAR dataset. Gondwana Res. https://doi.org/10.1016/J.GR.2022.08.004
17. Umer S, Mondal R, Pandey HM, Rout RK (2021) Deep features based convolutional neural network model for text and non-text region segmentation from document images. Appl Soft Comput 113:107917
18. van Phong T et al (2019) Landslide susceptibility modeling using different artificial intelligence methods: a case study at Muong Lay district, Vietnam 36(15):1685–1708. https://doi.org/10.1080/10106049.2019.1665715
19. Rahman A, Khan AN, Collins AE (2014) Analysis of landslide causes and associated damages in the Kashmir Himalayas of Pakistan. Nat Hazards 71(1):803–821. https://doi.org/10.1007/s11069-013-0918-1
20. Ahmad MN et al (2022) Landslide hazard, susceptibility and risk assessment (HSRA) based on remote sensing and GIS data models: a case study of Muzaffarabad Pakistan. Stoch Environ Res Risk Assess. https://doi.org/10.1007/s00477-022-02245-8
21. Landslide Prediction for Muzaffarabad-Pakistan | Kaggle. https://www.kaggle.com/datasets/adizafar/landslide-prediction-for-muzaffarabadpakistan. Accessed 20 Oct 2022
22. Steinwart I, Christmann A (2022) Support vector machines. Springer, New York. www.springer.com/series/3816. Accessed 21 Oct 2022
23. Khandelwal M, Rout RK, Umer S (2022) Protein-protein interaction prediction from primary sequences using supervised machine learning algorithm. In: 2022 12th International conference on cloud computing, data science & engineering (Confluence). IEEE, pp 268–272
24. Bewick V, Cheek L, Ball J (2005) Statistics review 14: logistic regression. Crit Care 9(1):112. https://doi.org/10.1186/CC3045
25. Umer S, Mohanta PP, Rout RK, Pandey H (2021) Machine learning method for cosmetic product recognition: a visual searching approach. Multimedia Tools Appl 80(28):34997–35023
26. Zhang Z (2016) Introduction to machine learning: k-nearest neighbors. Ann Transl Med 4(11). https://doi.org/10.21037/ATM.2016.03.37
27. Biehler R, Fleischer Y (2021) Introducing students to machine learning with decision trees using CODAP and Jupyter Notebooks. Teach Stat 43(S1):S133–S142. https://doi.org/10.1111/TEST.12279
28. Khandelwal M, Shabbir N, Umer S (2022) Extraction of sequence-based features for prediction of methylation sites in protein sequences. In: Artificial intelligence technologies for computational biology. CRC Press, pp 29–46
29. Liu Y, Wang Y, Zhang J (2012) New machine learning algorithm: random forest. In: Lecture Notes in Computer Science, vol 7473 LNCS, pp 246–252. https://doi.org/10.1007/978-3-642-34062-8_32
30. Khandelwal M, Rout RK, Umer S, Mallik S, Li A (2022) Multifactorial feature extraction and site prognosis model for protein methylation data. Brief Funct Genomics
31. Leung KM (2007) Naive Bayesian classifier
32. Chen T, He T. xgboost: eXtreme gradient boosting
33. Hossin M, Sulaiman MN (2020) A review on evaluation metrics for data classification evaluations. Int J Data Mining Knowl Manage Process (IJDKP) 5(2). https://doi.org/10.5121/ijdkp.2015.5201
34. Dalianis H (2018) Evaluation metrics and evaluation. https://doi.org/10.1007/978-3-319-78503-5_6
35. Kumar S, Yadav D, Gupta H, Verma OP, Ansari IA, Ahn CW (2021) A novel YOLOv3 algorithm-based deep learning approach for waste segregation: towards smart waste management. Electronics 10(14). https://doi.org/10.3390/electronics10010014
The Effectiveness of GPT-4 as Financial News Annotator Versus Human Annotator in Improving the Accuracy and Performance of Sentiment Analysis Satyajeet Azad
Abstract Artificial intelligence (AI) has revolutionized various industries and has become a crucial component in the development of intelligent systems. One of the main challenges in the development of AI systems is data annotation, which involves labeling unlabeled datasets to train AI algorithms and systems. The traditional manual data annotation method is time-consuming and requires skilled human annotators, which makes it expensive; therefore, there is a need for effective and efficient data annotation tools. Recently, Generative Pre-Trained Transformer (GPT) models have shown remarkable success in various natural language processing tasks. GPT-4 is the latest version of these models and is expected to be a game-changer in data annotation tasks. This research aims to evaluate the effectiveness of GPT-4 as a data annotation tool versus traditional methods in improving the accuracy and performance of AI systems. The study utilizes a comparative experimental design in which both GPT-4 and traditional methods are applied to the same dataset, and the accuracy and performance of the AI system are measured. In this paper, I evaluate how high-quality data annotation can help machine learning models make more accurate and reliable sentiment predictions. The findings of this research provide insights into the feasibility and effectiveness of GPT-4 as a data annotation tool for the implementation of AI systems: if GPT-4 is proven to be more effective, it will revolutionize the data annotation industry, reducing the time and cost required and opening doors to more comprehensive data annotation tasks. Keywords Data annotation · GPT-4 · Text classification
S. Azad (B) AI Consultant, Excelinnova Consultancy Services Pvt. Ltd., New Delhi, India e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_10
1 Introduction The development and implementation of AI systems require a massive amount of data, which is annotated to improve the performance of these systems. Data annotation is a process that involves labeling and classifying a corpus to enable AI systems to learn and make predictions accurately. It is a crucial aspect of machine learning and AI that can impact the accuracy, efficiency, and reliability of AI systems. In recent years, there have been several advancements in data annotation techniques, including the development of GPT-4, a neural-based large language model. The main purpose of this research is to investigate the effectiveness of GPT-4 and traditional data annotation methods in improving the accuracy and efficiency of data annotation in AI system design. This research aims to evaluate the effectiveness of GPT-4 as a data annotation tool and compare it to the traditional data annotation methods available. The research methodology involves conducting experiments using the two data annotation methods (GPT-4 and traditional methods) and analyzing their effectiveness. The experiments involve annotating datasets using both methods and comparing the accuracy and efficiency of the results through sentiment analyzer ecosystems. The experiment was carried out on a dataset with a predefined set of classes, and the annotations were done by human annotators and by GPT-4. The findings indicate that GPT-4 is, at this stage, potentially less effective than traditional data annotation methods in improving the accuracy and efficiency of data annotation in AI system design: GPT-4 achieved lower accuracy in annotating the data and was slower in reaching the final result. The study also found that traditional data annotation methods, such as manual annotation by human annotators, still offer clear advantages. Human annotators provided more nuanced and detailed annotations, which GPT-4 was not able to match, and manual annotation allows more diverse datasets to be created, which is important for more complex AI systems. The study also identified some limitations of GPT-4 in data annotation that need to be addressed for it to become a more effective data annotation tool. For instance, GPT-4's language capability is limited compared to human language understanding, and GPT-4 has an inherent tendency to generate biased data if not trained correctly. In conclusion, the study indicates that GPT-4 is currently a less effective data annotation tool for improving the accuracy and efficiency of financial news sentiment analysis, while traditional methods such as manual annotation provide more detailed and nuanced annotations. Both methods have their advantages and limitations and can be used resourcefully depending on the project requirements. Therefore, AI system designers should consider using multiple data annotation methods, including GPT-4, in their projects for optimal results.
2 Literature Review The growing importance of Artificial Intelligence (AI) systems and Large Language Models (LLMs) has led to an increasing demand for accurate and reliable data annotation tools. Data annotation is the process of making a large volume of data usable and accessible for training machines by labeling or categorizing data points with the information that the AI system needs to learn and operate effectively (Table 1). Traditional methods of data annotation, such as manual labeling, crowd-sourcing, and rule-based approaches, have limitations. In recent years, new data annotation tools such as GPT-4 have been developed that claim to enhance the accuracy and performance of AI systems. Several studies have focused on evaluating the effectiveness of GPT-4 as a data annotation tool versus traditional annotation methods. Research by El-Beltagy and Rafea [10] compared the performance of GPT-4 with other data annotation tools, and the results showed that GPT-4 outperformed conventional tools in terms of accuracy, consistency, and scalability. Another study by Huang et al. [3] evaluated the impact of GPT-4 on the accuracy of computer vision and image classification systems and found that GPT-4 generated high-quality annotations, resulting in increased accuracy of the models. However, some researchers argue that GPT-4 still has limitations, such as an inability to handle complex data sample types and a lack of transparency and explainability in its decision-making process. Research by Ding et al. [1] suggested that GPT-4 may not be suitable for all data annotation tasks and that a hybrid approach combining traditional and AI-based methods may lead to better results.
3 Methodology See Fig. 1. The data annotation approach studied here uses GPT-4 and involves formulating prompts that instruct GPT-4 to annotate unlabeled data. In financial news labeling, GPT-4 produces labels that differ from the human labels. The resulting GPT-4-annotated data is used to train machine learning models so that the accuracy and performance of the models can be compared [1].
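To make this workflow concrete, the following is a minimal sketch of how prompt-driven annotation can be scripted, assuming the openai Python client; the prompt wording and label set are illustrative, not the exact prompts used in this study.

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

PROMPT = (
    "You are a financial news annotator. Label the sentiment of the "
    "following headline as exactly one of: positive, negative, neutral.\n"
    "Headline: {headline}\nLabel:"
)

def annotate(headline: str) -> str:
    # Ask GPT-4 for a single sentiment label for one headline
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(headline=headline)}],
        temperature=0,  # deterministic output helps label consistency
    )
    return response.choices[0].message.content.strip().lower()

news_headlines = ["Neste Oil signs a gasoline purchase agreement"]  # hypothetical input
gpt4_labels = [annotate(h) for h in news_headlines]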
3.1 Corpus A corpus is an essential concept in natural language processing (NLP) that refers to a collection of text documents. It is used for training, testing, and evaluating machine learning models for text classification and generation tasks. A corpus can be created from various
Table 1 Systematic literature review

Literature | Data used | Method | Extracted features from textual data | Results
Nayak et al. [1] | Twitter, Yahoo Finance, news | Boosted decision tree, logistic regression, SVM, sentiment analysis | Positive and negative sentiment mining | Accuracy for bank sector–0.548, for … sector–0.76, for oil sector–0.769
Nemes and Kiss [2] | Economic news | TextBlob, NLTK-VADER | Positive, negative, neutral | Concluded that the sentiments …
Vijh et al. [3] | Yahoo Finance | Artificial neural network, random forest | No sentiment analysis carried out | Lowest RMSE–0.42, highest RMSE–3.40
Martin [4] | Twitter, CAC40 French stock data | Neural network | Aggregate sentiment score | Accuracy score–80%
Zhang et al. [5] | Financial news data, Shanghai composite stocks data, and Xueqiu data | Artificial neural networks | Positive polarity and negative polarity | Accuracy score–60%
Zhang et al. [5] | Hang Seng 300 index and news and posts from Sina Weibo | Sentiment dictionary and double-layer recurrent NN | Keywords of two types–positive and negative | MAE–0.625, MAPE–9.381, RMSE–0.803
Shastri et al. [6] | Daily news headlines and daily Apple stock data | Multi-level perceptron artificial neural network (ANN) | Sentiment score | MAPE–8.21, accuracy score–98%
Kolasani and Assaf [7] | Tweets and historical data from Yahoo Finance | Support vector regression | Positive and negative sentiments | Accuracy score–83%, lowest RMSE–1.37
Khedr and Yaseen [8] | Daily index data of 3 random NASDAQ companies and financial news | K-nearest neighbors and naive Bayes | Positivity, negativity, and equal sentiments | Accuracy score–90%
Li et al. [9] | Forum posts of investors and daily CSI 300 stock index data | Naive Bayes and LSTM | Positive, negative, and neutral sentiments | Accuracy score–87.86%
Fig. 1 Methodology for labeling finance news data using GPT-4 and human as annotator
sources, including books, articles, websites, social media posts, or any other text-based content. The size and diversity of a corpus play a crucial role in the performance of text classification tasks [11]. A larger corpus provides more data for training the model, which can lead to better model accuracy and generalization. Moreover, a diverse corpus helps models understand a wide range of contexts and linguistic variations, making them more adaptable and effective for real-world applications (Table 2). When building a corpus for text classification and sentiment analysis, several considerations should be taken into account. These include the size of the corpus, the diversity of documents, the representativeness of different labels or categories, and the quality of annotations. Additionally, pre-processing techniques such as tokenization, stop word removal, stemming, and lemmatization may be applied to clean and standardize the text data for analysis. Corpora play a crucial role in the training and evaluation of sentiment analysis models. Text classifiers learn patterns, relationships, and context from the corpus data, allowing them to make predictions on new, unseen text [12]. The performance of sentiment analysis models heavily relies on the quality and representativeness of the corpus, along with the selection and fine-tuning of the machine learning algorithms. Therefore, constructing an appropriate corpus is a critical step in NLP applications to achieve accurate and efficient sentiment analysis.
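As a small illustration of the pre-processing steps listed above, the following NLTK-based sketch applies tokenization, stop word removal, stemming, and lemmatization to a headline; it is a generic example, not the exact pipeline used in this study.

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

for resource in ("punkt", "stopwords", "wordnet"):
    nltk.download(resource)  # one-time downloads of the required NLTK data

def preprocess(text):
    # Tokenize, keep alphabetic tokens, drop stop words, then stem and lemmatize
    stop = set(stopwords.words("english"))
    stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
    tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
    return [lemmatizer.lemmatize(stemmer.stem(t)) for t in tokens if t not in stop]

print(preprocess("Nyrstar has agreed to supply sulphuric acid to Talvivaara."))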
Table 2 Financial news corpus with human and GPT-4 annotation

SNo | News headlines | Human annotation | GPT-4 annotation
1 | A purchase agreement for 7,200 gasoline with delivery at the Hamina terminal, Finland was signed with Neste Oil OYJ at the average Platts index for this September plus eight US dollars per month | Positive | Neutral
2 | A trilateral agreement on investment in the construction of a technology park in St Petersburg was to have been signed in the course of the forum, Days of the Russian Economy, that opened in Helsinki today | Positive | Neutral
3 | Nyrstar has also agreed to supply to Talvivaara up to 150,000 tons of sulphuric acid per annum for use in Talvivaara's leaching process during the period of supply of the zinc in concentrate | Positive | Neutral
4 | Within the framework of the partnership, Nokia Siemens Networks has signed an initial framework purchase agreement with Sitronics subsidiary, JSC Mikron, which is the largest manufacturer and exporter of microelectronic components in Russia | Positive | Neutral
3.2 Prompt Prompt design plays a crucial role in open-ended large language models such as GPT-4 by providing a starting point for generating contextually relevant and coherent responses. This study explores the concept of prompt engineering, highlighting its significance in influencing the output of models like ChatGPT and GPT-4. When interacting with open-ended language models, users often begin a conversation with a prompt that outlines their intention or query [2]. An effectively designed prompt can guide a text-based AI model, enabling it to generate responses that align with the end user's expectations. The prompt serves as the initial context from which the model generates outputs that are relevant, coherent, and linguistically appropriate (Fig. 2). Framing an optimal prompt involves a balancing act between providing explicit instructions and offering enough flexibility for the AI model to exercise its own creativity. Experimentation with prompts can often lead to significant variations in model responses. For example, modifying the prompt's wording and sentences, adding an example, or injecting additional context can result in divergent and completely different responses from GPT-4. Thus, prompt engineering offers an avenue for fine-tuning the model's behavior [5]. The choice of prompt may influence various aspects of the model's output, such as style, content, context, and even biases. A prompt explicitly expressing desired attributes, such as politeness or specificity, can steer GPT-4 to respond in a more courteous or precise manner. Through well-crafted prompts, users can encourage context-based narratives, continuous storytelling, or even ask the model to take the perspective of a specific character, producing tailored responses and
Fig. 2 GPT-4 prompt process
innovating the conversational experience. Prompt engineering is an iterative process that takes input from user feedback and human evaluation. Analyzing examples of model outputs based on different prompts helps researchers understand the model's limitations and biases [4]. The integrated feedback loop can be used to refine prompts, develop new methods to optimize responses for specific practical applications, and address concerns such as offensiveness, misinformation, or excessive wordiness. Moreover, integrating user feedback fosters the iterative refinement of the model's default prompt and aids in mitigating potential ethical concerns.
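As a brief, hypothetical illustration of how prompt variations steer the output, the two templates below frame the same labeling task in a zero-shot and a few-shot style; neither is claimed to be the study's actual prompt.

ZERO_SHOT = (
    "Classify the sentiment of this financial headline as "
    "positive, negative, or neutral.\nHeadline: {headline}\nLabel:"
)

# Adding worked examples ("few-shot" context) often shifts label behavior
FEW_SHOT = (
    "Classify financial headlines as positive, negative, or neutral.\n"
    "Headline: Profit warning issued after a weak quarter.\nLabel: negative\n"
    "Headline: Company signs a long-term supply agreement.\nLabel: positive\n"
    "Headline: {headline}\nLabel:"
)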
3.3 GPT-4 GPT, which stands for Generative Pre-Trained Transformer, is a type of large language model that has revolutionized natural language processing (NLP). GPT is based on the transformer architecture and is designed to generate human-like responses to a given input. It has significantly advanced the capabilities of large language models and has become one of the most sought-after and powerful models in the field of NLP [3]. The transformer architecture introduced attention mechanisms, which allow the model to focus on relevant parts of the input when generating a response. This attention-based approach greatly improves the model's ability to understand and generate coherent responses. By utilizing self-attention layers, transformers can capture dependencies between different words in the input sequence and create context-aware representations. GPT is pre-trained on large-scale text corpora and learns to predict the next word in a sentence based on the preceding context. The pre-training phase allows the model to develop a deep understanding of
Fig. 3 Inner workings of GPT-4
language patterns, context, grammar, and semantics. GPT is trained using unsupervised learning, meaning it does not require human-labeled data during pre-training (Fig. 3). Once pre-training is completed, GPT can be fine-tuned on specific tasks or domains using supervised learning. This fine-tuning stage is critical to adapting the model to perform specific tasks such as question answering or neural machine translation. By adjusting the model's parameters during fine-tuning, GPT can be customized to provide more accurate and contextually fitting responses. Its ability to generate coherent and contextually relevant responses has made it invaluable for chatbots, virtual assistants, content generators, and other applications that require human-like text production [2]. However, GPT has its limitations. It sometimes generates responses that are grammatically correct but factually incorrect, as the model lacks the ability to verify the accuracy of the information it provides. Bias is another issue that needs to be addressed, as GPT tends to replicate and amplify biases present in the training data. Despite these limitations, GPT has significantly advanced the field of NLP and has become a powerful tool for many applications. Ongoing research and development continue to improve and refine the capabilities of GPT, leading to even more impressive large language models in the future.
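The scaled dot-product attention at the heart of the transformer can be sketched in a few lines of NumPy; this is a single-head toy version without learned projection matrices, shown only to illustrate how tokens attend to one another.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # context-aware representations

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                  # three tokens, 4-dim embeddings
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x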
3.4 Manual Annotation Manual annotation is a crucial step in labeling text data for machine learning algorithms and for the development of a sentiment analyzer. It involves domain experts reviewing and annotating the data to identify and classify specific information based on predefined guidelines or criteria [2]. This process is necessary because machine learning algorithms depend on labeled data to learn patterns and make accurate
Fig. 4 Human annotation process
predictions or classifications. The manual annotation process begins with the selection of an appropriate corpus that aligns with the machine learning objective. The corpus may vary by domain and can comprise, for example, patient records, customer feedback, or news articles. The experts then carefully review the corpus and mark or label relevant information using predefined annotation guidelines. One common type of manual annotation is entity recognition, where specific entities or objects, such as names, organizations, dates, or locations, are identified and labeled. For example, in medical data, the manual annotator would mark and classify each disease mentioned in the data set; this helps the machine learning algorithm to identify and extract similar entities in unseen corpora (Fig. 4). Another form of manual annotation is sentiment analysis, which involves determining the sentiment or opinion expressed in a text. Manual annotators or experts read and classify each text as positive, negative, or neutral based on the expressed emotions or opinions. This labeling process helps algorithms understand subjective information better, which is useful in applications like customer review analysis or social media sentiment monitoring. Manual annotation also plays a significant role in natural language processing tasks such as part-of-speech tagging or syntactic parsing; annotators manually assign grammatical labels or identify the grammatical structure of sentences, aiding the algorithms in understanding and analyzing the text's syntax and grammar [8]. The manual annotation process requires subject experts and domain knowledge. Annotators not only need to be trained on the annotation guidelines but must also have a deep understanding of the subject matter, and they must constantly update their domain knowledge to handle evolving language patterns, sentiments, or entities. Despite advances in automated annotation techniques, manual annotation still holds immense significance. Human annotators can grasp subtle contextual cues or domain-specific information that automated algorithms may miss, and they can leverage their judgment and expertise to resolve ambiguities or handle complex cases where the guidelines are insufficient or contradictory. However, manual annotation can be time-consuming and costly, especially for large corpora.
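One standard way to quantify how far two annotators diverge, before any model is trained, is inter-annotator agreement; the sketch below computes Cohen's kappa between the human and GPT-4 labels of Table 2 using scikit-learn (the label lists are just the four rows shown above).

from sklearn.metrics import cohen_kappa_score

human = ["positive", "positive", "positive", "positive"]
gpt4 = ["neutral", "neutral", "neutral", "neutral"]

# Kappa corrects raw agreement for chance: 1.0 is perfect agreement,
# values near 0 indicate chance-level agreement
print(cohen_kappa_score(human, gpt4))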
3.5 GPT Annotation GPT (Generative Pre-Trained Transformer) is a state-of-the-art large language model developed by OpenAI. It has gained immense popularity in various natural language processing (NLP) tasks due to its ability to understand and generate human-like text (Fig. 5). Labeling text data is a crucial step in training machine learning algorithms. It involves assigning predefined categories or tags to text data for the purposes of classification, text generation, sentiment analysis, entity recognition, and other NLP tasks. Earlier, this process relied on human annotators manually assigning labels to the data, which can be time-consuming and expensive. However, with the advancement of large language models like GPT-4, automated annotation becomes a feasible alternative. GPT can be used as an annotator by fine-tuning its pre-trained model on a specific task and then using it to annotate unlabelled text data based on the patterns learned from the corpus. Fine-tuning involves training the model on a labeled dataset to adapt it to the specific task requirements. For example, if the task is sentiment analysis, the model can be fine-tuned on a dataset with labeled sentiments to learn the hidden patterns associated with different sentiments. Once the model is fine-tuned, it can annotate unlabelled text data by predicting labels based on the learned patterns [12]. This automated annotation process can be highly efficient and cost-effective compared to expert annotation: it reduces the dependency on human annotators and allows large amounts of data to be annotated quickly. Using GPT as an annotator offers several advantages. It can handle a wide range of NLP tasks due to its strong language understanding capability, annotating text for sentiment analysis, topic classification, intent recognition, entity recognition, and many more. GPT-4 can learn intricate patterns and associations from the labeled data during the fine-tuning process, making it a potentially reliable annotator, and it can capture subtle nuances and context-specific information that
Fig. 5 GPT-4 annotation process
may be overlooked by human annotators. Lastly, GPT-4's efficiency in processing and annotating large amounts of data enables rapid scalability, especially in industries where real-time or near-real-time analysis is essential. However, using GPT-4 as an annotator has limitations. Fine-tuning the model requires a significant amount of labeled data, which might not be readily available for every domain-specific task. The model's performance heavily depends on the quality and representativeness of the labeled data used for fine-tuning, and GPT-4 may exhibit biases present in the training data, potentially leading to biased annotations. In general, GPT-4 may serve as a powerful annotator for labeling text data, although in this study, as a financial news annotator, its labels led to underperforming machine learning algorithms [6]. Its ability to learn patterns, versatility in handling various NLP tasks, and scalability make it an attractive option for automated annotation, but careful fine-tuning and consideration of potential biases are necessary to ensure high-quality annotations. With continuous advancements in large language models, the role of GPT-4 as an annotator will likely become even more prominent in the future.
3.6 Experiment This research investigates the potential of using high-quality annotated data to enhance the accuracy and reliability of sentiment predictions in machine learning models [13]. The study aims to provide insights into the feasibility and effectiveness of utilizing GPT-4 as a data annotation tool for the development of AI systems. Building robust sentiment prediction models requires vast amounts of annotated corpus [2], which can be time-consuming and costly to obtain. To address this issue, this research explores the use of GPT-4, an advanced data annotation tool, to mitigate the limitations associated with the manual or expert annotation process. GPT-4 is a powerful large language model that has demonstrated impressive capabilities in natural language understanding and generation [3]. This study employs a comprehensive experimental approach to assess the performance of sentiment prediction models trained on an annotated corpus generated by GPT-4. The experiments reveal that models trained on GPT-4-annotated data underperformed those trained on manually annotated data, indicating the limited feasibility and effectiveness of employing GPT-4 as a data annotation tool for such AI systems at this stage. Overall, this study highlights the value of utilizing high-quality annotated data in sentiment prediction tasks and provides valuable insights into the potential of GPT-4 as a data annotation tool. The results provide a foundation for further exploration and development of AI systems in sentiment analysis and other natural language processing tasks (Fig. 6).
Fig. 6 Methodology for sentiment analysis using GPT-4 and human as annotated corpus
3.7 Accuracy Table 3 lists the accuracy obtained with GPT-4 and human annotation. The research started with the assumption that GPT-4 would be the best annotator, since its capabilities derive from training on a huge corpus. The accuracy of XG-Boost with human annotation is the highest because its boosted layers can easily capture non-linear decision boundaries; as text data is high-dimensional, XG-Boost can easily capture trends. k-NN yields a low accuracy of 0.6189 because it is based on a distance measure and memorizes the entire training dataset, and its performance is weak on both the GPT-4- and human-annotated datasets.
Table 3 Accuracy of GPT-4 and human annotation

Sl no | Machine learning algorithm | Accuracy (human annotation) | Accuracy (GPT-4 annotation)
1 | Logistic regression | 0.7477 | 0.6547
2 | k-NN | 0.6189 | 0.5581
3 | Multi-layer perceptron (MLP) | 0.7137 | 0.6475
4 | Decision tree | 0.6851 | 0.601
5 | Random forest | 0.7477 | 0.6493
6 | Support vector classifier | 0.7227 | 0.6565
7 | AdaBoost | 0.5867 | 0.5062
8 | Gradient boosting machine | 0.7584 | 0.6601
9 | XG-Boost | 0.7817 | 0.6314
10 | Elastic-net classifier | 0.7495 | 0.6529
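A hedged sketch of the evaluation loop behind Table 3: the same headlines are vectorized and a classifier is trained once with the human labels and once with the GPT-4 labels, and the test accuracies are compared. Logistic regression stands in for any of the ten algorithms; the headlines, human_labels, and gpt4_labels variables are hypothetical stand-ins for the study's corpus.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def evaluate(headlines, labels):
    # Train a TF-IDF + logistic regression classifier and report test accuracy
    X_tr, X_te, y_tr, y_te = train_test_split(
        headlines, labels, test_size=0.3, random_state=42, stratify=labels)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    return accuracy_score(y_te, model.predict(X_te))

# acc_human = evaluate(headlines, human_labels)
# acc_gpt4 = evaluate(headlines, gpt4_labels)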
4 Conclusion In conclusion, the comparison of GPT-4 with a human annotator for financial news annotation in sentiment analysis yields a clear picture. The study provided evidence that GPT-4 underperforms human annotators in several aspects; above all, GPT-4 demonstrated lower accuracy in sentiment analysis compared to human annotators. At the same time, GPT-4 retains practical strengths. While human annotators face limitations in terms of processing speed and scalability, GPT-4 can analyze large volumes of financial news data in real time, providing faster and more context-based sentiment analysis results. This advantage is particularly crucial in the fast-paced world of financial markets, where time-sensitive decisions can significantly impact profits. However, it is important to note that human annotators still possess certain advantages over GPT-4. Humans are capable of incorporating contextual knowledge, domain expertise, and an understanding of subtle nuances that may not be evident in the text alone. Moreover, human annotators can adapt and learn from evolving market conditions, whereas GPT-4 would require constant retraining and updates. In summary, while human annotators bring valuable expertise, intuition, and, in this study, higher accuracy to sentiment analysis in finance, GPT-4's speed, scalability, and efficient performance still make it a useful tool for financial news annotation (Fig. 7).
[Horizontal bar chart: "Accuracy of Machine Learning Algorithms" — accuracy from 0 to 1 for each of the ten classifiers, comparing GPT-4 annotation with human annotation]
Fig. 7 Comparison chart of GPT-4 and human annotation
5 Recommendations for Future Studies

1. Compare GPT-4 with other AI models: While GPT-4 underperformed in the current study, comparing its performance with other state-of-the-art AI models specifically designed for sentiment analysis can provide insights into its relative strengths and weaknesses.
2. Investigate the impact of human feedback: Evaluate the effectiveness of GPT-4 when used in collaboration with domain experts as annotators. This could involve analyzing sentiment with a combination of human and AI annotation to measure their synergistic effects on accuracy and performance.
3. Analyze sentiment at a granular level: Sentiment analysis can be performed at various levels, such as document level, sentence level, and entity level. Future studies should explore the effectiveness of GPT-4 at these different levels to identify its strengths and limitations.
4. Investigate the explainability of GPT-4: Understanding how GPT-4 arrives at its sentiment predictions is essential for trust and explainability. Future research should focus on methods to make GPT-4's decision-making process more interpretable and transparent for end users.
5. Evaluate performance across different financial domains: Financial news varies across sectors such as banking, technology, and healthcare. Conducting experiments in different domains can help assess the generalizability of GPT-4's performance in various financial contexts.
6. Explore transfer learning capabilities: Leverage pre-trained GPT-4 models on related tasks rather than training from scratch, and investigate how GPT-4's knowledge base can be transferred to different financial sentiment analysis tasks.
7. Incorporate real-world trading strategies: Evaluate the impact of sentiment analysis outcomes from GPT-4 on actual trading strategies. This could involve backtesting the recommendations provided by GPT-4 against historical market data to understand its potential impact on financial decision-making.
6 Declaration of Generative AI and AI-Assisted Technologies in the Writing Process Statement: During the preparation of this work, the author used ChatGPT/GPT-4 to generate the prompts used to label the data for text classification and to produce the sentiment analysis corpus that this research requires. After using this tool/service, the author reviewed and edited the content as needed and takes full responsibility for the content of the publication.
References

1. Ding B, Qin C, Bing L, Joty S, Li B (2022) Is GPT-3 a good data annotator? arXiv:2212.10450v1
2. Chiang C-H, Lee H (2023) Can large language models be an alternative to human evaluation? arXiv:2305.01937v1
3. Huang F, Kwak H, An J (2023) Is ChatGPT better than human annotators? Potential and limitations of ChatGPT in explaining implicit hate speech. arXiv preprint arXiv:2302.07736
4. OpenAI (2022) ChatGPT: optimizing language models for dialogue. Accessed 10 Jan 2023
5. Gilardi F, Alizadeh M, Kubli M (2023) ChatGPT outperforms crowd-workers for text annotation tasks. arXiv preprint arXiv:2303.15056
6. Abagail N, McNaught J (2020) Improving sentiment analysis in financial news using deep learning techniques. J Comp Sci 43:101133
7. Bollen J, Mao H, Zeng X (2011) Twitter mood predicts the stock market. J Comp Sci 2(1):1–8
8. Ding X, Liu B, Yu PS (2008) A holistic lexicon-based approach to opinion mining. In: Proceedings of the international conference on web search and data mining, pp 231–240
9. Hasan SA, Ng V, Michael L (2013) Why are social media important? Understanding the contributions of the academic community. In: Proceedings of the 51st annual meeting of the association for computational linguistics, vol 1, pp 1–10
10. El-Beltagy SR, Rafea A (2021) Ontology based annotation of text segments. In: Proceedings of the 5th international workshop on semantic evaluation
11. Pandey R, Kumar S (2021) Sentiment analysis using deep learning: a survey. J Ambient Intell Humanized Comput 1–24
12. Nigam K, Ghosh R (2018) Sentiment analysis of financial news articles using machine learning techniques. In: International conference on data management, analytics and innovation. Springer, pp 400–410
13. Xie W, Yang J, Cheng X (2020) Sentiment analysis of financial news based on BERT. IEEE Access 8:86657–86666
14. Hutto CJ, Gilbert E (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. In: Eighth international AAAI conference on weblogs and social media
15. Kumar M, Morstatter F, Liu H (2018) Twitter data analytics. Springer
16. Liu B (2012) Sentiment analysis and opinion mining. Synth Lect Human Lang Technol 5(1):1–167
Machine Learning Method for Analyzing and Predicting Cardiovascular Disease Yogendra Narayan, Mandeep Kaur Ghumman, and Charanjeet Gaba
Abstract Heart disease is among the leading causes of mortality globally. The heart is responsible for delivering blood to every portion of the body. Frequent causes of cardiac arrest are coronary artery disease (CAD) and congestive heart failure (CHF). Traditional medical techniques used to diagnose heart disease, such as angiography, have high costs and carry significant health risks. Therefore, scientists have developed a number of automated detection methods employing ML algorithms and knowledge discovery techniques. ML-based computer-aided diagnostic techniques make detecting cardiovascular disease simple, efficient, and trustworthy. In the past, multiple machine learning methods, data gathering approaches, and information sources have been utilized, and several research articles devoted to a specific data format have been released in numerous past evaluations. Accordingly, the purpose of this work is to conduct a comprehensive analysis of computerized diagnostics for cardiovascular disease prognosis using multiple techniques. Keywords Heart disease · Machine learning · Early cardiac diagnosis · Data pre-processing
1 Introduction The heart is a vital organ, considered the second-most important after the brain. When something is wrong with the heart, it is a problem for the whole body. Among the top five leading causes of death worldwide is cardiovascular disease, which is mostly Y. Narayan (B) · M. Kaur Ghumman Department of ECE, Chandigarh University, Mohali, Punjab, India e-mail: [email protected] M. Kaur Ghumman e-mail: [email protected] C. Gaba Department of CSE, Chandigarh University, Mohali, Punjab, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_11
a result of the changes that occur in daily life. Many different names are used to describe heart illness [1]. It includes a variety of disorders that affect the heart, and this type of ailment can result in death. Heart disease is responsible for far too many deaths each year, and the deterioration of cardiac muscle can lead to this disease [2]. The World Health Organization has released a report stating that heart disease is the leading cause of death worldwide, taking the lives of 17.9 million people every year. During 2019, non-communicable diseases were responsible for 17 million early deaths (before the age of 70), with cardiovascular disease accounting for 38 percent of these deaths. Heart disease is the leading cause of mortality globally, with 12 million deaths recorded each year, as reported by the WHO [3]. The American Heart Association (AHA) published an outcome goal in 2010 with the following two aims: by 2020, to improve cardiovascular health for all Americans by 20% while reducing deaths from CVDs and stroke by 20% [4]. Even in India, heart disease is becoming the leading cause of mortality. Heart disease was responsible for the deaths of 1.7 million Indians in 2016, per the 2016 Global Burden of Disease (GBD) study issued on September 15, 2017. Medical expenses and lost work time are both higher when someone has a weak heart. Based on projections made by the WHO, cardiovascular disorders cost India $237 billion between 2005 and 2015. Therefore, it is crucial to make reliable predictions about heart-related disorders [2]. Estimating the prevalence of coronary artery disease is often regarded as one of the most significant challenges facing this research domain. Over the past several generations, there has been a consistent worldwide rise in the incidence of coronary heart disease [3]. The timely identification of cardiac disease is crucial for reducing its consequences and allowing high-risk patients to make choices about modifying their lifestyles. In order to effectively treat coronary heart disease with counseling and drugs, early identification is essential. Avoiding cardiovascular disease is largely a matter of avoiding behaviors that put one at risk, such as smoking, eating poorly, being overweight, not getting enough exercise, and drinking too much alcohol [4]. Figures 1 and 2 show the overall model for the classification of cardiovascular disease [5, 6].
2 Working Model The initial step of the system is to gather information and decide which characteristics are most important. The necessary data is pre-processed into the desired format. In the subsequent step, the dataset is divided into training and test sets. The model is trained using the algorithms and the training data. Finally, to verify the system's accuracy, testing is performed on the reserved test data.
Machine Learning Method for Analyzing and Predicting Cardiovascular …
123
[Flow diagram: prepared data is split into training data and test data; the training data drives learning via machine learning, and the result is validated on the test data]
Fig. 1 Methods for assessing cardiovascular disease
[Flow diagram: data pre-processing comprises data integration, data reduction, data transformation, and data cleaning]
Fig. 2 Various actions that make up the pre-processing stage of predicting heart disease
2.1 Collection of Datasets As the backbone of our cardiovascular disease prediction system, we first gather a dataset. After gathering the data, we use it for training and testing the system. The accuracy of the prediction is evaluated by first developing the model on the training dataset and then evaluating it on the testing dataset. In this project, 30 percent of the total data is used for testing, while the remaining 70 percent is used for training.
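A minimal sketch of this split follows, assuming the Kaggle file is named cardio_train.csv with a semicolon separator and a binary cardio target (both assumptions; see Sect. 4 for the feature list).

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("cardio_train.csv", sep=";")     # assumed filename and separator
X, y = df.drop(columns=["cardio"]), df["cardio"]  # 11 features, binary target

# 70/30 train/test split, stratified to preserve the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)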
2.2 Preprocessing Data The phrase “garbage in, garbage out” must have crossed your mind while working with databases. Simply put, this means that the effectiveness of an ML model depends on the quality of the training data [7]. Even the most advanced algorithms may be biased or deliver subpar results if they were trained on unclean or incomplete data. That is why data pre-processing is necessary; it transforms the data into the required format. Importing datasets, partitioning datasets, scaling attributes, and other operations are all examples of pre-processing tasks. For the sake of increasing the reliability of the model, pre-processing of the data is necessary [8].
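As one illustrative way to express such pre-processing, the scikit-learn pipeline below imputes missing values and scales the numeric attributes; the column names are assumed from the feature list in Sect. 4, and the exact steps used in the study may differ.

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

numeric = ["age", "height", "weight", "ap_hi", "ap_lo"]  # assumed column names

preprocess = ColumnTransformer(
    transformers=[("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # data cleaning: fill gaps
        ("scale", StandardScaler()),                   # attribute scaling
    ]), numeric)],
    remainder="passthrough",  # keep the categorical flags unchanged
)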
3 Literature Survey In recent years, there has been a significant amount of interest in applying machine learning techniques to the process of diagnosing diseases [9]. According to this literature review, the diagnosis of cardiovascular disease has been investigated using a variety of different approaches. This section gives a literature overview of prior work on cardiac disease forecasting, including a description of the datasets utilized, the classifiers that were applied, and an evaluation of their accuracy (Table 1). Data from disease surveillance is used to assess the necessity for public health intervention. When we collect information about something in the past, we can utilize it to learn more through data analysis. It is therefore possible to use machine learning [20] algorithms to look for trends and anticipate changes based on the patterns discovered [21]. This section also describes different types of datasets, including their name and type, instances, number of attributes and their types, missing values, tasks, and year of publication (Table 2).
4 Result We have taken a dataset from the Kaggle website, which consists of 70,000 patient data records for cardiovascular disease classification. The dataset has a total of 11 features plus a separate target; the target is the column indicating whether the person is suffering from the disease or not. The feature types are factual information, patient information, and medical examination results, divided into 11 features: age, height, weight, gender, systolic and diastolic blood pressure, cholesterol, glucose, smoking, alcohol intake, and physical activity, with the presence of cardiovascular disease as the target. Class 1 is the target where the person is affected by the disease, and class 2 is the target where the person is not affected (Fig. 3).
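Continuing the split sketched in Sect. 2.1, the two classifiers compared below can be expressed as follows in recent scikit-learn; the hyperparameters are illustrative, not the study's exact settings.

from sklearn.ensemble import BaggingClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Bagged decision trees and a linear SVM, the two models compared in Table 3
bagged_tree = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100)
linear_svm = LinearSVC(C=1.0, max_iter=10000)

bagged_tree.fit(X_train, y_train)
linear_svm.fit(X_train, y_train)
print(bagged_tree.score(X_test, y_test), linear_svm.score(X_test, y_test))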
Table 1 Literature summary of cardiovascular disease

Study/year | Description | Classifier used | Datasets | Accuracy
Ibrahim et al. [10] | In order to cut down on labeling expenses, five different multi-label active learning data selection algorithms were used. Hyperparameters of the label ranking classifier used in the selection methods were optimized using a grid search for each case in the heart disease dataset | MMC, Random, Adaptive, QUIRE, AUDI | Heart Disease Dataset obtained from the UCI database | AUDI: 62.2%
Syed et al. [11] | As a direct response to the increasing number of fatalities caused by heart disease, a reliable, cost-effective risk evaluation model has been developed that utilizes significant non-invasive risk variables | DT, KNN, SVM, RF, NB | Non-invasive cardiac dataset comprising 5776 cases | DT: 81%, KNN: 69%, SVM: 82%, RF: 84%, NB: 69%
Kavitha et al. [12] | A hybrid model for predicting heart disease; a hybrid framework of decision trees and random forests is used at the input interface to make cardiovascular disease predictions | DT, RF, DT + RF | Cleveland, 303 instances and 14 attributes | DT: 79%, RF: 81%, DT + RF: 88%
Mohamed et al. [13] | Improvements in heart disease prediction accuracy are sought; therefore, a genetic algorithm (GA) and particle swarm optimization (PSO) utilizing a random forest (RF) are devised and used | GAPSO-RF | Statlog, Cleveland | 87.80%
Lubna et al. [14] | Provides an in-depth analysis of the performance of several machine learning approaches used for effective prediction, diagnosis, and treatment of different cardiac conditions | ANN, DT, KNN, NB, SVM | N/A | ANN: 86.91%, DT: 74.0%
Senthil et al. [15] | Creates a genetic-based crow search algorithm (GCSA) that may be used in tandem with deep convolutional neural networks for feature selection and classification. The collected findings demonstrate an improvement in classification accuracy using the suggested GCSA model | GCSA with DCNN | Dermatology, Heart-C, Lung Cancer, Pima Indian, Iris, Wisconsin Cancer, Lymphography, Diabetes, and Heart Disease datasets | GCSA: original 88.78%, extracted 95.34%
Najmu et al. [16] | The goal of this research was to evaluate the efficacy of three different machine learning techniques for making accurate predictions about cardiovascular disease: the support vector machine (SVM), the decision tree (DT), and the random forest (RF) | SVM, DT, RF | UCI Machine Learning Databank, 1329 instances, 14 attributes | SVM: 84.93%, DT: 97.59%, RF: 99.39%
Rohit et al. [17] | The outcomes and findings on the UCI Heart Disease database are compared after being processed through several deep learning strategies and machine learning algorithms | KNN, ANN, DL, LR, SVM, DT, RF | UCI Machine Learning Heart Disease Dataset, 14 attributes | DL: 94.2%, KNN: 84.2%, LR: 83.3%
Abdulwahab et al. [18] | This research contributes to the current literature by selecting a comprehensive and thoroughly maintained dataset, as well as a set of reference techniques, and afterward confirming their effectiveness using a number of different metrics | LR, SVM, DT, ANN, MP | 299 patients' data, 12 features, collected by Ahmad et al. | DT: 80%, ANN: 60%, LR: 78.34%, SVM: 66.67%
Safial et al. [19] | Several different computational intelligence strategies for predicting coronary artery disease were examined. Seven different forms of artificial intelligence were used to perform the analysis | SVM, deep neural network | Statlog and Cleveland | SVM: 97.36%, DNN: 98.15%
The performance of the two models, the bagged tree and the linear SVM, on the Kaggle dataset is compared in Table 3. Compared to the bagged tree, the linear SVM produced somewhat higher accuracy and precision but was less successful at identifying genuine positives (lower sensitivity). The bagged tree had a marginally higher F1 score, while the linear SVM obtained higher specificity. Overall, the linear SVM outperformed the bagged tree on most parameters. In this paper, we have compared these two classifiers based on the literature review. The confusion matrices of the bagged tree classifier and the linear SVM classifier are shown in Figs. 3 and 4, respectively. For the bagged tree classifier, class 1 accuracy is 74.6% and class 2 accuracy is 68.3%, with an overall accuracy of 71.45% on the 70,000 patient data records of the Kaggle dataset. For the linear SVM classifier, class 1 accuracy is 79% and class 2 accuracy is 64.6%, with an overall accuracy of 71.8%.
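For reference, the Table 3 metrics can all be derived from the four confusion matrix counts; the function below is a generic sketch (the counts would come from the matrices in Figs. 3 and 4).

def metrics(tp, fn, fp, tn):
    # Derive the comparative metrics from confusion matrix counts
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)            # "exactness"
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f1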
Table 2 Analysis of datasets

Dataset name | Type | #Instances | #Attributes | Attribute type | Missing values | Task | Year
Statlog | Multivariate | 270 | 13 | Categorical, real | No | Classification | 2000
SPECTF | Multivariate | 264 | 44 | Integer | No | Classification | 2001
SPECT | Multivariate | 264 | 22 | Categorical | No | Classification | 2001
Heart failure clinical records | Multivariate | 299 | 13 | Integer, real | No | Classification, regression, clustering | 2020
Heart disease | Multivariate | 303 | 75 | Categorical, integer, real | Yes | Classification | 1988
Echocardiogram | Multivariate | 132 | 12 | Categorical, integer, real | Yes | Classification | 1989
Fig. 3 Confusion matrix of bagged tree classifier for cardiovascular disease

Table 3 Comparative analysis of bagged tree and linear SVM

Sr. no | Performance | Bagged tree (%) | Linear SVM (%)
1 | Accuracy | 71.5 | 72
2 | Exactness (precision) | 72.9 | 75
3 | Sensitivity | 68.3 | 64
4 | Specificity | 74.6 | 79
5 | F1 score | 70.5 | 69
Fig. 4 Confusion matrix of Linear SVM classifier for cardiovascular disease
So, it is clear from the above discussion that the linear SVM performs slightly better than the bagged tree classifier.
5 Conclusion Heart disease is a condition that, in its more severe forms, can be fatal and is widespread throughout the world. The risk to one's health rises as a consequence of shifting lifestyles and decreased participation in physical activity. The medical industry offers a variety of diagnostic procedures; however, machine learning is believed to be the most accurate of the available options and may contribute to treatment cost savings by offering an early diagnosis. The model can serve not only as a non-invasive diagnostic tool for physicians and cardiologists but also as a teaching tool for medical students, and general practitioners can use it to make the initial diagnosis of cardiac patients. We anticipate that anyone pursuing a career in automated cardiac diagnostic testing will find this review valuable. Deep learning algorithms are essential in healthcare applications; as a result, applying deep learning methods to the prediction of heart disease might provide better outcomes.
References

1. Ahlawat V, Thakur R, Narayan Y (2018) Support vector machine based classification improvement for EMG signals using principal component analysis. J Eng Appl Sci 13(8):6341–6345
2. Redie DK, Sirko AE, Demissie TM, Teferi SS, Shrivastava VK, Verma OP, Sharma TK (2023) Diagnosis of COVID-19 using chest X-ray images based on modified DarkCovidNet model. Evol Intel 16(3):729–738
3. Narayan Y, Kumari M, Rajan R (2022) SEMG signals identification using DT and LR classifier by wavelet-based features. Int J Electr Electron Res 10(4):822–825. https://doi.org/10.37391/IJEER.100410
4. Godfrey KM, Juarascio A, Manasse S, Minassian A, Risbrough V, Afari N (2019) Heart rate variability and emotion regulation among individuals with obesity and loss of control eating. Physiol Behav 199:73–78. https://doi.org/10.1016/j.physbeh.2018.11.009
5. Mohi Uddin KM, Ripa R, Yeasmin N, Biswas N, Dey SK (2023) Machine learning-based approach to the diagnosis of cardiovascular vascular disease using a combined dataset. Intell Based Med 7:100100. https://doi.org/10.1016/j.ibmed.2023.100100
6. Swathy M, Saruladha K (2022) A comparative study of classification and prediction of cardiovascular diseases (CVD) using machine learning and deep learning techniques. ICT Express 8(1):109–116. https://doi.org/10.1016/j.icte.2021.08.021
7. Narayan Y (2021) Analysis of MLP and DSLVQ classifiers for EEG signals based movements identification. In: 2021 2nd Global conference for advancement in technology, GCAT 2021. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/GCAT52182.2021.9587868
8. Li Q, Campan A, Ren A, Eid WE (2022) Automating and improving cardiovascular disease prediction using machine learning and EMR data features from a regional healthcare system. Int J Med Inform 163. https://doi.org/10.1016/j.ijmedinf.2022.104786
9. Narayan Y, Mathew L, Chatterji S (2018) SEMG signal classification with novel feature extraction using different machine learning approaches. J Intell Fuzzy Syst 35(5):5099–5109. https://doi.org/10.3233/JIFS-169794
10. El-Hasnony IM, Elzeki OM, Alshehri A, Salem H (2022) Multi-label active learning-based machine learning model for heart disease prediction. Sensors 22(3). https://doi.org/10.3390/s22031184
11. Ansarullah SI, Saif SM, Kumar P, Kirmani MM (2022) Significance of visible non-invasive risk attributes for the initial prediction of heart disease using different machine learning techniques. Comput Intell Neurosci 2022:9580896. https://doi.org/10.1155/2022/9580896
12. Kavitha M, Gnaneswar G, Dinesh R, Sai YR, Suraj RS (2021) Heart disease prediction using hybrid machine learning model. In: 2021 6th International conference on inventive computation technologies (ICICT), pp 1329–1333. https://doi.org/10.1109/ICICT50816.2021.9358597
13. El-Shafiey MG, Hagag A, El-Dahshan ESA, Ismail MA (2022) A hybrid GA and PSO optimized approach for heart-disease prediction based on random forest. Multimed Tools Appl 81(13):18155–18179. https://doi.org/10.1007/s11042-022-12425-x
14. Riyaz L, Butt MA, Zaman M, Ayob O (2022) Heart disease prediction using machine learning techniques: a quantitative review. In: International conference on innovative computing and communications. Springer Singapore, Singapore, pp 81–94
15. Nagarajan SM, Muthukumaran V, Murugesan R, Joseph RB, Meram M, Prathik A (2022) Innovative feature selection and classification model for heart disease prediction. J Reliab Intell Environ 8(4):333–343. https://doi.org/10.1007/s40860-021-00152-3
16. Abu-Alhaija M, Turab NM (2022) Automated learning of ECG streaming data through machine learning internet of things. Intell Autom Soft Comput 32(1)
17. Bharti R, Khamparia A, Shabaz M, Dhiman G, Pande S, Singh P (2021) Prediction of heart disease using a combination of machine learning and deep learning. Comput Intell Neurosci 2021. https://doi.org/10.1155/2021/8387680
18. Almazroi AA (2022) Survival prediction among heart patients using machine learning techniques. Math Biosci Eng 19(1):134–145. https://doi.org/10.3934/mbe.2022007
19. Ayon SI, Islam MM, Hossain MR (2022) Coronary artery heart disease prediction: a comparative study of computational intelligence techniques. IETE J Res 68(4):2488–2507. https://doi.org/10.1080/03772063.2020.1713916
20. Narayan Y (2021) Direct comparison of SVM and LR classifier for SEMG signal classification using TFD features. Mater Today Proc 45:3543–3546
21. Kirar A, Bhalerao S, Verma OP, Ansari IA (2022) Protecting ECG signals with hybrid swarm intelligence algorithm. In: Garg Lalit BC, Basterrech STK (eds) Artificial intelligence in healthcare. Springer Singapore, Singapore, pp 47–60. https://doi.org/10.1007/978-981-16-6265-2_4
Rule-Based Learner Competencies Predictor System Priyanka Gupta, Deepti Mehrotra, and Sunil Vadera
Abstract Forecasting the academic achievement of students is a critical area of research in educational contexts. This domain's significance stems from its ability to develop efficient mechanisms that enhance academic outcomes and minimize student attrition. In this context, rubric-based progressive learning provides valuable insights into students' preferences, knowledge, and competencies. This study proposes a recommender model for detecting the Computational Thinking (CT) competencies of programming learners using a rubric and machine learning. A programming rubric was prepared to cover key programming concepts, and a quiz conducted afterward was scored as per the rubric designed. Hierarchical clustering was applied to the learners' rubric scores to segment them into four categories according to their learning parameters. Rules were then generated as CT competencies using a rule-based classifier built on a multilayer perceptron neural network, considering the cluster categories as labels. The proposed model assists learners and instructors in identifying the learners' learning capabilities and priorities, resulting in improved learner performance. Keywords Computational thinking competencies · Learners' behavior · Programming rubric · Learner preferences
P. Gupta (B) AIIT, Amity University, Noida, Uttar Pradesh, India e-mail: [email protected] D. Mehrotra ASET, Amity University, Noida, Uttar Pradesh, India S. Vadera University of Salford, Salford, UK © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_12
1 Introduction Computational Thinking (CT), discussed since the 1950s [1] and defined as the ability to solve complex problems by thinking algorithmically and executing through a digital processing agent, has become an essential skill in today's education scenarios [2, 3]. The term 'Computational Thinking' was initially coined by [4], who focused on the use of computation to create new knowledge and the use of computers to improve thinking patterns for accessing knowledge. After that, [5] presented a modified view of CT as a set of problem-solving skills abstracted from computer science. Thereafter, CT attracted the interest of academicians, scholars, and researchers, since it is a medium to capture competencies belonging to almost every field, computer programming being the most prominent one. Computer programming is the driving force behind all digital solutions, software, and systems employed today. Learning computer programming enables learners to develop conceptual understanding from a computational point of view and to develop CT skills [6]. In light of the increasing demand for computer science education and the integration of programming in various disciplines, it is crucial to accurately assess and develop students' Computational Thinking (CT) abilities. Traditional approaches to assessing programming skills, such as manual grading and subjective evaluations, are time-consuming and inconsistent. Therefore, more efficient and objective approaches are required to assess programming proficiency and CT abilities. One promising approach is the combination of rubrics and machine learning techniques to assess and detect CT competencies in programming tasks. A rubric provides a structured framework to evaluate and objectively score programming artifacts based on predefined criteria [7]. It enables the granular assessment of different dimensions of computational thinking, including problem-solving, algorithm design, debugging, and abstraction. Machine learning algorithms, on the other hand, can help automate the process of programming assessment using a rubric [8]. They can detect the patterns and characteristics of higher-order and lower-order thinking skills in programming, resulting in the automated detection of specific CT capabilities. Rubrics are designed to establish a relationship between assessment criteria and the quality of tasks the learner performs [9]. The assessment is directly linked with the projected learning outcomes of the given subjects. Java is a popular programming subject in which learners must develop various skills (such as conceptual understanding, critical thinking, and logical ability) to become programmers. A learner may achieve various levels of learning in object-oriented programming through Java [10]. The approach proposed in this study builds a rubric for this programming language that helps design a rule-based intelligent system recommending the student's CT competencies, providing detailed feedback, and assisting the teacher in selecting the appropriate teaching pedagogy to improve the learner's overall ability. This is achieved through a generic rubric and a questionnaire based on it, which constitutes the originality of this work. This study uses multilayer perceptron (MLP) neural network classification and hierarchical clustering to offer a rubric-based model for predicting a learner's CT
competencies. The suggested approach relies on five significant steps: (a) first, a rubric is designed considering the learning outcomes and objectives of the Java course; (b) second, a multiple-response objective quiz is conducted for the intended learners; (c) third, scores are calculated according to the rubric designed for the assessment; (d) fourth, a hierarchical clustering technique is used to identify and categorize the learners based on the scores obtained; (e) fifth, the clustered learners act as input to the MLP neural network classification in order to extract decision rules as CT competencies. This model automatically captures the significant characteristics of learners. It supports the appropriate detection of CT competencies and thus benefits educational contexts. The remainder of the paper is structured as follows: Sect. 2 presents the related research, Sect. 3 provides a thorough explanation of the proposed model, using a Java quiz as an instance and explaining the findings, and finally, the work is concluded in Sect. 4.
2 Related Works

CT was initially considered the relationship between programming and procedural thinking skills [4]. Later, CT emerged as a fundamental ability to solve complex problems, design systems, and understand human behavior using skills drawn from a computer science perspective [5]. CT skills may be developed by means of educational activities [11] such as learning to program. Recent research studies emphasize the role of CT in the enhancement of thinking skills and digital competencies, due to which coding plays an important role in detecting CT skills in primary as well as higher education establishments. Assessing computational thinking skills through programming assessments has been researched and analyzed by a number of researchers and practitioners in recent years. For example, [12] proposed a conceptual framework to promote personalized learning and enhance CT skills using inferential data mining and processing techniques in open educational resources; they noted that engaging learners in programming exercises enhances their overall CT competencies. Also, [13] demonstrated the benefits of robot programming in strengthening CT abilities in early childhood. Similarly, [14] showed the achievement of a medium-high level of CT skills and programming expertise after attending and assessing programming assessments. The effectiveness of programming education in cultivating CT skills in K-12 students is highlighted by [15]. Machine learning techniques and rubrics are widely used for the analysis of learning data and for automatic code assessment, respectively, in the process of detecting or developing CT skills [16–19]. Moreover, many research studies address the inculcation and mapping of CT through programming in the curriculum; for example, [20] proposed a conceptual model of CT through programming
consisting of certain CT areas and their relationships, while [21] focused on CT assessments and found that most concentrate on programming or computing skills. Most of the research in this area focuses on the methods mentioned above and employs machine learning and statistical analysis to identify different CT abilities. This study, however, combines the advantages of rubrics and machine learning to detect the CT competencies of computer programming learners in terms of explicit rules. The novelty of this work lies in the fact that the CT competencies are generated using only a topic-based questionnaire that is then assessed against the rubric criteria.
3 Proposed Model

This section presents the model proposed for CT abilities detection. The five steps from the introduction are represented in Fig. 1 and are detailed in the following subsections:
3.1 Rubric Creation

This module employs rubrics as a trusted assessment tool for students. A rubric must include criteria for evaluation, quality definitions at distinct levels of achievement, and a criteria-wise scoring strategy [22, 23]. Rubrics provide clarity in marking schemes, help students with self-monitoring [24–27], ensure consistency and fairness in grading [27], and aid in assessment and teaching strategy decisions [28, 29]. The present study has developed a rubric for a Java programming course that can be applied to various programming languages. The rubric is designed to evaluate seven critical object-oriented programming skills, and its criteria have been aligned with Bloom's Taxonomy levels. Bloom's Taxonomy is a well-established framework that categorizes cognitive skills into six levels: remember, understand, apply, analyze, evaluate, and create [30]. The rubric, along with its testing parameters and corresponding Bloom's Taxonomy level for each criterion, is presented in Table 1.
Fig. 1 Proposed model for the detection of programming CT competencies
Table 1 Programming Rubric designed

| Criteria | Parameters of questions |
|---|---|
| Theory and concepts (Remember) | Theory and concept-based |
| Syntax knowledge (Remember) | Syntax-based |
| Conceptual thinking and skills (Understand) | Small and basic output-based |
| Critical thinking (Apply) | Mixed concepts output-based |
| Logic building and thinking (Analyze) | Tricky and compilation-based |
| Optimization skills and complexity (Evaluate) | Comparative performance and complexity-based |
| Applications design (Create) | Software and application development-based |

Each criterion is scored on a five-point scale: 0% (0 points) No/poor knowledge; 25% (1–5) Novice (limited knowledge); 50% (6–10) Fluent (needs practice); 75% (11–15) Proficient (good programming skills); 100% (16–20) Expert (well-versed, extensive knowledge of programming).
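For readers who want to operationalize the rubric, a minimal Python sketch of Table 1 as a data structure is given below. The criterion names, Bloom's Taxonomy levels, question parameters, and score bands come from the table itself; the dictionary layout and function names are illustrative choices, not part of the original study.

```python
# Rubric from Table 1: each criterion maps to its Bloom's Taxonomy level
# and the kind of quiz questions that test it.
RUBRIC = {
    "Theory and concepts":                ("Remember",   "Theory and concept-based"),
    "Syntax knowledge":                   ("Remember",   "Syntax-based"),
    "Conceptual thinking and skills":     ("Understand", "Small and basic output-based"),
    "Critical thinking":                  ("Apply",      "Mixed concepts output-based"),
    "Logic building and thinking":        ("Analyze",    "Tricky and compilation-based"),
    "Optimization skills and complexity": ("Evaluate",   "Comparative performance and complexity-based"),
    "Applications design":                ("Create",     "Software and application development-based"),
}

# Score bands (points out of 20 per criterion) and their proficiency labels.
BANDS = [(0, 0, "No/poor knowledge"), (1, 5, "Novice"), (6, 10, "Fluent"),
         (11, 15, "Proficient"), (16, 20, "Expert")]

def proficiency(points: int) -> str:
    """Map a criterion score (0-20 points) to its rubric label."""
    for lo, hi, label in BANDS:
        if lo <= points <= hi:
            return label
    raise ValueError(f"score out of range: {points}")
```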
3.2 Quiz Conduction for Learners

This module prepares a questionnaire, in accordance with the rubric designed in the previous step, for a quiz that can be either subjective or objective. As virtual and mass learning platforms become prevalent, assessments must be quick and easy to create and grade and must effectively measure learning outcomes (knowledge, skills, and abilities) across a diverse range of learners with varying learning styles. Objective tests such as MCQs, short answers, and true–false questions are commonly used due to their reliability, consistency, and impartiality [31]. In particular, MCQs help test higher-order thinking, problem-solving, data interpretation, and critical thinking [32], making them ideal for outcome-based and active learning scenarios [33]. The quality of MCQs is determined by the effectiveness of the distractors and the questions' ability to assess student competencies [34]. Multiple-response questions must offer plausible and closely comparable options to evaluate critical and analytical thinking. Owing to these benefits, a Java multiple-response quiz was conducted in this module on 209 students: 100 Master of Computer Applications (MCA) and 109 Bachelor of Technology (B.Tech.) students. The questions in the quiz were designed per the rubric specifications to ensure precise measurement of learning metrics. The partial scoring according to the rubric is elaborated upon in the following section.
3.3 Partial Scoring According to the Rubric

The rubric employs a five-point scoring system for each criterion, which is also used in the quiz. For each criterion, the quiz included five questions, each with four options and multiple correct answers, which assessed the programming skills associated with that particular criterion. Each question was worth four points, one per option: a point is awarded for each correct option that is marked and for each incorrect option that is left unmarked. The total score for a question is the sum of these per-option points; each criterion therefore had a maximum possible score of 20 points. For example, if options a and d are the correct answers to a question, the marking is as follows: option a earns 1 point if marked, otherwise 0; option b earns 0 points if marked, otherwise 1; option c earns 0 points if marked, otherwise 1; and option d earns 1 point if marked, otherwise 0. A student who selects options b and d therefore receives a score of 2 (0 + 0 + 1 + 1) out of 4, representing a 50% correct response. The partial grade for each criterion reflects the learner's knowledge and analytical ability, and the final quiz scores are determined by totaling the partial grades across criteria. This approach provides students with comprehensive feedback on their programming skills, aligned with the rubric and corresponding to each criterion. For example, suppose a student scores 25% in Theory and Concepts (1–5 points) but 75% in Logic Building and Thinking (11–15 points); this suggests they excel at programming concept
analysis but must work on theory and principles. In terms of Bloom's Taxonomy, the rubric highlights that the learner is adept at analyzing but less skilled at memorization. The rubric-based partial scoring system is used to assign scores for each criterion. Due to the large number of students, manual evaluation and analysis of the learning parameters are not feasible, so machine learning and deep analytical methodologies are employed. The students' scores are then input into the clustering algorithm described in the next section.
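A minimal sketch of this per-option partial scoring in Python is shown below. The set-based question encoding and the function names are hypothetical, but the point rule and the worked example (correct options {a, d}, marked options {b, d}, score 2/4) follow the text above.

```python
OPTIONS = ("a", "b", "c", "d")

def question_score(marked: set[str], correct: set[str]) -> int:
    """One point per option: marked-and-correct or unmarked-and-incorrect."""
    return sum(1 for opt in OPTIONS if (opt in marked) == (opt in correct))

# Worked example from the text: options a and d are correct, student marks b and d.
assert question_score(marked={"b", "d"}, correct={"a", "d"}) == 2  # 50% correct

def criterion_percentage(question_scores: list[int]) -> float:
    """Five 4-point questions per criterion -> percentage of the 20 points."""
    return 100.0 * sum(question_scores) / 20
```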
3.4 Learner Categorization Based on Clusters

In this module, clustering is performed on the partial scores obtained from the rubric in the previous step. Clustering is a data mining technique that groups data points based on similarity [35]. This research work employed an agglomerative (bottom-up) hierarchical technique: initially considering each data point as a singleton cluster, this method keeps merging clusters until all the data points form a single cluster [36]. Linkage methods estimate the distance between two sub-clusters. This work implemented hierarchical clustering using Ward's minimum variance technique; the clusters were produced using R software. Table 2 presents the cluster centers resulting from the hierarchical clustering, which represent the aggregate values for each cluster, together with the learner categories for each cluster based on the scale shown in the rubric in Table 1. Cluster 1 is the smallest group, consisting of expert learners in remembering theory, syntax, and their simple conceptual applications. However, they need more practice in critical thinking and logic-building exercises to improve their ability to optimize code and design real-time applications. Cluster 2 comprises learners who struggle to remember concepts and syntax accurately and perform moderately in simple conceptual and critical applications. However, they excel at tricky exercises, optimization, and application design. Cluster 3 consists of learners who are experts in every criterion and proficient in application design, indicating that they excel in every programming aspect with equal attention. Cluster 4 is the largest and consists of learners who are proficient in every criterion except syntax knowledge and conceptual thinking and skills, in which they are experts. These learners have good overall programming skills but need to improve to reach the level of expertise observed in Cluster 3. The clustering results provide insights into the nature of each cluster of students. Cluster 1 learners can memorize concepts but need to work on applying and analyzing them. Cluster 2 learners are not interested in memorizing syntax but enjoy software development and analyzing code. Cluster 3 learners are well-versed in every criterion and can be considered all-rounders. Cluster 4 learners are close to acquiring extensive knowledge in all programming parameters. Categorizing learners according to their performance in different subject areas provides detailed feedback to help them identify their strengths and weaknesses in
Table 2 Cluster centers of four clusters generated

| Cluster no | Theory and concepts | Syntax knowledge | Conceptual thinking and skills | Critical thinking | Logic building and thinking | Optimization skills and complexity | Applications design | No. of data items |
|---|---|---|---|---|---|---|---|---|
| 1 | 98.750 (Expert) | 100.000 (Expert) | 75.000 (Expert) | 43.750 (Fluent) | 46.250 (Fluent) | 25.000 (Novice) | 25.000 (Novice) | 20 |
| 2 | 25.806 (Novice) | 32.258 (Fluent) | 46.774 (Fluent) | 58.871 (Proficient) | 78.226 (Expert) | 87.097 (Expert) | 87.097 (Expert) | 31 |
| 3 | 91.379 (Expert) | 97.701 (Expert) | 96.552 (Expert) | 90.517 (Expert) | 86.782 (Expert) | 75.287 (Expert) | 57.471 (Proficient) | 87 |
| 4 | 68.447 (Proficient) | 80.340 (Expert) | 75.485 (Expert) | 73.058 (Proficient) | 68.689 (Proficient) | 63.350 (Proficient) | 60.194 (Proficient) | 103 |
each area. This information can guide learners toward improving their performance in specific areas. Educators can also benefit from this categorization by tailoring their teaching methods and content to each group of learners’ specific needs and preferences. This approach can lead to more effective teaching and better student learning outcomes.
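The clustering itself was carried out in R; an equivalent sketch in Python using SciPy's agglomerative clustering with Ward's minimum-variance linkage might look like the following, where the random matrix merely stands in for the real 209 × 7 table of per-criterion percentage scores.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# scores: one row per learner, one column per rubric criterion (percentages).
# Random data stands in here for the real 209 x 7 score matrix.
rng = np.random.default_rng(0)
scores = rng.uniform(0, 100, size=(209, 7))

# Agglomerative (bottom-up) clustering with Ward's minimum-variance linkage.
Z = linkage(scores, method="ward")

# Cut the dendrogram into four clusters, as in Table 2.
labels = fcluster(Z, t=4, criterion="maxclust")

# Cluster centers: mean per-criterion score within each cluster.
centers = np.array([scores[labels == k].mean(axis=0) for k in (1, 2, 3, 4)])
print(centers.round(3))
```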
3.5 Rules Generation Using Classification

To identify the CT abilities of computer programming learners, a training model employing a neural network is constructed in this study to extract classifier rules from the clustered results. This approach enables the identification of critical factors that contribute to learners' performance in programming, leading to better understanding and more effective teaching strategies. Rule learning is a classification approach that involves identifying, learning, and evolving the data into a set of rules representing the knowledge the system encapsulates [37]. This research chose the rule learning method to identify CT abilities using a neural network. A neural network learns and generalizes the knowledge mined from the training data [38]; it runs thousands of iterations and learns from the results to predict outputs [39]. The multilayer perceptron neural network (MLPNN) is a prime artificial neural network architecture for estimation and forecasting. Numerous studies have utilized it as a benchmark model [40, 41], and it is used to model complex and non-linear systems in the real world [42, 43]. Table 3 shows the essential classification metrics (Accuracy, Recall, Precision, and F1-score) for Naive Bayes, Logistic Regression, Support Vector Machine (SVM), and MLP applied to the clustered student data. Classification metrics assess model performance and recommend appropriate classifiers. Python (with the Spyder IDE) was used for this task. Table 3 shows that MLP has the highest classification metric scores; thus, MLPNN is the best-performing classification technique for this research study and model. RuleMatrix is a tool that helps users comprehend machine learning models such as classifiers and explore the knowledge they encode. This tool extracts classifier rules [44]. This
Table 3 Classifier metrics

| Classifiers | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Naïve Bayes | 0.810 | 0.868 | 0.887 | 0.871 |
| Logistic regression | 0.929 | 0.952 | 0.940 | 0.945 |
| SVM | 0.905 | 0.964 | 0.833 | 0.856 |
| MLP | 0.929 | 0.942 | 0.969 | 0.951 |
study utilizes RuleMatrix to extract MLPNN rules; the code is available at http://rulematrix.github.io. RuleMatrix generates rules using the Scalable Bayesian Rule List (SBRL) method, originally intended for binary classification problems; this work applies the approach to multi-class classification, using the four clusters from the previous module as class labels. The work was done using Google Colab. The following rules were extracted:

1. IF (Theory and Concepts in (-inf, 25.72196388244629)) THEN prob: [0.0156, 0.9531, 0.0156, 0.0156]
2. ELSE IF (Optimization Skills & Complexity in (65.14054107666016, 92.20243835449219)) AND (Applications Design in (28.401092529296875, 46.90818786621094)) THEN prob: [0.0128, 0.0128, 0.9231, 0.0513]
3. ELSE IF (Optimization Skills & Complexity in (-inf, 26.732999801635742)) THEN prob: [0.9167, 0.0208, 0.0208, 0.0417]
4. ELSE IF (Critical Thinking in (-inf, 45.326969146728516)) AND (Applications Design in (28.401092529296875, 46.90818786621094)) THEN prob: [0.7500, 0.0500, 0.1000, 0.1000]
5. ELSE IF (Theory and Concepts in (77.07087707519531, inf)) AND (Logic Building & Thinking in (36.02157974243164, 62.7748908996582)) THEN prob: [0.3036, 0.0357, 0.1429, 0.5179]
6. ELSE IF (Applications Design in (-inf, 28.401092529296875)) THEN prob: [0.2941, 0.0588, 0.4706, 0.1765]
7. ELSE IF (Conceptual Thinking & Skills in (89.50209045410156, inf)) AND (Applications Design in (46.90818786621094, 65.34204864501953)) THEN prob: [0.0125, 0.0125, 0.6750, 0.3000]
8. ELSE IF (Applications Design in (28.401092529296875, 46.90818786621094)) THEN prob: [0.0423, 0.0141, 0.6338, 0.3099]
9. ELSE IF (Theory and Concepts in (25.72196388244629, 42.692867279052734)) AND (Syntax Knowledge in (43.01002502441406, 66.44934844970703)) THEN prob: [0.0435, 0.7391, 0.0435, 0.1739]
10. ELSE IF (Theory and Concepts in (77.07087707519531, inf)) THEN prob: [0.0164, 0.0082, 0.3852, 0.5902]
11. ELSE IF (Syntax Knowledge in (-inf, 43.01002502441406)) THEN prob: [0.0233, 0.8605, 0.0233, 0.0930]
12. ELSE DEFAULT prob: [0.0103, 0.0773, 0.0155, 0.8969]

The above rules determine the likelihood of belonging to a specific cluster based on certain criteria falling within specific ranges. For instance, in the first rule, a learner whose Theory and Concepts score is below approximately 25.7 most likely belongs to the second cluster. The second, third, and fourth clusters have three rules each, while the first cluster has two rules due to its smaller size. This process of generating rules illustrates the significant relationships between programming domains, and the rubric scores provide insight into the probability of being a member of a particular cluster. Ultimately, these rules define a learner's programming CT competencies in terms of the learner's knowledge and skills. These rules assist learners in identifying their detailed computational competencies by merely appearing in a rubric-based objective-type assessment. Further, these rules
help instructors to adapt their teaching methodologies and syllabi according to the diverse CT competencies of a group of learners.
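To illustrate how the extracted rule list operates, the sketch below encodes the first three rules verbatim as an ordered if/elif chain in Python; the remaining rules and the default follow the same pattern. The function name and the dictionary-based score representation are illustrative only.

```python
def cluster_probabilities(s: dict[str, float]) -> list[float]:
    """Apply the ordered rule list to a learner's criterion scores.
    Returns [P(cluster 1), P(cluster 2), P(cluster 3), P(cluster 4)]."""
    if s["Theory and Concepts"] < 25.722:
        return [0.0156, 0.9531, 0.0156, 0.0156]                  # rule 1
    elif (65.141 < s["Optimization Skills & Complexity"] < 92.202
          and 28.401 < s["Applications Design"] < 46.908):
        return [0.0128, 0.0128, 0.9231, 0.0513]                  # rule 2
    elif s["Optimization Skills & Complexity"] < 26.733:
        return [0.9167, 0.0208, 0.0208, 0.0417]                  # rule 3
    # ... rules 4-11 continue in the same ELSE IF pattern ...
    return [0.0103, 0.0773, 0.0155, 0.8969]                      # default (rule 12)
```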
4 Conclusion

The extraction of progressive learning abilities from student assessment data is of utmost importance these days because it leads to improvement in learner performance [45]. Computational Thinking practices are increasingly used as a medium to explore programming competencies in learners through various methods [46, 47]. This research predicts the CT competencies of computer programming students using rubrics and machine learning. The work utilizes a partial scoring technique in multiple-response MCQs and opens up new possibilities in e-assessments: although MCQs are scalable and easy to conduct and evaluate [32], a rubric for this type of assessment is not commonly used. However, the questions must be carefully selected to extract meaningful information about the students' learning competencies and associated skill set. This research uses a top-down method to create a generic programming rubric. Then, a multiple-response questionnaire is designed to verify that all rubric criteria are covered, and a partial marking scheme is implemented to obtain partial correctness scores for each criterion. Clusters are created to categorize learners based on their learning preferences and knowledge, and MLPNN is used to generate rules, which are interpreted as CT competencies. The proposed model allows teachers to anticipate the skill set of their students, enabling them to adjust their teaching methods and lesson plans accordingly. This approach promotes outcome-based learning and holds the potential to yield positive outcomes not only in programming but also in other academic disciplines. The future scope of this work includes mapping the rules described as CT competencies to learner achievements and outcomes. Also, the rubric may be shared with the learners so that they know the teachers' scoring expectations, which can further enhance their competencies and skills to the desired level.
References

1. Denning PJ (2009) The profession of IT beyond computational thinking. Commun ACM 52(6):28–30
2. Fanchamps N (2021) The influence of sense-reason-act programming on computational thinking. Open University, Heerlen
3. Nouri J, Zhang L, Mannila L, Norén E (2020) Development of computational thinking, digital competence and 21st century skills when learning programming in K-9. Educ Inq 11(1):1–17. https://doi.org/10.1080/20004508.2019.1627844
4. Papert S (1980) Mindstorms: children, computers and powerful ideas. Basic Books, Inc.
5. Wing JM (2006) Computational thinking. Commun ACM 49(3):33–35. https://doi.org/10.1145/1118178.1118215
6. Hogenboom SA, Hermans FF, Van der Maas HL (2021) Computerized adaptive assessment of understanding of programming concepts in primary school children. Comput Sci Educ 30. https://doi.org/10.1080/08993408.2021.1914461. Accessed 22 Sept 2022
7. Stevens DD, Levi AJ (2023) Introduction to rubrics: an assessment tool to save grading time, convey effective feedback, and promote student learning. Routledge. https://doi.org/10.4324/9781003445432
8. Aldriye H, Alkhalaf A, Alkhalaf M (2019) Automated grading systems for programming assignments: a literature review. Int J Adv Comp Sci Appl 10(3)
9. Chowdhury F (2019) Application of rubrics in the classroom: a vital tool for improvement in assessment, feedback and learning. Int Educ Stud 12(1):61–68
10. Khoirom S, Sonia M, Laikhuram B, Laishram J, Singh TD (2020) Comparative analysis of Python and Java for beginners. Int Res J Eng Technol 7(8):4384–4407
11. Hsu T-C, Chang S-C, Hung Y-T (2018) How to learn and how to teach computational thinking: suggestions based on a review of the literature. Comput Educ 126:296–310. https://doi.org/10.1016/j.compedu.2018.07.004
12. Moon J, Do J, Lee D, Choi GW (2020) A conceptual framework for teaching computational thinking in personalized OERs. Smart Learn Environ 7(1):1–19. https://doi.org/10.1186/s40561-019-0108-z
13. Yang W, Ng DTK, Gao H (2022) Robot programming versus block play in early childhood education: effects on computational thinking, sequencing ability, and self-regulation. Br J Edu Technol 53(6):1817–1841. https://doi.org/10.1111/bjet.13215
14. Gabriele L, Bertacchini F, Tavernise A, Vaca-Cárdenas L, Pantano P, Bilotta E (2019) Lesson planning by computational thinking skills in Italian pre-service teachers. Inform Educ 18(1):69–104
15. Sun L, Hu L, Zhou D (2021) Which way of design programming activities is more effective to promote K-12 students' computational thinking skills? A meta-analysis. J Comput Assist Learn 37(4):1048–1062. https://doi.org/10.1111/jcal.12545
16. Basu S, McElhaney KW, Rachmatullah A, Hutchins NM, Biswas G, Chiu J (2022) Promoting computational thinking through science-engineering integration using computational modeling. In: Proceedings of the 16th international conference of the learning sciences (ICLS 2022). International Society of the Learning Sciences, pp 743–750. https://doi.org/10.22318/icls2022.743
17. Castro LMC, Magana AJ, Douglas KA, Boutin M (2021) Analyzing students' computational thinking practices in a first-year engineering course. IEEE Access 9:33041–33050. https://doi.org/10.1109/ACCESS.2021.3061277
18. De Souza AA, Barcelos TS, Munoz R, Villarroel R, Silva LA (2019) Data mining framework to analyze the evolution of computational thinking skills in game building workshops. IEEE Access 7:82848–82866. https://doi.org/10.1109/ACCESS.2019.2924343
19. Jeffrey RM, Lundy M, Coffey D, McBreen S, Martin-Carrillo A, Hanlon L (2022) Teaching computational thinking to space science students. arXiv preprint arXiv:2205.04416. https://doi.org/10.48550/arXiv.2205.04416
20. Tikva C, Tambouris E (2021) Mapping computational thinking through programming in K12 education: a conceptual model based on a systematic literature review. Comput Educ 162:104083. https://doi.org/10.1016/j.compedu.2020.104083
21. Tang X, Yin Y, Lin Q, Hadad R, Zhai X (2020) Assessing computational thinking: a systematic review of empirical studies. Comput Educ 148:103798. https://doi.org/10.1016/j.compedu.2019.103798
22. Nkhoma C, Nkhoma M, Thomas S, Le NQ (2020) The role of rubrics in learning and implementation of authentic assessment: a literature review. In: Jones M (ed) Proceedings of InSITE 2020: informing science and information technology education conference. Informing Science Institute, pp 237–276. https://doi.org/10.28945/4606
23. Reddy MY (2011) Design and development of rubrics to improve assessment outcomes: a pilot study in a Master's level business program in India. Qual Assur Educ 19(1):84–104. https://doi.org/10.1108/09684881111107771
24. Andrade H, Du Y (2005) Student perspectives on rubric-referenced assessment. Pract Assess Res Eval 10(1):3. https://doi.org/10.7275/g367-ye94
25. Panadero E, Jönsson A (2013) The use of scoring rubrics for formative assessment purposes revisited: a review. Educ Res Rev 9:129–144. https://doi.org/10.1016/j.edurev.2013.01.002
26. Sundeen TH (2014) Instructional rubrics: effects of presentation options on writing quality. Assess Writ 21:74–88. https://doi.org/10.1016/j.asw.2014.03.003
27. Wolf K, Stevens E (2007) The role of rubrics in advancing and assessing student learning. J Effect Teach 7(1):3–14
28. Company P, Contero M, Otey J, Camba JD, Agost M-J, Pérez-López D (2017) Web-based system for adaptable rubrics: case study on CAD assessment. Educ Technol Soc 20(3):24–41
29. Halonen JS, Bosack T, Clay S, McCarthy M, Dunn DS, Hill GW IV, McEntarffer R, Mehrotra C, Nesmith R, Weaver KA, Whitlock K (2003) A rubric for learning, teaching, and assessing scientific inquiry in psychology. Teach Psychol 30(3):196–208. https://doi.org/10.1207/S15328023TOP3003_01
30. Chandio MT, Pandhiani SM, Iqbal R (2016) Bloom's taxonomy: improving assessment and teaching-learning process. J Educ Educ Dev 3(2). https://doi.org/10.22555/joeed.v3i2.1034
31. Bhattacherjee S, Mukherjee A, Bhandari K, Rout AJ (2022) Evaluation of multiple-choice questions by item analysis, from an online internal assessment of 6th semester medical students in a rural medical college, West Bengal. Indian J Commun Med Offic Publ Indian Assoc Prev Soc Med 47(1):92–95. https://doi.org/10.4103/ijcm.ijcm_1156_21
32. Elgadal AH, Mariod AA (2021) Item analysis of multiple-choice questions (MCQs): assessment tool for quality assurance measures. Sudan J Med Sci 16(3):334–346. https://doi.org/10.18502/sjms.v16i3.9695
33. Das B, Majumder M, Phadikar S, Sekh AA (2021) Multiple-choice question generation with auto-generated distractors for computer-assisted educational assessment. Multimedia Tools Appl 80(21–23):31907–31925. https://doi.org/10.1007/s11042-021-11222-2
34. Burud I, Nagandla K, Agarwal P (2019) Impact of distractors in item analysis of multiple choice questions. Int J Res Med Sci 7(4):1136–1139. https://doi.org/10.18203/2320-6012.ijrms20191313
35. Abualigah LMQ (2019) Introduction. In: Feature selection and enhanced krill herd algorithm for text document clustering. Studies in computational intelligence, vol 816. Springer, Cham. https://doi.org/10.1007/978-3-030-10674-4_1
36. Ackermann MR, Blömer J, Kuntze D, Sohler C (2014) Analysis of agglomerative clustering. Algorithmica 69(1):184–215. https://doi.org/10.1007/s00453-012-9717-4
37. Weiss SM, Indurkhya N (1995) Rule-based machine learning methods for functional prediction. J Artif Intell Res 3:383–403. https://doi.org/10.1613/jair.199
38. Yedjour D (2020) Extracting classification rules from artificial neural network trained with discretized inputs. Neural Process Lett 52:2469–2491. https://doi.org/10.1007/s11063-020-10357-x
39. Desai M, Shah M (2021) An anatomization on breast cancer detection and diagnosis employing multilayer perceptron neural network (MLP) and convolutional neural network (CNN). Clin eHealth. https://doi.org/10.1016/j.ceh.2020.11.002
40. Bui DT, Bui KTT, Bui QT, Van Doan C, Hoang ND (2017) Hybrid intelligent model based on least squares support vector regression and artificial bee colony optimization for time-series modeling and forecasting horizontal displacement of hydropower dam. In: Handbook of neural computation. Academic Press, pp 279–293. https://doi.org/10.1016/B978-0-12-811318-9.00015-6
41. Tien Bui D, Tuan TA, Klempe H, Pradhan B, Revhaug I (2016) Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 13(2):361–378. https://doi.org/10.1007/s10346-015-0557-6
42. Pham BT, Bui DT, Prakash I, Dholakia MB (2017) Hybrid integration of multilayer perceptron neural networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. CATENA 149:52–63. https://doi.org/10.1016/j.catena.2016.09.007
43. Sadowski Ł, Hoła J, Czarnecki S, Wang D (2018) Pull-off adhesion prediction of variable thick overlay to the substrate. Autom Constr 85:10–23. https://doi.org/10.1016/j.autcon.2017.10.001
44. Ming Y, Qu H, Bertini E (2018) RuleMatrix: visualizing and understanding classifiers with rules. IEEE Trans Visual Comput Graph 25(1):342–352. https://doi.org/10.1109/TVCG.2018.2864812
45. Gašević D, Dawson S, Rogers T, Gasevic D (2016) Learning analytics should not promote one size fits all: the effects of instructional conditions in predicting academic success. Internet High Educ 28:68–84. https://doi.org/10.1016/j.iheduc.2015.10.002
46. Rose S, Habgood J, Jay T (2017) An exploration of the role of visual programming tools in the development of young children's computational thinking. Electron J E-Learn 15(4):297–309. http://www.ejel.org/volume15/issue4/p297
47. Shute VJ, Sun C, Asbell-Clarke J (2017) Demystifying computational thinking. Educ Res Rev 22:142–158. https://doi.org/10.1016/j.edurev.2017.09.003
Exploring the Relationship Between Digital Engagement and Cybersecurity Practices Among College Students: A Survey Study Farha Khan, Shweta Arora, Saurabh Pargaien, Lata Pande, and Kavita Khati
Abstract This research rigorously investigates the intricate relationship between college students’ digital engagement and their adherence to cybersecurity practices. Against the backdrop of technology’s ubiquitous integration into our lives, college students are progressively relying on digital platforms for communication, social interaction, and education. However, this transition simultaneously exposes them to an array of cyber threats, encompassing identity theft, data breaches, and malware intrusions. Employing a comprehensive survey questionnaire, this study methodically extracts insights from a representative college student sample. By disentangling the dimensions of their digital engagement—marked by the frequency of social media interaction, involvement in online gaming, and utilization of mobile devices— the research illuminates their corresponding dedication to cybersecurity practices. These practices encompass the pivotal domains of password management, the adoption of antivirus software, and the cultivation of vigilant awareness towards phishing threats. The core of the analysis lies in employing regression analysis in Excel to unveil a trove of findings. The exploration of college students’ digital engagement patterns and their ensuing cybersecurity behaviors yields a nexus of multifaceted relationships. These intricate interconnections, in turn, offer a compelling foundation for informed implications. The discerned associations between digital engagement and cybersecurity practices furnish invaluable insights, charting a course for tailored educational strategies and astute awareness campaigns. The intrinsic aim of these interventions is to foster a climate of safer online conduct, thus equipping college students with the skills to navigate the digital realm responsibly.
F. Khan · S. Arora · S. Pargaien (B) · K. Khati Graphic Era Hill University, Bhimtal Campus, Nainital, Uttarakhand 263132, India e-mail: [email protected] F. Khan e-mail: [email protected] L. Pande Kumaun University, Nainital, Uttarakhand 263001, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_13
Keywords Digital engagement · Cybersecurity · College students · Survey study · Online behavior · Awareness raising
1 Introduction

The contemporary landscape has witnessed a seismic shift towards digital platforms, wielding substantial influence over college students' communication, social interactions, and educational pursuits. While this technological surge presents an array of advantages, it concurrently exposes these students to an array of cyber threats. In a milieu where technology's omnipresence is becoming the norm, delving into the intricate interplay between digital engagement and cybersecurity practices among college students becomes an imperative undertaking [21]. As the technological frontier advances, college students find themselves navigating a virtual terrain teeming with latent risks. The specter of identity theft, data breaches, and malware onslaughts looms large, casting profound repercussions both for individuals and institutions [2, 11]. It is within this context that college students, often characterized by their high degree of digital immersion, emerge as a particularly susceptible demographic to these perils [3]. Thus, in-depth exploration is indispensable to fathom the nuances underpinning the convergence of their digital engagement practices and their adherence to cybersecurity protocols [7]. This study endeavors to harness the potential of a survey questionnaire as the cornerstone for data collection from a sample of college students. The survey shall cast a comprehensive net, enmeshing various facets of digital engagement. By probing into the frequency of social media interaction, online gaming pursuits, and mobile device integration, it aspires to capture the gamut of their digital involvement. Simultaneously, it seeks to delve into their cybersecurity practices, examining their adeptness in password management, utilization of antivirus software, and level of awareness towards phishing threats [4]. In the crucible of data analysis, the underlying aspiration of this research crystallizes: to unveil potential correlations between digital engagement and cybersecurity practices that underpin the actions of college students. This endeavor takes shape with the anticipation that such insights will unfurl the intricate matrix of influences that mold their online comportment. Equally significant is the degree to which they marshal defenses against the panorama of cyber threats [14]. The import of these revelations resonates in the empowerment they furnish, enabling the curation of educational initiatives and awareness campaigns tailor-made to foster prudent online behavior among college students [22]. The crucible of current research is cast within the context of a considerable lacuna in existing literature, a conspicuous dearth in exploring the nexus between digital engagement and cybersecurity practices within the unique milieu of college students. As such, this study emerges not only as a repository of knowledge but a beacon that dispels this scholarly obscurity. Its significance radiates to stakeholders across the spectrum: educational institutions, policymakers, and cybersecurity practitioners. Each, in their respective capacities, seeks
to augment cybersecurity cognizance and inculcate judicious digital engagement behaviors within the college student demographic [1]. In summation, the pursuit that underpins this research is the unraveling of the intricate threads that bind digital engagement and cybersecurity practices in the realm of college students. Through the medium of a meticulously structured survey questionnaire, it aspires to fathom the extent to which their digital immersion influences their adherence to cybersecurity protocols. The implications that stand poised to unravel from this endeavor have the potential to permeate the fabric of future initiatives, weaving a more secure digital ambiance for college students (Johnson 2023).
2 Literature Review

Digital engagement and cybersecurity practices are two important aspects of college students' online behavior and have been the subject of growing research interest in recent years. This section provides an overview of the existing literature on the relationship between digital engagement and cybersecurity practices, highlighting the key findings and theoretical frameworks.

Digital Engagement. Digital engagement refers to the extent to which individuals interact with digital technologies and platforms in their daily lives. College students, in particular, are known for their high levels of digital engagement due to their reliance on technology for academic, social, and entertainment purposes [20]. Several studies have examined the impact of digital engagement on various aspects of students' lives, including communication patterns, information consumption, and social relationships [10, 26].

Social Media Use. Social media platforms, such as Facebook, Twitter, and Instagram, play a significant role in college students' digital engagement. Research has shown that frequent use of social media can lead to both positive and negative outcomes. On one hand, social media can enhance social connectedness, facilitate information sharing, and support academic collaboration [12, 16]. On the other hand, excessive social media use has been associated with detrimental effects on mental health, privacy concerns, and vulnerability to online threats [13], (Lin et al. 2016).

Online Game Playing. Online gaming is another popular form of digital engagement among college students. It offers opportunities for social interaction, skill development, and entertainment. However, excessive gaming has been linked to negative consequences, including academic underachievement, addictive behaviors, and increased susceptibility to online risks [15]. Understanding the relationship between online game playing and cybersecurity practices is crucial for developing effective interventions to promote responsible gaming and mitigate potential risks [23].

Mobile Device Use. The widespread adoption of mobile devices, such as smartphones and tablets, has revolutionized the way college students engage with digital content. Mobile device use provides convenience and flexibility, allowing students to access information, communicate, and engage in various online activities anytime and anywhere. However, it also exposes them to security vulnerabilities, such as malware infections, data breaches, and unauthorized access [6]. Studies have emphasized the
need for effective mobile security practices, including device encryption, secure app installation, and regular software updates [18].

Cybersecurity Practices. Cybersecurity practices encompass a range of behaviors and strategies aimed at protecting individuals' online privacy, information, and digital assets. College students, being frequent users of digital technologies, are particularly susceptible to cyber threats and need to adopt appropriate security measures [17].

Password Management. One of the fundamental aspects of cybersecurity is password management. Research has indicated that college students often exhibit poor password hygiene, such as using weak passwords, sharing passwords, and reusing them across multiple accounts [24, 25]. Effective password management practices, including using strong and unique passwords, employing password managers, and enabling two-factor authentication, are crucial for mitigating the risk of unauthorized access and identity theft [27, 29].

Antivirus Software Use. Antivirus software plays a vital role in safeguarding computer systems and mobile devices from malware infections. However, studies have shown that a considerable proportion of college students do not use antivirus software or fail to keep it updated regularly [19, 28]. Lack of antivirus protection exposes students' devices to various online threats, including viruses, spyware, and ransomware, which can compromise their privacy and data security [5].

Phishing Awareness. Phishing attacks, which involve fraudulent attempts to obtain sensitive information such as login credentials or financial details, pose a significant threat to college students. Developing phishing awareness is essential to avoid falling victim to such attacks. Research has highlighted the importance of educating students about phishing techniques, teaching them to recognize phishing emails, and promoting cautious online behavior [8, 9].

In summary, the literature review provides insights into the relationship between digital engagement and cybersecurity practices among college students. It emphasizes the need to understand how the frequency of social media use, online game playing, and mobile device usage influence password management, antivirus software use, and phishing awareness. The next section of this paper will outline the research methodology employed to investigate this relationship.
3 Methodology

3.1 Research Design and Approach

This study utilized a quantitative research design to explore the relationship between digital engagement and cybersecurity practices among college students. The research approach involved collecting data from participants at a specific point in time, providing a snapshot of their digital engagement and cybersecurity practices. This design enabled the examination of associations between variables and offered insights into the potential impact of digital engagement on cybersecurity practices.
3.2 Sampling Technique and Sample Size Determination

A convenience sampling technique was employed to select participants for this study. The sample size of 106 college students was determined based on the available resources and the feasibility of data collection. This sample size was considered adequate for conducting regression analysis and aimed to ensure diversity within the student population.
3.3 Data Collection Instrument (Survey Questionnaire) and Its Development

A survey questionnaire was developed as the data collection instrument for this study. The questionnaire comprised two sections: digital engagement and cybersecurity practices. The digital engagement section included items that assessed the frequency of social media use, online gaming, and mobile device usage. The cybersecurity practices section included items related to password management, antivirus software use, and phishing awareness. The survey questionnaire was developed based on an extensive review of relevant literature and previous research utilizing similar scales. The items were adapted and modified to suit the specific context of college students and the objectives of this study. A pilot test was conducted with a small group of students to assess the questionnaire's clarity, comprehensibility, and face validity. Minor modifications were made based on the pilot test feedback to enhance the readability and clarity of the questionnaire.
3.4 Measures and Operationalization of Digital Engagement and Cybersecurity Practices

To measure digital engagement and cybersecurity practices, Likert-scale items were used. Participants' responses were assigned numerical values ranging from 0 to 4, with higher values representing higher levels of engagement or better adherence to cybersecurity practices.
3.5 Data Analysis Techniques

The collected data were analyzed using regression analysis. The regression analysis was performed using Microsoft Excel, which allowed for the exploration of the relationship between digital engagement (independent variables) and cybersecurity practices (dependent variable). The regression model enabled the identification of significant predictors of cybersecurity practices among college students.
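The regression itself was run in Microsoft Excel; as a rough equivalent, the ordinary least squares fit could be reproduced in Python with statsmodels as sketched below. The file name and column names are hypothetical placeholders for the nine survey items and the dependent variable.

```python
import pandas as pd
import statsmodels.api as sm

# df: survey responses, Likert-coded 0-4. Column names are illustrative only.
df = pd.read_csv("survey_responses.csv")
predictors = ["social_media_use", "online_gaming", "mobile_device_use",
              "two_factor_auth", "antivirus_use", "password_sharing",
              "untrusted_downloads", "password_strength", "data_loss"]

X = sm.add_constant(df[predictors])        # intercept + 9 predictors
y = df["phishing_attacks_experience"]      # dependent variable

model = sm.OLS(y, X).fit()
print(model.summary())                     # R^2, ANOVA F, coefficients, p-values
print(model.pvalues[model.pvalues < 0.05]) # predictors below the p < 0.05 threshold
```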
The statistical significance level was set at p < 0.05, indicating a threshold for determining significant relationships between variables. The analysis aimed to uncover any potential impact of digital engagement on cybersecurity practices among college students. The research methodology employed in this study involved a quantitative research design, a convenience sampling technique, a survey questionnaire, and regression analysis using Microsoft Excel. These methodological choices enabled the exploration of the relationship between digital engagement and cybersecurity practices among college students. The following section will present the findings and discussion of the study.
4 Data Analysis

Summary output

Regression statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.394824 |
| R Square | 0.155886 |
| Adjusted R Square | 0.074198 |
| Standard error | 0.98401 |
| Observations | 103 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 9 | 16.62989 | 1.847765 | 1.908303 | 0.060119 |
| Residual | 93 | 90.04973 | 0.968277 | | |
| Total | 102 | 106.6796 | | | |

| | Coefficients | Standard error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 1.821934 | 0.539499 | 3.377085 | 0.001071 | 0.750596 | 2.893272 |
| X Variable 1 | −0.0512 | 0.08685 | −0.58956 | 0.556913 | −0.22367 | 0.121264 |
| X Variable 2 | −0.14225 | 0.104813 | −1.35718 | 0.178008 | −0.35039 | 0.065888 |
| X Variable 3 | 0.272036 | 0.12149 | 2.239169 | 0.027529 | 0.030782 | 0.513291 |
| X Variable 4 | −0.0503 | 0.101313 | −0.4965 | 0.620717 | −0.25149 | 0.150887 |
| X Variable 5 | −0.02194 | 0.084095 | −0.26091 | 0.79474 | −0.18894 | 0.145055 |
| X Variable 6 | −0.18396 | 0.175195 | −1.05004 | 0.29642 | −0.53186 | 0.16394 |
| X Variable 7 | −0.01204 | 0.094307 | −0.12772 | 0.89865 | −0.19932 | 0.175231 |
| X Variable 8 | 0.328796 | 0.117396 | 2.800739 | 0.006202 | 0.09567 | 0.561921 |
| X Variable 9 | −0.05033 | 0.097266 | −0.51743 | 0.606085 | −0.24348 | 0.142823 |
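For reference, the Adjusted R Square and F statistic in the summary output follow from the standard formulas, with n = 103 observations and k = 9 predictors:

```latex
\bar{R}^{2} = 1 - (1 - R^{2})\,\frac{n-1}{n-k-1}
            = 1 - (1 - 0.155886)\times\frac{102}{93} \approx 0.0742,
\qquad
F = \frac{\mathrm{MS}_{\mathrm{regression}}}{\mathrm{MS}_{\mathrm{residual}}}
  = \frac{1.847765}{0.968277} \approx 1.9083
```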
The meticulous regression analysis conducted in this study unveils a nuanced panorama that sheds light on the complex interplay between digital engagement and cybersecurity practices among college students. The overarching explanatory capacity of the regression model was estimated at 15.59%, denoted by the coefficient of determination (R Square = 0.155886). It
is essential to underscore, however, that the adjusted R Square (0.074198), which accounts for the number of predictors, indicates a somewhat lower explanatory efficacy. Among the cohort of independent variables, a noteworthy revelation emerged. Specifically, X Variable 3, representing mobile device usage, exhibited a significant positive relationship with cybersecurity practices (Coefficient = 0.272036, p-value = 0.027529). This unveils a compelling insight, indicating that heightened engagement with mobile devices corresponds to a pronounced elevation in the embrace of robust cybersecurity practices within the college student demographic. Conversely, the analysis found that the remaining independent variables (X Variables 1, 2, 4, 5, 6, 7, and 9) did not manifest statistically significant relationships with cybersecurity practices. This intriguing outcome warrants discerning contemplation, suggesting either the non-direct influence of these variables or the potential influence of latent factors yet unexplored within the purview of this study. Figure 1, which charts the regression coefficients, provides valuable insights into the relationship between the various independent variables and the dependent variable, "Phishing Attacks Experience". Each coefficient quantifies the extent of influence one independent variable has on the dependent variable while controlling for the effects of the other variables. Positive coefficients indicate a positive impact on phishing attack experiences, while negative coefficients suggest a negative impact. These coefficients, along with their p-values, provide a comprehensive understanding of the individual contributions of the different digital engagement and cybersecurity practices in influencing students' vulnerability to phishing attacks.
Fig. 1 Regression coefficients (coefficient estimates for the intercept and X Variables 1–9)
The ANOVA results in Fig. 2 indicate that the regression model is only marginally significant (p = 0.0601), just above the p < 0.05 threshold adopted in this study, suggesting that at least some of the independent variables collectively contribute to explaining the variance in the dependent variable, "Phishing Attacks Experience". The F-statistic of 1.908 implies a reasonable fit of the model, although further exploration is required to determine the specific significance of individual variables. The multiple regression statistics in Fig. 3 highlight essential measures of the regression model's quality. The R-squared value of 0.155 indicates that approximately 15.5% of the variance in "Phishing Attacks Experience" can be explained by the independent variables. The Adjusted R-squared value of 0.074 considers the number of predictors, demonstrating that around 7.4% of the variability is explained while accounting for model complexity. The standard error of 0.984 suggests the average distance between observed values and the regression line, aiding in evaluating the model's accuracy. Figure 4 illustrates the relationships between various independent variables and the dependent variable in the context of college students' cybersecurity practices. Each independent variable, such as "Social Media Use," "Online Game Playing," etc., is examined for its impact on the dependent variable "Phishing Attacks Experience". The figure showcases corresponding coefficients, their statistical significance, and confidence intervals, providing insights into the significance of each independent variable's contribution to cybersecurity practices. This collection of results firmly underscores the pivotal role that mobile device use plays in dictating college students' cybersecurity practices. It unveils a crucial juncture wherein addressing mobile security awareness and propounding best practices assume paramount significance. This finding resonates as a clarion call, signaling the need to bolster the realm of cybersecurity education within
Fig. 2 ANOVA results (df, SS, MS, F, and Significance F for the regression, residual, and total)
Fig. 3 Multiple regression statistics (Multiple R, R Square, Adjusted R Square, Standard Error, Observations)
Fig. 4 Relationships between independent and dependent variables (count of independent variables positively associated vs. not significantly associated with "Phishing Attacks Experience")
this demographic, particularly focusing on the intricate nuances of mobile device usage. The findings collectively traverse beyond statistical implications, encapsulating profound insights. The regression analysis, while offering a glimpse into the intricate terrain of the digital and security nexus, serves as a stepping-stone. It beckons forth a myriad of questions and pathways for future research. The potential interactions between the observed variables and the existence of uncharted dimensions remain fertile grounds for continued exploration. In summation, the regression analysis lends a discerning lens, illuminating facets of the intricate relationship between digital engagement and cybersecurity practices among college students. The undeniable significance of mobile device usage as an influencer underscores the need for a holistic cybersecurity education, setting the stage for informed interventions. This study, while a harbinger of insights, simultaneously beckons forth a panorama of unexplored avenues, seeding the fertile soil for further scholarly endeavors.
5 Discussion

The findings emerging from the extensive regression analysis resonate as significant markers in comprehending the nuanced relationship between college students' digital engagement and their assimilation of cybersecurity practices. Notably, the results cast a spotlight on the affirmative role of mobile device utilization in fostering cybersecurity conscientiousness among these students. The data attests that heightened engagement with mobile devices corresponds to an elevated propensity to adopt robust cybersecurity practices. This includes the cultivation of fortified passwords, diligent software updates, and an inherent wariness against phishing attempts. However, in a twist of expectations, certain dimensions did not demonstrate statistically significant associations. Elements such as social media immersion, online gaming proclivity, two-factor authentication adoption, utilization of antivirus software, instances of password sharing, downloading software from untrusted sources, and personal data loss, while initially assumed to be potential influencers, failed to manifest as substantial determinants of cybersecurity practices. This intriguing divergence from anticipation suggests a multifaceted landscape wherein these factors might not exert direct influence or could be influenced by other unexplored variables. It is pivotal to underscore that the cumulative explanatory potency of the regression model emerged as moderately impactful. This observation hints at the existence of external factors, not encompassed within this study, that might collectively orchestrate the variance witnessed in college students' cybersecurity practices. In this vein, future research trajectories beckon us to embrace a broader canvas. Variables such as the level of awareness, the efficacy of educational interventions, and the sway of institutional policies warrant extensive exploration. An all-encompassing comprehension of the determinants of cybersecurity practices necessitates a holistic inquiry. The implications of these findings are profound and resonate across academia and practical domains. The spotlight illuminated on mobile device usage underscores the efficacy of educational interventions tailored to this dimension. The imperative of these programs is underscored by their potential to heighten students' awareness of cyber threats and to imbue them with the expertise to navigate the digital sphere judiciously. In this context, educational institutions emerge as catalysts in fostering prudent online behavior, equipping students with the discernment to forge informed choices and cultivate fortified cybersecurity habits. However, these findings, while notable, represent but a fraction of the mosaic. The complex interplay between digital engagement and cybersecurity practices remains an intricate puzzle demanding further exploration. A call for more in-depth inquiries resonates as a resounding theme. These findings weave seamlessly into the existing fabric of knowledge, enhancing our understanding and proffering insights that extend their influence to the realm of strategies designed to elevate cybersecurity awareness and practices within the college student cohort. In summation, the findings unravel a portion of the narrative, rendering a clearer understanding of the relationship between digital engagement and cybersecurity practices among college students. Mobile device usage emerges as a beacon, illuminating
pathways for further exploration. The evocative resonance of these findings reverberates across realms of academia and practicality, as stakeholders are summoned to intensify their efforts in sculpting a secure digital landscape for our future generations.
6 Conclusion

In this research endeavor, the primary objective encompassed an in-depth examination of the interplay between digital engagement and the adoption of cybersecurity practices within the realm of college students. The culmination of this investigation has illuminated multifaceted insights into the factors that underpin the integration of cybersecurity measures by college students and offers a roadmap for cultivating a more secure online comportment. The pivotal findings emanating from the rigorous regression analysis spotlight a noteworthy correlation between mobile device usage and the cultivation of robust cybersecurity practices within the college student demographic. This correlation underscores the propensity for heightened engagement with mobile devices to correspond with an elevated manifestation of prudent cybersecurity habits. However, the panorama of results has revealed that other factors, ranging from social media involvement and online gaming to the implementation of two-factor authentication, utilization of antivirus software, instances of password sharing, downloading software from untrusted sources, and incidents of personal data compromise, do not exhibit a statistically significant nexus with cybersecurity practices. These discernments, in totality, augment the landscape of knowledge in the intersection of digital engagement and cybersecurity practices by casting a spotlight on the pivotal role that mobile device engagement plays in fostering a heightened awareness of cybersecurity dynamics among college students. This narrative accentuates the necessity for tailored educational initiatives that cogently address the nuances and vulnerabilities entailed in the use of mobile devices. By prioritizing a comprehensive discourse on mobile security and imparting prescriptive guidelines for prudent practices, academic institutions stand poised to empower students to shield their personal information and bolster their resilience against cyber threats. In a broader vista, this research engenders reverberations of significance for the promulgation of safer online conduct within the college student demographic. These findings underscore the indispensability of an all-encompassing cybersecurity education framework that encompasses not only mobile device integration but also the entirety of the digital engagement spectrum. The pragmatic course of action for institutions lies in the formulation of bespoke programs and awareness campaigns that harmonize with the distinctive behavioral patterns and requisites of college students. By doing so, they can effectively endow students with the knowledge and proficiencies imperative for navigating the digital terrain securely. In summation, this research enterprise not only contributes a new stratum of comprehension to the intricate interrelationship between digital engagement and
158
F. Khan et al.
cybersecurity practices among college students but also proffers invaluable guideposts for policymakers, educators, and institutions to curate strategies and interventions aimed at heightening cybersecurity cognizance and efficacy. Through the promotion of judicious online comportment, a more impervious digital milieu can be actualized for college students, curtailing the perils tethered to cyber vulnerabilities. Ultimately, this research study stands as a foundational cornerstone, beckoning forth further explorations and beckoning the call for prospective inquiries to delve deeper into the intricate web of factors steering cybersecurity practices among college students. This iterative pursuit promises refinement and amelioration in the endeavor to propagate cybersecurity awareness and safeguard individuals within an increasingly interconnected global landscape.
References
1. Anderson S et al (2021) Enhancing cybersecurity practices among college students: a case study of awareness programs. J Inf Secur 14(3):211–226
2. Anderson R, Thomas B (2019) Cybersecurity threats and countermeasures: a study on college students. J Cybersec 7(2):112–125
3. Brown A, Wilson C (2021) Digital engagement and cybersecurity: exploring the relationship among college students. Int J Cybersec Res 9(1):35–48
4. Davis M, Moore J (2016) Cybersecurity practices among college students: an exploratory study. J Comput Secur 5(2):78–92
5. Dunn C, Koivunen M, Leppänen V, Yli-Huumo J (2018) Ransomware—a study on user vulnerability. Comput Hum Behav 87:11–28
6. Furnell S, Whitfield A, Boisot S (2021) Users' perceptions of mobile device security. Int J Inform Sec Res 11(2):81–92
7. Garcia L, Smith P (2018) Understanding the link between digital engagement and cybersecurity practices. Cybersec J 12(3):189–202
8. Hadnagy C, Fincher R, Pritchett M (2017) Phishing dark waters: the offensive and defensive sides of malicious emails. John Wiley & Sons
9. Hong Y, Thong JY, Tam KY (2020) Antecedents and consequences of phishing susceptibility: an empirical investigation. MIS Q 44(2):459–500
10. Jones S, Fox S (2019) Generations online in 2010. Pew Research Center
11. Jones R (2020) Navigating the virtual terrain: risks of identity theft, data breaches, and malware among college students. Cybersecurity Perspect 8(2):145–162
12. Junco R (2015) Student class standing, Facebook use, and academic performance. J Appl Dev Psychol 36:18–29
13. Kross E, Verduyn P, Demiralp E, Park J, Lee DS, Lin N, Ybarra O (2013) Facebook use predicts declines in subjective well-being in young adults. PLoS ONE 8(8):e69841
14. Lee S, Kim M (2015) Defenses against the panorama: understanding how college students protect themselves in the cyber age. Cybersecurity Dyn 7(3):210–225
15. Liau AK, Khoo A, Ang PH (2018) Adolescent online gaming addiction: exploring the effect of self-control and the balance between positive and negative consequences. Addict Behav Rep 7:32–36
16. Madge C, Meek J, Wellens J, Hooley T (2019) Facebook, social integration and informal learning at university: 'It is more for socialising and talking to friends about work than for actually doing work.' Learn Media Technol 44(1):20–36
17. Majumdar A, Dey D, Basak SK (2020) Cybersecurity awareness among college students: an empirical analysis. Comput Secur 88:101626
18. Norouzizadeh Dezfouli M, Norouzi M, Moradi M (2021) Mobile security behaviors: a systematic review. Comput Secur 101:102276
19. Rajivan P, Siponen M, Vance A (2017) Measuring the impact of security training on phishing efficacy with the C-R-A-P metrics. Comput Secur 68:47–58
20. Smith A (2018) Teens, social media & technology 2018. Pew Research Center
21. Smith A (2022) Exploring the impact of digital engagement on cybersecurity practices among college students. J Cybersecurity Educ 10(3):45–62
22. Smith J (2023) Revelations and empowerment: tailoring educational initiatives for prudent online behavior among college students. J Cybersecurity Educ 11(4):215–230
23. Snodgrass JG, Dengah HJ, Lacy MG, Fagan J, Most DE (2017) Magical thinking and neutralizing in response to online games among players of World of Warcraft and Diablo 3. J Ethnograph Qual Res 11(3):157–180
24. Stobert E, Biddle R (2014) Revising passwords for modern authentication: a case study of the potential usability and security benefits. Int J Hum Comput Stud 72(3):249–267
25. Vance A, Siponen M, Pahnila S (2020) Improving organizational password policy compliance: an integrated perspective. J Assoc Inf Syst 21(2):250–283
26. Wang JL, Sun S (2020) Adolescents' problematic social media use and psychological wellbeing: a moderated mediation model of psychological needs and sleep quality. Addict Behav 101:105962
27. Wash R (2019) Digital password management: predicting password re-use among staff and students. Inform Comp Sec 27(2):191–207
28. Wyosnick ER, Dingman J, Fruhling A, Liu D (2021) Password choices of college students: a study of the interactions between digital natives, stress, and security knowledge. Inf Syst Secur 30(3):91–110
29. Yan J, Li W, Zhu Y (2021) User-side passwords security management: an empirical study of individual password behaviors. J Manag Inf Syst 38(1):1–33
Secure and Energy Efficient Routing in VANETs Using Nature Inspired Hybrid Optimization Gurjot Kaur and Deepti Kakkar
Abstract Ensuring security in Vehicular Ad-hoc Networks (VANETs) has emerged as a prominent requisite prior to their deployment. Due to their open-ended, dynamic nature, VANETs are prone to multiple security attacks, which can be mitigated using lightweight security solutions based on trust models. There is a trade-off between accuracy and delay in making a decision via trust-based models. In a scenario where the behavior of a node can be estimated from its performance in the network, artificial intelligence models have proved most promising in terms of accuracy and fast decision making, with appropriate adaptation to the environment. Thus, in this study, we propose the combination of two nature-inspired algorithms, Remora optimization and Aquila optimization, to select an optimal routing path based on higher trust, higher energy efficiency, and lower delay. The performance of the proposed hybrid combination is compared with the individual algorithms and shows a significant improvement in overall network efficiency. Keywords VANETs · Trust models · Artificial intelligence · Aquila optimization · Remora optimization
1 Introduction The idea of making traffic management easier and smarter has taken shape through the Intelligent Transportation System (ITS). VANETs have been proposed as a part of ITS with the prime objective of better traffic management, especially during rush hours, and of preventing fatalities connected to road accidents [1]. An ad-hoc network is created on demand in which messages related to emergency situations like road accidents, landslides, and poor road conditions, as well as commercial and
G. Kaur (B) · D. Kakkar Dr. B.R. Ambedkar National Institute of Technology, Jalandhar, India e-mail: [email protected]
D. Kakkar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_14
Fig. 1 Vehicular ad-hoc networks
convenience-based messages are communicated among the connected vehicles [2, 3]. The communication takes place vehicle to vehicle (V2V) [4] and vehicle to infrastructure (V2I) [5], where the infrastructure could be Base Stations (BS) or Road Side Units (RSUs), as shown in Fig. 1. Additionally, due to their highly dynamic and open-ended nature, VANETs are susceptible to even more complicated attacks on integrity, confidentiality, availability, authentication, privacy, and non-repudiation [6]. Multiple surveys have discussed the best-suited solutions to the security issues faced in vehicular networks [6–8]. Most of these solutions are related to cryptography and digital signatures. While these solutions provide robustness against the above-mentioned attacks, they require dedicated infrastructure for the proper management of public and private keys, which becomes highly expensive. One alternative solution suggested in the literature is based on trust models [9]. These models are extremely simple to deploy and, unlike cryptography-based solutions, do not incur additional computational expense, as they do not require a dedicated infrastructure [10, 11]. In this study, we propose the combination of two nature-inspired optimizations: Aquila optimization and Remora optimization. Remora optimization has shown good results on several test functions but falls severely short of the optimal global solution, as its global exploration phase follows the inadequate Sailfish Optimization (SFO) strategy. The Aquila optimization is known to have a superior exploration phase. To overcome this inadequacy, Remora optimization has been combined with Aquila optimization, which offers good global exploration and exploitation. The performance of the hybrid optimization is compared with the individual algorithms as well as with traditional algorithms used in trust-based models.
The rest of the paper is organized as follows. Section 2 reviews relevant literature, Sect. 3 details the methodology of the proposed model, Sect. 4 presents the results and analysis, and Sect. 5 concludes the paper.
2 Related Work Multiple trust models have been discussed in the literature that make decisions based on weighted voting of recommendations [12, 13] or using Dempster–Shafer Theory (DST) [14, 15]. However, in the highly dynamic scenarios of VANETs, the topology of the network and the attackers are always unpredictable. Better and more accurate decisions can be derived adaptively using machine learning algorithms, owing to their proficiency in predicting patterns. In [16], the authors incorporated multi-objective firefly algorithms (FA) to achieve optimal clustering in VANETs. Clusters with greater lifetime and less overhead were achieved due to the efficient local attraction and global regrouping characteristics of FA. The authors achieved improvements in mean cluster size and PDR compared to the CLPSO and MOPSO algorithms. In [17], the advantages of the Genetic Algorithm (GA) are integrated with FA in order to achieve faster convergence and greater accuracy. The combination of the two optimizations resulted in better throughput, transmission times, and PDR relative to the performance of the individual algorithms. In [18], the authors combined an enhanced version of the Cuckoo Search Algorithm (CSA) with AODV to route packets efficiently. CSA has been incorporated along with the concept of fuzzy logic, which limits the extra overhead in route selection by limiting the number of route requests. Consequently, routes with less overhead, less packet loss, less delay, greater throughput, and greater PDR are selected. In [19], a hybrid optimization has been proposed by the authors for optimal routing as well as attack detection in the VANET scenario. Firstly, CH selection and routing are optimized using Fractional Aquila Optimization (FAO). Then, a Deep Maxout Network is incorporated, which is further optimized using the combination of FAO and Spider Monkey Optimization to gain the further benefits of higher accuracy and faster convergence. The proposed algorithm successfully achieves good classification accuracy and precision. Although multiple nature-inspired algorithms have been used in recent works for efficient routing in VANETs, very few works ensure security while routing. The role of trust-based models in ensuring security while routing must be exploited, as they provide a lightweight, simple, and less expensive alternative to options like cryptography and digital signatures.
3 Proposed Model In this work, to obtain a reliable routing protocol, we have emphasized three parameters: energy, trust, and delay; i.e., an optimal path is established between two nodes only when the succeeding node is energy efficient, trustworthy, and provides the lowest delay. Firstly, a VANET scenario is created, in which the routing strategy is designed using the combination of the Remora and Aquila optimizations, which take into account the parameters of energy, trust, and delay to carry out the proposed optimization. The architecture of the developed approach is shown in Fig. 2.
3.1 Fitness Function It is the function that decides the optimal route to transmit and receive the data packets depending upon the parameters of energy, trust, and delay in the network. It is given by Eq. 1:

$F = \frac{1}{3}\left(Energy + Trust + (1 - Delay)\right)$  (1)
where F represents the fitness function and energy, trust and delay are presented as follows. The optimization is achieved by maximizing F. Energy: In a wireless medium, a node could either be in transmitting, receiving, idle or sleeping mode where energy consumption happens primarily in transmitting and
Fig. 2 Architecture of proposed model
receiving mode. The energy consumed during the transmission ($E_{tms}$) and receiving ($E_{rec}$) phases is given by the following equations:

$E_{tms} = P_{tms} \times Time = (PS_{tms} \times I_{tms}) \frac{S}{BW}$  (2)

$E_{rec} = P_{rec} \times Time = (PS_{rec} \times I_{rec}) \frac{S}{BW}$  (3)

where $P_{tms}$ and $P_{rec}$ denote the transmitted and received power; $PS_{tms}$, $I_{tms}$, $PS_{rec}$, and $I_{rec}$ denote the transmitted power supply, transmitted current, received power supply, and received current, respectively. $S$ denotes the size of the packet and $BW$ denotes the bandwidth. The total cost of energy consumed per packet is given by:

$E_{total} = E_{tms} + \sum_{1}^{k} E_{rec}$  (4)
where $k$ denotes the number of receivers, including the destination node.
Trust: A node becomes trustworthy in a network when it performs all the tasks allotted to it successfully. If it fails to complete the designated task or it misbehaves, its reputation in the network decreases, subsequently decreasing its trust score. The trust score [20] of a node is estimated based on its performance in the network, as given by Eq. 5:

$Trust = \sum_{i=1}^{t} \beta^{t-1} \frac{S_i}{S} \cdot \partial_t$  (5)

where $S_i$ denotes the total successful tasks a node completed in time $T_i$ and $S$ denotes the total number of tasks a node is supposed to complete. $\beta$ denotes the time attenuation factor, which emphasizes the latest performances in order to ensure the node is performing consistently well, such that $0 < \beta < 1$. $\partial_t$ is the trust parameter, derived from the beta distribution and given by:

$\partial_t = \frac{\alpha_i}{\alpha_i + \gamma_i}, \quad \text{if } i \geq 1$  (6)
The maximum value of trust is taken as 10 in our simulation; a successful task would be positively reinforced by increasing the trust value by 0.1, while a misbehavior would result in a negative reinforcement of 0.5. Delay: The delay experienced in the network is given by the formula:
$Delay = D_{max} \times \frac{R - 2d}{R}$  (7)
where the delay is given by the rounded value obtained on the RHS, Dmax is the largest permissible value of forwarding delay, R is the radius of broadcasting coverage and d is the distance between the source node and destination node. Being unpredictable, the wireless medium may lose certain packets due to delay in the network or due to low energy of a node. Hence, in order to avoid coarse grain revocation of trust, these two parameters have also been included along with the trust parameters as the optimization requirements.
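For concreteness, a minimal Python sketch of the fitness evaluation in Eqs. 1–7 is shown below. The function names, the assumption that all three terms are pre-normalized to [0, 1] before entering Eq. 1, and the clamping of the trust score are illustrative conventions, not details taken from the paper's implementation.

```python
def transmission_energy(ps_tms, i_tms, size, bw):
    """Eq. 2: E_tms = (PS_tms * I_tms) * S / BW."""
    return ps_tms * i_tms * size / bw

def reception_energy(ps_rec, i_rec, size, bw):
    """Eq. 3: E_rec = (PS_rec * I_rec) * S / BW."""
    return ps_rec * i_rec * size / bw

def total_energy(e_tms, e_rec_per_receiver):
    """Eq. 4: transmit cost plus the receive cost over all k receivers."""
    return e_tms + sum(e_rec_per_receiver)

def forwarding_delay(d_max, radius, distance):
    """Eq. 7, with the rounding described in the text."""
    return round(d_max * (radius - 2 * distance) / radius)

def update_trust(trust, success, t_max=10.0):
    """Reinforcement rule from the text: +0.1 per successful task, -0.5 per
    misbehavior, with the trust score capped at the simulation maximum of 10."""
    trust += 0.1 if success else -0.5
    return min(max(trust, 0.0), t_max)

def fitness(energy, trust, delay):
    """Eq. 1: F = (Energy + Trust + (1 - Delay)) / 3, maximized by the optimizer.
    Assumes energy, trust, and delay are normalized to [0, 1]."""
    return (energy + trust + (1.0 - delay)) / 3.0
```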
3.2 Proposed Hybrid Optimization Algorithm Remora optimization is inspired by the parasitic behavior of the remora fish, which changes its hosts for different methods of preying. Its global exploration formula is enhanced by incorporating Aquila optimization, which results in a better global update and gives faster and more precise convergence. The proposed approach is defined as follows:
Free Travel (Exploration Phase)
SFO technique: When traveling with a group of swordfish, the remora's position varies with their movement. The location update formula for this scenario is given by:

$Y_i^{t+1} = Y_{best}^t - rand(0,1) \times \left(\frac{Y_{best}^t + Y_{rand}^t}{2} - Y_{rand}^t\right)$  (8)

where $t$ is the number of the present iteration, $Y_{best}^t$ is the best solution achieved yet, $Y_{rand}^t$ is a random location, and $rand$ generates a random number between 0 and 1.
The Aquila optimization is based on the clever and rapid switching of hunting techniques by the Aquila. The alternating switching between multiple hunting methods makes this optimization very effective for finding the global optimum solution. Hence, the global exploration scheme of the Remora optimization is updated by making use of the Levy flight parameter as described in the movement of the Aquila [21] when it hunts using the contour flight with a short glide attack. The Levy flight formula [21] emphasizes the probability of heavy treads in random walking, which further randomizes the exploration, making it more effective. Eq. 8 is hence updated as follows:

$Y_i^{t+1} = Y_{best}^t - rand(0,1) \times \left(\frac{Y_{best}^t + Y_{rand}^t}{2} - Y_{rand}^t\right) \times Levy(f)$  (9)
Experience attack: Remoras try to gain experience in order to determine whether it is time to change their hosts by taking small steps around the hosts. This movement is modeled as Eq. 10 from [22]:
$Y_{ten} = Y_i^t + (Y_i^t - Y_{pre}) \times randn$  (10)
Eat Thoughtfully (Exploitation)
WOA strategy: When the remora attaches itself to the whale, its position update is given in [22] as follows:

$Y_{i+1} = R \times e^r \times \cos(2\pi r) + Y_i$  (11)

Host feeding: The solution space here is restricted to the location space of the host. Hence, the movement of the remora towards the host is modeled in [22] as follows:

$Y_i^t = Y_i^t + m$  (12)
The optimization algorithm is carried out according to the steps defined above. The pseudocode of the proposed algorithm is presented as Algorithm 1. A random integer G (0 or 1) decides whether to follow the SFO or the WOA strategy.

Algorithm 1: Pseudocode of the proposed approach
Set initial values of population size S and tmax
Initialize positions of population Yi (i = 1, 2, …, N)
Initialize best solution Ybest and calculate best fitness F(Ybest)
while t < tmax do
    Calculate the fitness value of each remora using Eq. 1
    Make amendments if a search agent goes beyond the search space
    Update r, m, G
    for each remora i do
        if G(i) = 1, update the position using Eq. 9
        else if G(i) = 0, update the position using Eq. 11
        end if
        Get one-step prediction by Eq. 10
        Check fitness function values for host replacement
        If not replaced, feeding method is decided by Eq. 12
    end for
end while
return Ybest
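A compact NumPy sketch of the hybrid search loop in Algorithm 1 follows. The Levy-flight implementation (Mantegna's method with β = 1.5), the host-feeding step scale, the interpretation of R in Eq. 11 as the distance to the best solution, and the candidate-selection logic are common conventions assumed here, not details specified in the paper.

```python
import math
import numpy as np

def levy(dim, beta=1.5):
    # Mantegna's algorithm for Levy-flight steps (assumed convention).
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = np.random.normal(0, sigma, dim)
    v = np.random.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)

def rao(fitness, dim, n=30, t_max=100, lb=0.0, ub=1.0):
    """Hybrid Remora-Aquila search maximizing `fitness` (Eq. 1)."""
    Y = np.random.uniform(lb, ub, (n, dim))
    Y_prev = Y.copy()
    Y_best = max(Y, key=fitness).copy()
    for _ in range(t_max):
        for i in range(n):
            r = np.random.uniform(-1, 1)
            m = np.random.uniform(-0.1, 0.1, dim)   # host-feeding step (assumed scale)
            if np.random.randint(2) == 1:           # G = 1: Levy-enhanced SFO move, Eq. 9
                Y_rand = Y[np.random.randint(n)]
                Y_new = Y_best - np.random.rand() * (
                    (Y_best + Y_rand) / 2 - Y_rand) * levy(dim)
            else:                                   # G = 0: WOA-style spiral move, Eq. 11
                R = np.abs(Y_best - Y[i])           # distance to host (assumed meaning of R)
                Y_new = R * np.exp(r) * np.cos(2 * np.pi * r) + Y[i]
            # one-step "experience attack" around the host, Eq. 10
            Y_try = Y[i] + (Y[i] - Y_prev[i]) * np.random.randn()
            Y_prev[i] = Y[i].copy()
            # keep the best candidate; host feeding (Eq. 12) acts as the fallback move
            candidates = [Y_new, Y_try, Y[i] + m]
            Y[i] = np.clip(max(candidates, key=fitness), lb, ub)
            if fitness(Y[i]) > fitness(Y_best):
                Y_best = Y[i].copy()
    return Y_best
```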
4 Results and Analysis This section discusses the results of the proposed algorithm (RAO) in terms of the parameters of Packet Delivery Ratio (PDR), average throughput, convergence time and True Positive Rate (TPR). The VANET model as suggested in [20] has been designed using Python. The comparison of the proposed algorithm has been carried out with the trust model based optimizations using individual algorithms: Aquila
(AO) and Remora optimization (RO). The robustness of the trust model based optimizations has been compared with the commonly used Dempster–Shafer Theory (DST) and Weighted Voting (WV) trust-based models in terms of the TPR. The comparison in terms of PDR is presented in Fig. 3. As seen in Fig. 3, the proposed RAO algorithm has the maximum PDR compared to RO and AO. The PDR of RAO is as high as 99.26% with 10 nodes and drops merely to 98.32% when the number of nodes is increased to 50. At 50 nodes, RO shows a PDR of 97.6% and AO shows around 98%. RAO maintains the highest PDR among the three even as the number of nodes increases. The performance comparison with respect to average throughput is shown in Fig. 4.
Fig. 3 Performance comparison with respect to PDR
Fig. 4 Performance comparison with respect to average throughput
With the increasing number of nodes, a downward trend similar to that of the PDR is observed for the average throughput: it tends to decrease as the number of vehicles in the network increases. However, the maximum throughput is achieved by the amalgamation of the two individual optimizations, at around 102.6 kbps compared to 96.8 and 95.2 kbps for AO and RO, respectively, at 10 nodes, and 88.1, 81, and 79.1 kbps for RAO, AO, and RO, respectively, at 50 nodes. The comparison of the three algorithms with respect to convergence time in ms is shown in Fig. 5. Remora optimization faces a slow convergence issue, and the Aquila optimization performs better than the Remora optimization on the designed VANET model. However, the optimization becomes faster with the combination of the two algorithms, as AO tends to mitigate the convergence issues faced by RO. RAO converges at merely 850 ms with 50 nodes, compared to the 990.2 and 1190 ms taken by AO and RO, respectively. The comparison of the proposed technique has been carried out with the conventional decision logics followed in trust-based models, DST and WV, as well as with the individual RO and AO algorithms, as shown in Fig. 6. All the algorithms perform fairly well when the attack ratio is low. As the attack ratio increases to 30%, the performance of the WV method drops relative to the rest. By 70%, the performance of DST also degrades to a small extent; however, the TPR calculated by RO, AO, and RAO remains above 95% even with attack ratios as high as 90%. This is because Artificial Intelligence (AI) based techniques can better adapt to the changing scenarios of dynamic structures compared to rigid decision logics.
Fig. 5 Performance comparison with respect to convergence time
Fig. 6 Performance comparison with respect to TPR
5 Conclusion In this work, a hybrid combination of two nature-inspired algorithms, the Remora and Aquila optimizations, has been proposed for VANETs. It makes routing decisions by ensuring the security, energy efficiency, and faster transmission of the route. The combination of the two optimizations draws on the strengths of each and results in overall improved network performance. The improvement of the proposed algorithm in PDR at 50 nodes is approximately 0.7% over RO and 0.3% over AO. The average throughput sees an improvement of 11.3% over RO and 8.6% over AO. Also, the convergence time improves by approximately 28.5% compared to RO and 15% compared to AO. The TPR achieved is significantly higher than that of the traditional DST and WV based trust models at high attack ratios. As future work, the performance improvement can be further analyzed by using other effective nature-inspired optimizations.
References
1. Xi S, Li X-M (2008) Study of the feasibility of VANET and its routing protocols. In: 2008 4th International conference on wireless communications, networking and mobile computing. IEEE
2. Cheng HT, Shan H, Zhuang W (2011) Infotainment and road safety service support in vehicular networking: from a communication perspective. Mech Syst Signal Proc 25(6):2020–2038
3. Kumar V, Mishra S, Chand N (2013) Applications of VANETs: present & future. Commun Netw 5(1):12–15
4. Vinel A, Lyamin N, Isachenkov P (2018) Modeling of V2V communications for C-ITS safety applications: a CPS perspective. IEEE Commun Lett 22(8):1600–1603
5. Zhang Y, et al (2018) Optimization of information interaction protocols in cooperative vehicle-infrastructure systems. Chinese J Electr 27(2):439–444
6. Arif M, et al (2019) A survey on security attacks in VANETs: communication, applications and challenges. Vehic Commun 19:100179
7. Hasrouny H, et al (2017) VANet security challenges and solutions: a survey. Vehic Commun 7:7–20
8. Muhammad M, Safdar GA (2018) Survey on existing authentication issues for cellular-assisted V2X communication. Vehic Commun 12:50–65
9. Sun R, Huang Y, Zhu L (2021) Communication by credence: trust communication in vehicular Ad Hoc networks. Mobile Netw Appl 1–13
10. Gazdar T, et al (2012) A distributed advanced analytical trust model for VANETs. In: 2012 IEEE global communications conference (GLOBECOM). IEEE
11. Kerrache CA, et al (2016) Trust management for vehicular networks: an adversary-oriented overview. IEEE Access 4:9293–9307
12. Ahmed S (2016) Trust establishment and management in ad hoc networks. Dissertation, University of Windsor (Canada)
13. Hu H, et al (2016) REPLACE: a reliable trust-based platoon service recommendation scheme in VANET. IEEE Trans Vehic Technol 66(2):1786–1797
14. Li W, Song H (2015) ART: an attack-resistant trust management scheme for securing vehicular ad hoc networks. IEEE Trans Intell Transp Syst 17(4):960–969
15. Chaurasia BK, Sharma K (2019) Trust computation in VANET cloud. In: Transactions on computational science XXXIV. pp 77–95
16. Joshua CJ, Duraisamy R, Varadarajan V (2019) A reputation based weighted clustering protocol in VANET: a multi-objective firefly approach. Mobile Netw Appl 24(4):1199–1209
17. Singh GD, et al (2022) Hybrid genetic firefly algorithm-based routing protocol for VANETs. IEEE Access 10:9142–9151
18. Rahnamaei Yahiabadi S, Barekatain B, Raahemifar K (2019) TIHOO: an enhanced hybrid routing protocol in vehicular ad-hoc networks. EURASIP J Wireless Commun Netw 2019(1):1–19
19. Kaur G, Kakkar D (2022) Hybrid optimization enabled trust-based secure routing with deep learning-based attack detection in VANET. Ad Hoc Netw 136:102961
20. Fazio P, De Rango F, Sottile C (2015) A predictive cross-layered interference management in a multichannel MAC with reactive routing in VANET. IEEE Trans Mobile Comp 15(8):1850–1862
21. Abualigah L, et al (2021) Aquila optimizer: a novel meta-heuristic optimization algorithm. Comp Indus Eng 157:107250
22. Jia H, Peng X, Lang C (2021) Remora optimization algorithm. Expert Syst Appl 185:115665
Performance Evaluation of Machine Learning Models for Intrusion Detection in Wireless Sensor Networks: A Case Study Using the WSN DS Dataset Aryan Rana, Sunil Prajapat, Pankaj Kumar, and Kranti Kumar
Abstract The security of Wireless Sensor Networks (WSNs), which are susceptible to Denial of Service (DoS) attacks, depends on effective Intrusion Detection Systems (IDSs). This research paper assesses the performance of three machine learning models, namely k-NN, Logistic Regression, and Decision Tree (DT), using the WSN DS dataset for WSN intrusion detection. The dataset featured DoS attacks of several types: scheduling, blackhole, flooding, and gray hole attacks. Each model's performance metrics were calculated and compared, including precision, recall, and F-1 score. Results showed that the DT model consistently outperformed the other models, demonstrating its effectiveness in accurately predicting different types of DoS attacks. The DT model exhibited superior performance with respect to macro-precision, macro-recall, and macro-F-1 score, achieving values of 0.98 each. In comparison, the k-NN and logistic regression models yielded lower values of 0.98, 0.96, 0.97 and 0.87, 0.85, 0.86, respectively. These findings have significant implications for practitioners and researchers working on securing WSNs against DoS attacks and highlight the importance of using machine learning-based IDSs to detect and mitigate security threats in WSNs. This study enhances our knowledge of the detection of intrusions in WSNs and offers guidance for creating strong security measures to ensure these networks' dependable and secure functioning. Keywords Wireless sensor networks · Machine learning · Denial of service · Intrusion detection system
1 Introduction Numerous sensor nodes comprise Wireless Sensor Networks (WSNs), which broadcast data to a centralized base station or sink node [1, 2]. Applications for WSNs include traffic management, home automation, and environmental monitoring [3]. However, the resource constraints on sensor nodes, flaws in wireless communication A. Rana · S. Prajapat · P. Kumar (B) · K. Kumar Srinivasa Ramanujan Department of Mathematics, Central University of Himachal Pradesh, Dharamsala, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_15
channels, and deployment methods for these networks pose substantial security issues. These flaws are easy for malicious individuals to exploit, giving them unauthorized access to the network and jeopardizing data confidentiality, availability, and integrity. Traditional cryptography techniques may not always be sufficient for securing WSNs [4]. Therefore, detecting and preventing intrusions and enhancing the security features of WSNs is crucial. A possible approach for offering security services to WSNs is using IDSs, which track network activity and look for any suspicious or malicious activity [5]. IDSs fall into one of two categories: anomaly-based IDSs or misuse-based IDSs. In contrast to anomaly-based IDSs, which employ statistical or machine learning techniques to understand the typical behavior of the network and detect any variation from it [4], misuse-based IDSs utilize predetermined rules or signatures to recognize known assaults [2]. Machine Learning (ML) techniques have been widely used in anomaly-based IDSs for WSNs due to their ability to learn from data and generalize to unseen situations [6, 7]. Supervised and unsupervised learning are two subcategories of ML methods. Unsupervised learning techniques can find hidden patterns or clusters in the data, whereas supervised learning approaches require labeled data for training and testing. In IDSs, artificial neural networks [8–10], support vector machines [11], decision trees [12], and other machine learning techniques are used to analyze the massive amounts of data that sensors in WSNs gather to detect possible intrusions. These techniques can automatically learn from the data and adapt to changing network conditions, effectively detecting known and unknown attacks. Additionally, machine learning-based IDSs can reduce false positives and false negatives, which are common in traditional rule-based IDSs. By leveraging machine learning techniques, IDSs can enhance the security and reliability of WSNs. In this research, we use a dataset called WSN DS [4], which was built using the Low-Energy Adaptive Clustering Hierarchy (LEACH) protocol [13], to characterize four different forms of DoS attacks in WSNs. We apply three well-known machine learning algorithms, k-Nearest Neighbors (k-NN), logistic regression, and decision tree, to the WSN DS dataset to identify and counteract these threats. We assess the effectiveness of these algorithms using a variety of performance metrics and then compare the outcomes to determine which model fits the dataset best. The results of this study will aid in developing efficient IDSs for protecting WSNs from DoS assaults and offer insight into how various machine learning algorithms perform in this particular application. The following sections of this paper are structured as follows: Sect. 2 provides an extensive review of existing literature in the field. Section 3 presents a concise overview of the three fundamental machine learning models employed in this research. Section 4 elaborates on the results and discussions, specifically focusing on evaluating the models using the WSN DS dataset. Finally, Sect. 5 wraps up the paper by summarizing the findings and implications of the study.
2 Related Works IDSs for WSNs have drawn a lot of interest lately due to the rising need for dependable and secure communication in IoT and smart applications. Several studies have employed machine learning techniques to build IDS solutions for WSNs that can identify numerous types of intrusions and enhance the security of these networks. This section gives a summary of the present research and related endeavors in the area of machine learning-based IDSs for WSNs. Several Machine Learning (ML) and Deep Learning (DL) techniques for creating anomaly-based IDSs for WSNs have been proposed. Ahmad et al. [14] provided a detailed assessment of contemporary ML and DL algorithms used in Network-based IDSs (NIDSs) and explored their benefits, limits, problems, and future possibilities. They also presented a taxonomy of prominent machine and deep learning approaches based on their learning paradigms, architectures, and applications. An Enhanced Empirical-based Component Analysis (EECA) method for feature selection in NIDSs has been presented by Zhiqiang et al., which combines the advantages of empirical mode decomposition and principal component analysis. Long Short-Term Memory (LSTM) networks are used to categorize the selected characteristics, and their method is tested on four benchmark datasets: KDD99, UNSW NB 2015, CICIDS 2017, and NSL-KDD. They show that their method outperforms novel approaches in terms of accuracy, recall, F1-score, false positive rate, and false alarm rate. Almomani et al. [4] constructed a specialized dataset for WSNs, called WSN-DS, to aid in the better detection and classification of four forms of DoS attacks: scheduling, black hole, gray hole, and flooding attacks. They use the LEACH protocol, one of the most often used hierarchical routing protocols in WSNs. To create 23 features, they first gathered and processed data from Network Simulator 2 (NS-2). They then divided the collected characteristics into normal and attack classes using Artificial Neural Networks (ANNs), and demonstrated that IDSs could attain greater classification accuracy rates thanks to their dataset. The Waikato Environment for Knowledge Analysis (WEKA) is used in an experiment by Al-Ahmadi [15] to compare the performance of five machine learning algorithms for identifying flooding, gray hole, black hole, and scheduling DoS assaults in WSNs. The approaches include neural networks, support vector machines, decision trees, random forests, and naive Bayes. The WSN-DS dataset is the foundation for the evaluation. With an accuracy of 99.72%, the results demonstrate that the random forest performs better than the other methods. In conclusion, the reviewed literature demonstrates the growing interest in employing machine learning algorithms for intrusion detection in WSNs, highlighting the diverse approaches and techniques utilized in the field. The insights gained from these related works serve as a foundation for developing the proposed IDS solution in this paper, which will be discussed in detail in the subsequent sections.
3 Machine Learning Models Considered Three ML classifiers that are often used for IDS design are explored in this study due to their renowned performance in detecting various threats across networks. The three ML classifiers are as follows: • Decision Tree: The decision tree is a commonly used model for solving classification problems, as it provides a graphical representation of all possible solutions to a given problem. It is often chosen for its ability to mimic human reasoning and its easily understandable logic. The algorithm starts at the root node and proceeds recursively until it reaches a leaf node. Different methods are used to construct decision tree models, including classification and regression techniques such as CART (Classification and Regression Tree) and ID3 (Iterative Dichotomiser 3). • k-NN Model: The k-Nearest Neighbor (k-NN) algorithm is a well-liked technique for assigning new data to the most similar or closest category. All available data is stored, and new data is categorized according to its similarity to data already present in the closest category. One of its main benefits is that the k-NN method makes no underlying assumptions about the data, classifying it as a non-parametric algorithm. The "k" in k-NN refers to the number of neighbors used for classification; there is no specific method for selecting this value, which often requires trial and error and can be time-consuming. Additionally, the k-NN algorithm tends to be more effective for larger training data sets due to its reliance on finding the nearest neighbors based on distance metrics. • Logistic Regression Model: A categorical dependent variable's output may be predicted using the predictive modeling approach known as logistic regression, which also establishes the link between the dependent and independent variables. The Logistic Regression (LR) model uses a sigmoid function to convert the continuous values predicted by a linear regression function into categorical values; the sigmoid maps the predicted values into probabilities between 0 and 1, tracing an "S"-shaped curve. This logistic function is fitted in the logistic regression model. As a supervised classification algorithm, logistic regression often employs regularization techniques to obtain an optimal model for attack detection. Regularization techniques such as L1 (Lasso) or L2 (Ridge) regularization are commonly used to prevent overfitting and enhance the model's predictive accuracy. By incorporating regularization, logistic regression can effectively mitigate the risk of overfitting and provide a more robust and reliable solution for detecting attacks in various applications and domains.
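To make the sigmoid mapping and the L2 penalty above concrete, a minimal NumPy sketch follows; it is a generic illustration, not the study's code, and the regularization strength `lam` is an arbitrary placeholder.

```python
import numpy as np

def sigmoid(z):
    """Maps any real-valued score to a probability in (0, 1) along the "S" curve."""
    return 1.0 / (1.0 + np.exp(-z))

def l2_logistic_loss(w, X, y, lam=1.0):
    """Negative log-likelihood of labels y under weights w, plus an L2 (Ridge) penalty."""
    p = sigmoid(X @ w)
    nll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return nll + lam * np.sum(w ** 2) / (2 * len(y))
```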
4 Dataset Description, Results, Discussions, and Performance Evaluation WSN-DS is an exclusive dataset for IDSs for WSNs. Blackhole, flooding, gray hole, and scheduling attacks are the four Denial of Service (DoS) attacks affecting WSNs. Almomani et al. [4] classified them to make it easier to identify and categorize them. The LEACH protocol was utilized to retrieve the dataset using NS-2. A total of 374661 records with 19 characteristics, encompassing both typical and assault-related situations, are included in the collection. An ANN was trained using the dataset to distinguish between normal and attack network traffic. The results demonstrated that WSN-DS enhanced IDSs’ capacity to attain greater classification accuracy rates compared to other available datasets.
4.1 Performance Metrics
• Confusion Matrix: A confusion matrix is a 2×2 matrix for binary classification, where rows represent predictions and columns represent actual values.

                        Actual Negative    Actual Positive
Predicted Negative      True Negative      False Negative
Predicted Positive      False Positive     True Positive

• Accuracy: Accuracy is the ratio of correctly classified points to the total predictions made.

$A_n = \frac{TP + TN}{TP + TN + FP + FN}$  (1)

• Recall: Recall indicates how many of the correct positives were identified out of all positive predictions that could have been made.

$R_n = \frac{TP}{TP + FN}$  (2)

• Macro-Recall: In a multi-class classification task, the macro-recall score is the average of the recall scores of the individual classes, obtained by dividing the sum of the per-class recall scores by the number of classes.

$Macro\text{-}Recall = \frac{Recall_1 + \cdots + Recall_N}{N}$  (3)

• Precision: Precision is the quantification of true positives out of all predicted positives.

$P_n = \frac{TP}{TP + FP}$  (4)

• Macro-Precision: The precision of each class is calculated separately and then averaged to obtain the macro-precision.

$Macro\text{-}Precision = \frac{Precision_1 + \cdots + Precision_N}{N}$  (5)

• F-Beta score: The F-beta score is a weighted harmonic mean of precision and recall, where β determines the weight assigned to precision relative to recall.

$F_\beta = (1 + \beta^2)\,\frac{Precision \cdot Recall}{\beta^2 \cdot Precision + Recall}$  (6)

β is a parameter that determines the weight of precision in the score. A high value of β emphasizes recall, whereas a low value of β emphasizes precision.
• Macro-F-Beta Score: The macro-F-beta score applies the same formula to the macro-averaged precision and recall.

$Macro\text{-}F_\beta = (1 + \beta^2)\,\frac{Macro\text{-}Precision \cdot Macro\text{-}Recall}{\beta^2 \cdot Macro\text{-}Precision + Macro\text{-}Recall}$  (7)

Table 1 Comparison of yielded values

                          Decision tree    k-NN    Logistic regression
Precision macro avg       0.98             0.98    0.87
Recall macro avg          0.98             0.96    0.85
F-1 Score macro avg       0.98             0.97    0.86
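As a worked illustration of Eqs. 2–7, the short NumPy function below derives the macro scores from a multi-class confusion matrix laid out as above (rows = prediction, columns = actual); it is a generic sketch rather than code from the study.

```python
import numpy as np

def macro_scores(cm, beta=1.0):
    """cm[i, j]: count of samples predicted as class i whose actual class is j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=1)   # per class: TP / (TP + FP), Eq. 4
    recall = tp / cm.sum(axis=0)      # per class: TP / (TP + FN), Eq. 2
    macro_p = precision.mean()        # Eq. 5
    macro_r = recall.mean()           # Eq. 3
    macro_f = (1 + beta**2) * macro_p * macro_r / (beta**2 * macro_p + macro_r)  # Eq. 7
    return macro_p, macro_r, macro_f
```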
4.2 Results, Discussions, and Performance Evaluation This section presents the outcomes of the classical machine learning models applied to the WSN DS dataset. The simulations were conducted using Python on a Windows 11 operating system, utilizing an Intel(R) Core(TM) i5-1035G1 CPU @ 1.00 GHz, 8.00 GB RAM, and a 64-bit operating system. Of the 374661 records, 299728 have been allocated for training and validation, while the remaining 74733 have been set aside for testing. This division ensures that a significant portion of the dataset, comprising approximately 80% of the records, is utilized for training and validation purposes, allowing the model to learn from a diverse range of data.
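The experiment can be reproduced in outline with scikit-learn, as sketched below; the CSV file name, the label-column name, and the hyperparameters are assumptions, since the paper does not specify them.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Hypothetical file/column names; WSN-DS pairs feature columns with an attack-type label.
df = pd.read_csv("WSN-DS.csv")
X, y = df.drop(columns=["Attack type"]), df["Attack type"]

# 80/20 split, matching the 299728 / 74733 record allocation described above.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler().fit(X_tr)          # feature scaling matters for k-NN and LR
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

models = {
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Logistic regression": LogisticRegression(max_iter=1000),  # L2 penalty by default
}
for name, model in models.items():
    Xa, Xb = (X_tr, X_te) if name == "Decision tree" else (X_tr_s, X_te_s)
    model.fit(Xa, y_tr)
    print(name)
    print(classification_report(y_te, model.predict(Xb), digits=2))  # reports macro avg
```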
Fig. 1 a Comparison of F-1 score values, b comparison of precision and recall values
The remaining 20% of the records are reserved for testing, serving as an independent evaluation set to assess the model's performance and generalization capabilities. Table 1 depicts the performance of the different models on the utilized dataset. The confusion matrix for each model serves as the basis for computing recall, precision, and F-1 score. The decision tree model has remarkable macro-precision, recall, and F-1 score values of 0.98, 0.98, and 0.98, respectively. The k-NN model displays somewhat lower macro-precision, recall, and F-1 score values than the decision tree model, at 0.98, 0.96, and 0.97, respectively. Finally, the logistic regression model trails both, with macro-precision, recall, and F-1 score values of 0.87, 0.85, and 0.86, respectively. Based on the results shown in Fig. 1a and b, the decision tree classification strategy best matches the WSN DS dataset, consistently outperforming the other models in precision, recall, and F-1 score.
5 Conclusion WSNs are critical in enabling efficient data collection and monitoring in various domains. To protect the security and integrity of the network and its data, it is essential to establish an efficient IDS in WSNs due to their inherent weaknesses. To test the effectiveness of three fundamental machine learning models, decision tree, LR, and k-NN, for detecting intrusions in WSNs, this research article used a dataset named WSN DS [4]. The dataset included several DoS attacks that might be used against WSNs, including gray hole, black hole, scheduling, and flooding attacks. Each model's performance metrics were calculated and compared, including precision, recall, and F-1 score. Based on the outcomes, the decision tree model surpassed the other models in terms of F-1 score, precision, and recall. The decision tree model's high macro-precision, recall, and F-1 score values demonstrate
how well it can correctly forecast various DoS assaults in the WSN DS dataset. The results of this study have important ramifications for WSN intrusion detection. The early identification and mitigation of security risks in WSNs can benefit from the decision tree model’s ability to classify various DoS assaults precisely. These findings might be helpful for academics and practitioners trying to protect WSNs against different kinds of DoS assaults. In conclusion, these results contribute to understanding machine learning-based intrusion detection in WSNs and provide insights for developing effective security solutions in these networks to ensure their reliable and secure operation.
References
1. Gungor VC, Lu B, Hancke GP (2010) Opportunities and challenges of wireless sensor networks in smart grid. IEEE Trans Ind Electr 57(10):3557–3564
2. Rassam MA, Maarof MA, Zainal A (2012) A survey of intrusion detection schemes in wireless sensor networks. Am J Appl Sci 9(10):1636
3. Marriwala N, Rathee P (2012) An approach to increase the wireless sensor network lifetime. In: 2012 World congress on information and communication technologies, pp 495–499. IEEE
4. Almomani I, Al-Kasasbeh B, Al-Akhras M (2016) WSN-DS: a dataset for intrusion detection systems in wireless sensor networks. J Sensors 2016
5. Butun I, Morgera SD, Sankar R (2013) A survey of intrusion detection systems in wireless sensor networks. IEEE Commun Surv Tutor 16(1):266–282
6. Yu Z, Tsai JJP (2008) A framework of machine learning based intrusion detection for wireless sensor networks. In: 2008 IEEE International conference on sensor networks, ubiquitous, and trustworthy computing (SUTC 2008), pp 272–279
7. Khan ZA, Samad A (2017) A study of machine learning in wireless sensor network. Int J Comput Netw Appl 4(4):105–112
8. Alrajeh NA, Lloret J (2013) Intrusion detection systems based on artificial intelligence techniques in wireless sensor networks. Int J Distrib Sens Netw 9(10):351047
9. Gowdhaman V, Dhanapal R (2021) An intrusion detection system for wireless sensor networks using deep neural network. Soft Comput 1–9
10. Wani S, Yadav D, Verma OP (2020) Development of disaster management and awareness system using twitter analysis: a case study of 2018 Kerala floods. In: Soft computing: theories and applications: proceedings of SoCTA 2018, pp 1165–1174. Springer
11. Mohd N, Singh A, Bhadauria HS (2020) A novel SVM based IDS for distributed denial of sleep strike in wireless sensor networks. Wireless Pers Commun 111(3):1999–2022
12. Nancy P, Muthurajkumar S, Ganapathy S, Santhosh Kumar SVN, Selvi M, Arputharaj K (2020) Intrusion detection using dynamic feature selection and fuzzy temporal decision tree classification for wireless sensor networks. IET Commun 14(5):888–895
13. Heinzelman WR, Chandrakasan A, Balakrishnan H (2000) Energy-efficient communication protocol for wireless microsensor networks. In: Proceedings of the 33rd annual Hawaii international conference on system sciences, p 10. IEEE
14. Ahmad Z, Shahid Khan A, Wai Shiang C, Abdullah J, Ahmad F (2021) Network intrusion detection system: a systematic study of machine learning and deep learning approaches. Trans Emerg Telecommun Technol 32(1):e4150
15. Alsulaiman L, Al-Ahmadi S (2021) Performance evaluation of machine learning techniques for DoS detection in wireless sensor network. Preprint at arXiv:2104.01963
Arduino Controlled 3D Object Scanner and Image Classification Amoli Belsare, Sahishnu Wankhede, Nitin Satpute, Ganesh Dake, Vedang Kali, and Vishal Chawde
Abstract A three-dimensional (3D) object scanner is a device that captures and creates a digital 3D representation of a physical object. With the growing demand for faster development of 3D models, there is a need for increased production rates. As a result, it is essential for this technology to be cost-effective and easily accessible to consumers. This can be achieved by making use of affordable components and ensuring that the necessary resources are readily available. To address this main issue, we propose building a low-cost standalone 3D scanning system that uses information from a Sharp IR sensor and a web camera to generate digitized 3D models. These models may subsequently be utilized in digital animation or 3D printing for a range of purposes such as toys, prosthetics, and antiquities. Another focus of this 3D scanner is to classify an object and present its information using an image processing approach. Additionally, two-dimensional (2D) image to 3D mesh conversion is carried out algorithmically to provide information about scanned objects to newcomers in the field of electronics. Experimental analysis is performed for identifying a capacitor through the integration of a 3D scanner controlled by an Arduino. The collected data was then uploaded to a computer for further processing with MATLAB built-in functions and external libraries, aiming to achieve reliable identification results. Keywords MESHLAB · Arduino · NEMA17 stepper motor · CAD · 3D scanning
1 Introduction A 3D scanner digitizes and records the geometry of actual items, whereas a 3D printer turns digital information into a tangible product. The scanner system can be used in a variety of fields, including machine vision, museums to digitally preserve historical A. Belsare (B) · S. Wankhede · G. Dake · V. Kali · V. Chawde Department of Electronics and Telecommunication Engineering, Yeshwantrao Chavan College of Engineering, Nagpur, India e-mail: [email protected] N. Satpute Technology Innovation Institute (TII), Abu Dhabi, UAE © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_16
monuments, dentistry, for creating 3D models of intricate shapes, factories for documentation, etc. [1]. The literature includes studies that extract 3D data using high-speed stereo capture, in which motion correction must additionally be carried out to compensate for moving objects. 3D researchers have produced a variety of 3D object scanners, ranging from contact-based devices to non-contact ones like laser scanners [2–4]. Scanners are also used in medical sectors for the diagnosis and prognosis of disease or to find surface flaws [5]. Contact and optical technologies are the two key approaches proposed in one related study, based on the scanning speed and precision of the two methods [6]. In another study, the scanner works by photographing an object from various angles and then analyzing the captured photos to build a 3D model using software [7, 8]. These methods rely on the light reflected from the object and on assessing deviations from an expected standard form. Large calculations in the optical approach may slow down the scanning process, whereas the contact technique entails using a contact coordinate measuring machine to perform a sequence of measurements, moving the arm from the sensor to the item and reading the linear encoder's position to record the contact point's location. A few studies show that texture analysis, in addition to geometry and dimension information, may be utilized to identify items in a 3D scanner. Texture analysis is the study of the patterns and characteristics of a captured object's surface, such as its color, roughness, and reflectivity, to extract texture information that can subsequently be used to identify the object. Texture analysis used to identify items based on surface marks is the focus of another research work on fruit quality and inspection [9]. Texture analysis using various techniques, including statistical analysis, frequency analysis, and spectral analysis, for object detection and 3D reconstruction is also presented in [10–13]. Some studies show that a scanner can detect the form of an item by examining the distortions in projected patterns. These scanners are generally cheap and simple to operate, although they might struggle with shiny or clear materials. Structured-light scanners employ a projector to shine a series of patterns onto the item, which is then captured by a camera; X-rays are used by computed tomography scanners to scan the item, generating a 3D representation of its inside structure. These are costly, time-consuming, and need specific expertise to operate [6–8, 14, 15]. Another work focused on a scanner that produces a 3D representation of an object by triangulating the camera locations using the Ball-Pivoting Algorithm (BPA) [16]. These scanners are flexible and may be used with a wide range of items, but they require precise camera calibration. On the other hand, object recognition in 3D scanning also has a wide range of applications across different fields, such as surveillance and Content-Based Image Retrieval (CBIR), among others [17, 18]. The conversion of a 2D image into a 3D mesh using image processing is a great challenge; therefore, to classify and interpret items in an image, it is first segmented into regions that correspond to particular objects. This is accomplished by breaking the image into smaller sections and then merging them using a collection of classifiers based on similarities.
Based on cues from the RGB image, the depth image, and the estimated scene structure, these classifiers are trained to predict whether two regions correspond to the same object instance. The classifiers can
discover common traits between regions and integrate them to build a more accurate representation of the objects in the image by leveraging these signals [19]. The incorporation of object recognition in 3D scanning opens up possibilities for product authentication, forensic investigations, and automated systems that can identify and track objects of interest. A real-life case study on 3D scanners and printers is presented in [20]. By automating the process of capturing and analyzing 3D data, 3D object scanners can significantly improve efficiency, reduce manual labor, and save time and costs in various industries. From the literature, one can recognize the need for and scope of a portable 3D scanner design; such a system could also be used in education for primary classes. The proposed 3D Object Scanner with Image Classification is focused on giving primary class students insight into electronic components in a virtual way. This system is built with a Sharp distance sensor, an Arduino UNO, and a stepper motor. The system assists in achieving the following goals: converting data into 3D coordinate point clouds, reconstructing those 3D objects using a CAD tool, and identifying and describing the reconstructed real-world item.
2 Methodology 2.1 A System Overview The proposed 3D scanner system and corresponding circuit design are depicted in Figs. 1 and 2, respectively. The system is made up of several components: a Sharp IR sensor, an Arduino microcontroller, a micro-SD TF card shield, a NEMA 17 stepper motor, a stepper motor driver, and a camera for photo capture and image processing. The Arduino microcontroller acts as the scanner's brain. It accepts IR sensor data and controls the stepper motor to move in the required direction. Sharp's infrared sensor measures the distance between the scanner and the object being scanned and sends a signal representing the distance to the Arduino microcontroller. To scan the item from various angles, the NEMA 17 stepper motor moves the scanner along the x, y, and z axes. The stepper motor driver takes control signals from the microcontroller and translates them into the voltage and current levels required to operate the motor. A key component of the designed system is the micro-SD card module, which is used to store the 3D point cloud data obtained from the object scanning. The camera is used to take photos of the item being scanned from various perspectives; its major role is to identify the object in real time and display information about its characteristics and applications. Further, an image analysis algorithm is applied to categorize the object and convert the 2D image to a 3D image by predicting depth maps and generating 3D meshes.
Fig. 1 Block diagram of 3D scanner system
Fig. 2 Circuit design for 3D scanner system
2.2 Image Classification The image classification and reconstruction process is depicted in Fig. 3, where the camera takes a picture of the object, which is then processed to obtain the details of the scanned object. The feature selection block picks the most relevant features from the
Fig. 3 Block diagram image reconstruction
collection of retrieved Gabor features of the object for accurate classification. This minimizes the size of the feature space and increases classification accuracy. The output block shows the name or label of the detected item, or it can initiate an action based on the discovered object. For example, it may detect an electronic object and present the information to be learned about it. The preprocessing stage prepares the 2D picture for 3D conversion. This involves scaling the image, changing the color balance, or eliminating image noise. The depth estimation block computes the depth map of the 2D picture: a grayscale picture in which the intensity value of each pixel shows the distance from the camera to the item. The depth and disparity maps are used to recreate a 3D point cloud with the point cloud reconstruction block. A point cloud is a collection of 3D points that indicate the form of an item in three dimensions. The 3D reconstruction block generates a 3D model of the item from the point cloud, and the output block displays the object's 3D picture. As a result, the 2D picture is received by the input block and processed by the preprocessing block; the depth and disparity estimation blocks estimate the depth and disparity maps, respectively; and the point cloud reconstruction block builds a 3D model of the item by reconstructing a 3D point cloud from the depth and disparity data. Figure 4 shows the flow diagram of the 3D scanner and image reconstruction with both the camera and the Sharp infrared sensor enabled. The distance sensor begins gathering coordinates in the XYZ axes. Stepper motor 1 rotates the object through 360° and completes one rotation in 40 s. Following one rotation of the stepper motor (x and y axes), the second stepper motor moves upward (along the z-axis). The camera is attached to the laptop and pointed at the object to capture photos. Once the camera captures a photo of the object,
Fig. 4 Flow diagram of the 3D scanner and image reconstruction
the Gabor feature extraction and selection blocks extract and pick the relevant features from the image. The object recognition block matches the object to the nearest known object in the database. A scanned 2D image is converted into a 3D image using Python Gradio, following the process described in the next section: the 2D picture is taken as input and preprocessed to improve its quality; the final phase reconstructs a 3D point cloud describing the object's form in 3D space using depth and disparity maps; the point cloud is then used to produce a 3D model, which is shown to the user. Python Gradio simplifies connecting these processing stages and provides a user-friendly interface for transforming 2D photos into 3D images. Figure 5a, b shows the actual implementation of the proposed 3D scanner system and the stepper motor driver mounted on a PCB, respectively.
Fig. 5 Project design: a implemented scanner system, b stepper motor driver mounted on PCB
2.3 2D Image to 3D Mesh Reconstruction

A Python script that creates a graphical user interface (GUI) with the Gradio library implements the 2D-image-to-3D reconstruction by predicting depth maps and generating 3D meshes. The Open3D library is used to create point clouds and generate 3D meshes from the predicted depth maps. PyTorch and Transformers are used to load and run a pre-trained depth estimation model (GLPNForDepthEstimation) and feature extractor (GLPNImageProcessor) from the Transformers model hub. The 'predict_depth' function takes an image as input and predicts its depth map using the pre-trained model: it resizes the image to a size suitable for the model, prepares it for input, and runs the model to obtain the predicted depth map. The surrounding driver function first calls predict_depth to obtain the depth map and then calls generate_mesh to generate the 3D mesh; it also converts the predicted depth map to a color map and returns the result as an image. The 'iface' variable creates the GUI using the Gradio library, allowing the user to upload an image and adjust the mesh quality with a slider.
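A minimal sketch of this wiring is given below. It is illustrative only: the paper does not name the exact GLPN weights, so the publicly available checkpoint "vinvino02/glpn-nyu" and the resize-to-multiple-of-32 rule are assumptions, and the Open3D mesh-generation step is omitted.

```python
# Sketch of the depth-prediction step and Gradio interface; the GLPN
# checkpoint name and resizing rule are assumptions (see text above).
import gradio as gr
import numpy as np
import torch
from PIL import Image
from transformers import GLPNForDepthEstimation, GLPNImageProcessor

processor = GLPNImageProcessor.from_pretrained("vinvino02/glpn-nyu")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-nyu")

def predict_depth(image: Image.Image) -> np.ndarray:
    # GLPN expects spatial dimensions divisible by 32
    w, h = image.width - image.width % 32, image.height - image.height % 32
    inputs = processor(images=image.resize((w, h)), return_tensors="pt")
    with torch.no_grad():
        depth = model(**inputs).predicted_depth
    return depth.squeeze().cpu().numpy()

def depth_as_image(image: Image.Image) -> np.ndarray:
    # Rescale the depth map to 0-255 so it can be shown as a grayscale image
    d = predict_depth(image)
    return ((d - d.min()) / (d.max() - d.min()) * 255).astype(np.uint8)

iface = gr.Interface(fn=depth_as_image, inputs=gr.Image(type="pil"), outputs="image")

if __name__ == "__main__":
    iface.launch()
```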
3 Mathematical Analysis

For the 3D scanner system design, the following mathematical analysis is carried out at each phase of the work.
3.1 3D Scanner Design

The rotating table with the object placed on it is shown in Fig. 6a; here, the green region represents the object, and d is the minimum distance between the object and the distance sensor. After the initial stage, the turntable rotates by a certain angle, denoted f, while the measured distance stays the same, forming the triangle shown in Fig. 6b. The coordinates of the scanned point can therefore be found as

X = d × sin(f)    (1)

Y = d × cos(f)    (2)

Z = Z (the current height of the sensor carriage)    (3)
Fig. 6 a Rotating table with object placed; b rotating table with angle f
These captured XYZ coordinates are then used to form the point cloud and the 3D image of the object.
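Equations (1)–(3) translate directly into code; the sketch below assumes the angle f is reported in degrees and z is the current height of the sensor carriage.

```python
import math

def scan_point(d: float, f_deg: float, z: float) -> tuple:
    """Convert one distance reading into an XYZ point (Eqs. 1-3).

    d     -- distance from the sensor to the object surface
    f_deg -- turntable rotation angle in degrees
    z     -- current height of the sensor carriage
    """
    f = math.radians(f_deg)
    return (d * math.sin(f), d * math.cos(f), z)
```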
3.2 Texture Analysis Using Gabor Features

Along with the distance sensor, a camera is mounted on the table to capture an image of the object being scanned and load it into MATLAB; this information is used to identify and classify the object. Once multiple images of the object are captured, they are preprocessed by converting from the YUV color space to the RGB color space and by filtering to enhance the input image. Gabor features are then extracted from the image using a bank of Gabor filters with varying orientations and frequencies.
3.2.1 Image Preprocessing: YUV to RGB Conversion

R = Y + 1.13983 × (V − 128)    (4)

G = Y − 0.39465 × (U − 128) − 0.58060 × (V − 128)    (5)

B = Y + 2.03211 × (U − 128)    (6)
In these equations, the variables Y, U, and V represent the luminance, chrominance (blue projection), and chrominance (red projection) components, respectively, of the YUV color space. The variables R, G, and B represent the resulting red, green, and
blue components of the RGB color space after the conversion. They are used to convert the captured image from YUV to RGB color space.
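A vectorized sketch of Eqs. (4)–(6), assuming an 8-bit H × W × 3 YUV image:

```python
import numpy as np

def yuv_to_rgb(yuv: np.ndarray) -> np.ndarray:
    """Apply Eqs. (4)-(6) to an H x W x 3 YUV image (uint8)."""
    y = yuv[..., 0].astype(np.float32)
    u = yuv[..., 1].astype(np.float32)
    v = yuv[..., 2].astype(np.float32)
    r = y + 1.13983 * (v - 128.0)
    g = y - 0.39465 * (u - 128.0) - 0.58060 * (v - 128.0)
    b = y + 2.03211 * (u - 128.0)
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)
```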
3.2.2 Gabor Filter

The multi-scale, multi-orientation Gabor filter computes a texture descriptor for the captured image, which helps to identify the object algorithmically. The Gabor kernels are calculated as

g(x, y) = exp(−(x′² + γ²y′²)/(2σ²)) × cos(2π x′/λ + ψ)    (7)

where

x′ = x cos θ + y sin θ
y′ = −x sin θ + y cos θ

These kernels extract features from an image at different orientations and frequencies, where x and y are the initial coordinates and x′ and y′ the coordinates after rotation; θ denotes the orientation and σ the space constant, λ the wavelength, γ the spatial aspect ratio, and ψ the phase offset. The values of these constants are determined experimentally to obtain an accurate texture representation of the scanned object.
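Equation (7) can be evaluated directly; the sketch below builds one kernel, and a filter bank is obtained by varying θ and λ (OpenCV's cv2.getGaborKernel offers equivalent functionality).

```python
import numpy as np

def gabor_kernel(size: int, theta: float, sigma: float,
                 lam: float, gamma: float, psi: float) -> np.ndarray:
    """Evaluate Eq. (7) on a size x size grid centred at the origin."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float32)
    x_p = x * np.cos(theta) + y * np.sin(theta)      # x'
    y_p = -x * np.sin(theta) + y * np.cos(theta)     # y'
    envelope = np.exp(-(x_p**2 + (gamma * y_p)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_p / lam + psi)
    return envelope * carrier
```

Convolving the grayscale image with kernels at several (θ, λ) pairs then yields the multi-scale, multi-orientation response maps from which the texture descriptor is built.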
3.2.3 Feature Vector Normalization

f_norm = f / √(∑ f²)    (8)
Here, f represents the original feature vector and f_norm the normalized feature vector. Each feature vector computed by the Gabor texture approach is rescaled to unit length in this way.
3.2.4 Similarity Measurement

d = √(∑ (f1 − f2)²)    (9)
Here, f1 and f2 denote two feature vectors and d the Euclidean distance between them; the distance to known feature vectors is used to classify the scanned object accurately.
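Equations (8) and (9) amount to two one-liners:

```python
import numpy as np

def normalize(f: np.ndarray) -> np.ndarray:
    """Eq. (8): rescale a Gabor feature vector to unit length."""
    return f / np.sqrt(np.sum(f**2))

def euclidean(f1: np.ndarray, f2: np.ndarray) -> float:
    """Eq. (9): Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((f1 - f2) ** 2)))
```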
4 Result and Discussion

Figure 7a shows the sample object, a capacitor. To obtain the scanned object as coordinates, the capacitor is placed on the rotating platform of the 3D scanner. After the power supply is turned on, the 3D scanner scans the object along the XYZ axes. The SD card stores the scanned values, and an SD card reader is used to import these coordinates manually into MESHLAB, where they form a mesh point cloud as shown in Fig. 7b. Applying filters then converts the point cloud into an STL/OBJ file: first, under Filters → Normals, Curvatures and Orientation, Compute Normals for Point Sets is applied; then Point Cloud Simplification; and finally Surface Reconstruction: Screened Poisson generates the STL file shown in Fig. 8a. In this experiment, we capture images with the mounted camera and apply image processing to identify the object. Figure 8b shows the conversion of the 2D image into a 3D mesh using RGBD depth; the RGBD image is used for 3D reconstruction through techniques such as point cloud generation or mesh reconstruction. The overall process thus involves a scanner system that captures objects such as a capacitor and a DC motor through a distance sensor as well as a camera. With the capacitor placed on the scanner's rotating platform and the system powered on, scans are taken along the XYZ axes. The scan coordinate data is stored on
Fig. 7 Object1: a capacitor and b capacitor point cloud
Fig. 8 Object1: a capacitor STL file and b conversion of 2D image into 3D image
Fig. 9 a Object1: conversion using 'rgb2gray'; b Object1: object identification
Fig. 10 First row: a Object2: DC motor; b point cloud using XYZ coordinates; c STL file; d final 3D output after filtering. Second row: e camera captured image; f textured image; g identified image description; h final 2D to 3D output
an SD card and processed in MESHLAB, which converts it into a mesh point cloud. The 3D view of the captured data is then obtained by applying filters and surface reconstruction. Images of the object are processed and identified through the image processing method, which includes the RGBD depth step that converts 2D images into 3D meshes. Figure 9a, b shows the grayscale conversion and the object identification, respectively, used to extract features from the image at different orientations and frequencies. A similar experiment is carried out on a second object, a DC motor, and the corresponding output for the scanned DC motor is shown in Fig. 10a–h. The experimental analysis of the developed system shows accurate conversion of the captured object with XYZ coordinates as well as camera captures at different orientations in the vertical and circular directions.
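The MeshLab filter chain used above (normal estimation, simplification, screened Poisson) can equally be scripted with the Open3D library already mentioned in Sect. 2.3; the file names below are placeholders for the SD-card export and the STL result.

```python
# Scripted equivalent of the MeshLab steps, using Open3D; file names
# are placeholders, and the voxel size is an illustrative choice.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.xyz")        # XYZ coordinates from the SD card
pcd = pcd.voxel_down_sample(voxel_size=1.0)      # point cloud simplification
pcd.estimate_normals()                           # compute normals for point set

mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)
mesh.compute_triangle_normals()                  # required before STL export
o3d.io.write_triangle_mesh("scan.stl", mesh)
```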
5 Conclusion

In conclusion, the development of a 3D object scanner using an Arduino, a Sharp IR sensor, and stepper motors is a cost-effective and efficient option for individuals wishing to record the shape and size of real-world objects. This technology has a wide range of
applications, including prototyping and product creation, as well as gaming and virtual reality experiences. Anyone may construct comprehensive and exact 3D models of the objects they choose to scan by combining the power of the Arduino with the precision of the distance sensor. Object identification using a camera and Gabor features is an effective way to recognize and categorize items in images: after a picture is captured with the camera, Gabor filters extract attributes from it in order to discover and categorize distinctive patterns. Python Gradio combined with 2D-to-3D conversion creates a powerful tool for viewing and engaging with 2D pictures in three dimensions.
Anomaly Detection for IoT-Enabled Kitchen Area Network Using Machine Learning Mohd Ahsan Siddiqui, Mala Kalra, and C. Rama Krishna
Abstract IoT has eased our life by providing various smart applications such as Smart Homes, Industrial Automation, Smart Healthcare, Smart Traffic Monitoring, and Fleet Management, to name a few. Due to various vulnerabilities in IoT infrastructure and software, anomaly detection is one of the major concerns for application developers. In this paper, the Smart Home application of the IoT paradigm is presented. We propose an Anomaly Detection System (ADS) for the Kitchen Area Network (KAN), deployed with the help of an MQTT broker and various sensors on the IoT-Flock emulator, which also creates the IoT network. A network traffic packet sniffer (Wireshark) is used to capture the flows. The KNN machine learning model is used for binary classification to detect anomalies. The proposed model achieves accuracy, recall, precision, and F-1 score of 94.37%, 94.31%, 95.40%, and 94.85%, respectively.

Keywords IoT · Anomaly detection · Machine learning · Intrusion detection · IoT-Flock · Security · Kitchen area network (KAN)
1 Introduction At present, many IoT-enabled use-case scenarios like Smart Home, Smart Transportation, Smart Health, Smart Factory, and Smart Agriculture have gained tremendous popularity. The IoT systems are making life easy, safe, and comfortable. Billions of devices are connected to the IoT systems worldwide and are ever-growing. Hence,
M. A. Siddiqui (B) · M. Kalra · C. Rama Krishna Department of CSE, NITTTR, Chandigarh, India e-mail: [email protected] M. Kalra e-mail: [email protected] C. Rama Krishna e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_17
it is imperative to apply a modular approach to tackle the problems persisting in IoT systems, sensors, networks, actuators, devices, etc. Due to their heterogeneous nature (in power consumption, communication protocol, networking, computational power, software, and storage capacity and design) and their rapid expansion, IoT systems are particularly prone to vulnerabilities and cyber-attacks [1]. Such attacks may cause system malfunction, abnormal power consumption, network congestion, communication failure, data redundancy, unwanted computation, and storage abnormalities [2]. Anomaly detection and prevention therefore play an essential role in the standard, optimal, and reliable functioning of the IoT ecosystem. The Kitchen Area Network is a subdomain of the Smart Home application, consisting of various sensors/devices that make the kitchen smart, as depicted in Fig. 1. The reason to study a deeper subdomain such as the Kitchen Area Network of IoT-based Smart Homes is that its diverse connectivity, networking, protocols, computational power, and storage capacity give more opportunities for anomalies and cyber-attacks. Broker-based anomaly detection over the MQTT protocol is a realistic representation of such IoT systems: MQTT provides lightweight communication among the sensors, actuators, and IoT devices, minimizing computation and power consumption for the long and continuous operation of
Fig. 1 Typical kitchen area network
IoT devices without interruption. The MQTT broker works as a mediator for the effective and guaranteed delivery of messages among the connected sensors in the system. In this paper, we present an Anomaly Detection System (ADS) for the KAN and its different components, aimed at better reliability, accuracy, and stability of the IoT system. The paper is divided into six sections: Sect. 2 explores related work; Sect. 3 covers the proposed methodology; Sect. 4 describes the experimental setup; Sect. 5 presents the proposed model's performance; and Sect. 6 concludes and outlines future work.
2 Related Work

In this section, existing work on anomaly detection approaches for various IoT systems is discussed. Reference [3] presented anomaly detection in IoT-based sensor networks; simulation-based local outlier factors and time series were considered, though no machine or deep learning was involved, and the authors claim an improvement over earlier statistical methods. Another anomaly detection approach, presented in [4], focused on a Dual Auto-Encoder Generative Adversarial Network (DAGAN). The DAGAN approach represents the IoT network in more detail, and the deployed system is effectively utilized for IoT networks. The DS2OS and SWAT standard datasets were used to train and test various models such as SVM, a Gaussian Mixture Model-based method (GMM), Auto-Encoder (AE), anomaly GAN, and FenceGAN. The accuracy of the proposed method was 97% for DoS attacks and 80% for Wrong Setup (WS) attacks. In [5], a deep learning model was used for anomaly detection in IoT networks: a lightweight RNN variant was proposed for binary classification. Standard datasets such as NSLKDD, BoT-IoT, IoT-NI, MQTT, MQTTset, IoT-23, and IoT-DS2 were used to train and test the RNN model. On the NSLKDD dataset, accuracy was 99.67% for LSTM, 99.82% for BiLSTM, and 99.78% for GRU models; on the IoT-NI dataset, the accuracies of the LSTM, BiLSTM, and GRU models were 98.14%, 98.89%, and 98.42%, respectively. Other performance metrics such as precision, F-1 score, and recall were also obtained. Another article [6] presented an LSTM approach to anomaly detection in a metro train environment. The authors deployed various sensors for raw data generation, and the raw data was refined to avoid inadequate training of the deep learning model. The real-time traffic of three sensors, (a) the air conditioner, (b) the traction system, and (c) the electric sliding gate, was recorded over a period of 30 days at an interval of 0.5 s, yielding a total volume of 30 GB of data. Performance metrics were calculated and compared with other machine learning models. Reference [7] proposed a system known as FamilyGuard for smart home-based IoT systems. In this approach, an extra layer of security was proposed for the security,
monitoring, and maintenance of connected IoT devices. A machine learning model known as a one-class classifier was utilized for accurate anomaly detection in network flows, and a real-time use-case environment was implemented to test the performance of the deployed model. Real-time data was collected and pre-processed before being fed to the machine learning model; however, performance metrics, including accuracy, were not reported. An anomaly detection model for IoT-integrated maritime transportation was developed in [8], where abnormal trajectories in maritime transportation were utilized for anomaly detection. Further, the authors of [9] presented a hybrid method of anomaly detection using machine learning: K-means clustering and naïve Bayes were implemented. Real-time raw data was generated from numerous installed sensors such as Fridge, Garage Door, GPS Tracker, Modbus, Motion Light, Thermostat, and Weather Sensor. The accuracy of the proposed model was 90–100% for different scenarios, and performance metrics such as precision, recall, and F-1 score were estimated to evaluate the model's performance. The article [10] presents three autoencoder-based models, namely a shallow Autoencoder, a Deep Autoencoder (DAE), and an Ensemble of Autoencoders (i.e., KitNET), for anomaly detection; a Data Poisoning Attack (DPA) was detected as an anomaly in network traffic, and unsupervised machine learning models were considered for training and testing. Reference [11] depicts a Recurrent Neural Network (RNN)-based anomaly detection model in an IoT system. Variants of the RNN model, such as LSTM, BiLSTM, and GRU-based approaches, were deployed, and multiple datasets like NSLKDD, BoT-IoT, IoT-NI, MQTT, MQTTset, IoT-23, and IoT-DS2 were used for training and testing; parameters such as precision, recall, and F-1 score were also recorded. In [12], a detailed survey of anomaly detection systems in the IoT ecosystem was carried out, focusing on the main challenges of deploying anomaly detection on IoT data; trending methods such as Statistical and Probabilistic, Pattern Matching, Distance-Based, Clustering, Predictive, and Ensemble approaches were discussed. Another paper [13] described anomaly detection methods and techniques for IoT systems integrated with IoT-enabled sensors: machine learning models like LR, SVM, DT, RF, and ANN were deployed and compared for accuracy and other performance metrics on the standard DS2OS dataset. Further, in [14], a novel projection of wireless network attack data into a grid-based image for feeding a Convolutional Neural Network (CNN) model, EfficientNet, was proposed; on the AWID2 dataset the deployed models performed well across accuracy, precision, recall, and F-1 score, measuring an accuracy of 95%. In [15], the authors focused on hyperparameter tuning of a deep learning model to increase the accuracy of botnet attack identification in IoT networks.
3 Proposed Methodology

Figure 2 represents a typical Kitchen Area Network (KAN)-based Anomaly Detection System (ADS) scenario, covering settings such as a house kitchen, a commercial pantry, or a commercial kitchen. The KAN comprises numerous sensors, such as temperature, humidity, gas, and door sensors, as listed in Table 2.
3.1 MQTT Broker-Based Anomaly Detection Model

Most of the sensors are resource-constrained; hence, conventional communication protocols such as IP, HTTPS, TCP, and ICMP may not reflect the actual use case of a typical IoT system. However, smart devices such as smart refrigerators, smart chimneys, smart water purifiers, and smart garbage collectors may communicate with other gateways
Fig. 2 Proposed MQTT broker-based anomaly detection system
Table 1 Summary of extracted features

Sr. No.  Selected features     Sr. No.  Selected features
1        Flow ID               11       Dst IP_10.0.2.28
2        Flow pkts/s           12       Dst IP_10.0.2.34
3        Fwd IAT mean          13       Timestamp_57:07.9
4        Down/up ratio         14       Timestamp_57:30.4
5        Idle max              15       Timestamp_57:07.4
6        Src IP_10.0.2.22      16       Timestamp_57:35.4
7        Src IP_10.0.2.24      17       Timestamp_57:40.4
8        Src IP_10.0.2.28      18       Timestamp_57:44.4
9        Src IP_10.0.2.34      19       Timestamp_57:48.4
10       Dst IP_10.0.2.22
using conventional Internet Protocols (IP). In the proposed anomaly detection model, an MQTT broker-based design is adopted for light, smooth, and reliable communication among the various IoT sensors, devices, and gateways. Figure 2 depicts the MQTT-based anomaly detection strategy using the machine learning technique. All sensors considered in the Kitchen Area Network (KAN) are simulated using the IoT-Flock framework (an IoT network traffic emulator) and continuously send data to the MQTT broker, which then routes the data to subscribers such as smartphones or computers. The KAN provides an in-depth view of possible IoT sensor, actuator, and device deployments for future smart kitchen design, and this use case supports better anomaly detection, security, and privacy strategies for IoT-enabled smart home ecosystems. A variety of sensors is considered to enhance the smartness and intelligence of the proposed system; however, the choice of sensors may vary from case to case in smart kitchen design, as it depends upon the degree of automation required in the specific environment. The fundamental principle of sensor deployment is to provide information about the environment that is as detailed as possible for AI integration. This information is pre-processed and fed into a machine learning model for prediction or autonomous decision-making without human intervention. In the proposed model, the sensors act as publishers, and other devices such as smartphones or computers act as subscribers, matching the essential requirements of the MQTT protocol for computationally constrained devices.
3.2 KNN-Based Machine Learning Model

The KNN model is utilized for the classification of anomalous and normal data flows. The labeled data is supplied to the KNN classifier, and prediction is carried out for the MQTT Publish Flood Attack. Figure 3 shows a typical MQTT-based Anomaly Detection System (ADS) deployment scenario using the KNN-based machine learning model in the proposed KAN.
Fig. 3 KNN-based machine learning model deployment in KAN
Table 2 Sensor deployment in kitchen area network (KAN)

Sensor name       Topic name   Data profile   Data value range
Temperature       Temp.        Integer        0–100
Humidity          Hum.         Integer        0–100
Light             Lght.        Boolean        0–1
Water filter      Wtrf.        Boolean        0–1
Oven              Ovn.         Boolean        0–1
Dish washer       Dshw.        Boolean        0–1
Smoke             Smke.        Boolean        0–1
Gas supply        Gssp.        Float          0–50
Garbage bucket    Grbg.        Boolean        0–1
Refrigerator      Rfgr.        Boolean        0–1
Wastewater duct   Wstw.        Boolean        0–1
Gas leakage       Gslkg.       Boolean        0–1
Door              Drs.         Boolean        0–1
Chimney           Chmny.       Boolean        0–1
Water supply      Wtrs.        Float          0–100
Exhaust fan       Exht.        Boolean        0–1
Water quality     Wtrq.        Float          0–14

Time profile: 1 min
The proposed approach uses the K-Nearest Neighbor (KNN) supervised machine learning model to classify normal and malicious traffic. KNN groups data points of similar properties: the similarity between data points is measured by their Euclidean distance, and a new incoming data point is assigned the majority class among its K nearest labeled neighbors. The value of K is chosen empirically (see the error-rate plot in Sect. 5). The raw data is generated with the profiles specified in Table 2, and features are extracted from the IoT network traffic using the Wireshark packet sniffer.
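In scikit-learn terms, this classifier is a few lines; the random arrays below merely stand in for the pre-processed flow features, and k = 15 follows the error-rate analysis in Sect. 5.

```python
# KNN sketch; stand-in arrays replace the real pre-processed flows.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((1000, 19))              # rows: 19 selected features (Table 1)
y_train = rng.integers(0, 2, 1000)            # 0 = normal, 1 = MQTT flood (Sect. 3.4)

knn = KNeighborsClassifier(n_neighbors=15, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.predict(rng.random((5, 19))))       # classify five unseen flows
```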
3.3 Data Pre-processing

Data pre-processing is an important and essential part of an Anomaly Detection System (ADS). In this phase, the raw data generated by the sensors is cleaned up by imputing missing values, encoding categorical fields, and removing outliers. The efficiency of the deployed machine learning model depends heavily on the quality of the pre-processed data. Python libraries such as scikit-learn, Pandas, NumPy, and Matplotlib have been utilized for data processing. The generated dataset has been cleaned of missing, incorrect, and unknown values, and the relevant columns have been encoded for categorical
values. Outliers like inconsistent records have been removed. The dataset has been normalized using the Min–Max technique to rescale all features in the scale range of 0–1.
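A condensed sketch of this clean-up follows; the tiny DataFrame and its column names are placeholders for the real CICFlowMeter export.

```python
# Pre-processing sketch: imputation, de-duplication, one-hot encoding,
# and Min-Max scaling. Data and column names are placeholders.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "Flow pkts/s": [120.0, None, 480.0, 480.0],
    "Src IP": ["10.0.2.22", "10.0.2.24", "10.0.2.28", "10.0.2.28"],
    "Label": [0, 0, 1, 1],
})
df = df.fillna(df.median(numeric_only=True)).drop_duplicates()  # impute / de-duplicate
df = pd.get_dummies(df, columns=["Src IP"])                     # encode categoricals

features = df.columns.difference(["Label"])
df[features] = MinMaxScaler().fit_transform(df[features])       # rescale to 0-1
print(df)
```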
3.4 Label Defining and Mapping IoT Network Traces

Label encoding systematically assigns a numerical value to the categorical value of each IoT network traffic flow data point. In the dataset, 0 is used for normal traffic flows and 1 for malicious IoT network traffic flows. The label encoding is based on binary classification.
3.5 Resolving the Data Imbalance Problem

The captured dataset is imbalanced: only 20% of the data points represent malicious traffic (the MQTT flood attack). This imbalance destabilizes the machine learning model and can lead to wrong predictions. It has been addressed with the "Synthetic Minority Over-sampling Technique" (SMOTE), which yields a balanced class distribution. SMOTE involves the following steps (see the sketch after this list):

• Determining IoT network flow feature vectors along with their nearest neighbors.
• Calculating the difference between two data points.
• Multiplying the difference by a random number in the range 0–1.
• Generating a new point on the line segment by adding the scaled difference to the feature vector.
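With the imbalanced-learn library this procedure is a single call; the random arrays stand in for the real 24,000 normal / 6,000 attack flows.

```python
# SMOTE sketch; stand-in data mirrors the paper's 80/20 class split.
from collections import Counter

import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.random((3000, 19))                    # 19 selected features (Table 1)
y = np.array([0] * 2400 + [1] * 600)          # 80% normal, 20% MQTT flood

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))       # minority class oversampled to parity
```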
3.6 Feature Selection and Extraction

This step helps avoid overfitting and excess computation by the machine learning model. It involves the following steps (a sketch follows the list):

• Eliminating constant features.
• Removing correlated features.
• Employing PCA to estimate feature importance.

Table 1 depicts the selected features.
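A sketch of the three steps; the correlation threshold and demo data are illustrative choices, not values from the paper.

```python
# Feature-selection sketch: drop constants, drop correlated pairs, then PCA.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def select_features(df: pd.DataFrame, corr_thresh: float = 0.95) -> pd.DataFrame:
    df = df.loc[:, df.nunique() > 1]                 # 1. drop constant features
    corr = df.corr().abs()                           # 2. drop one of each correlated pair
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    drop = [c for c in upper.columns if (upper[c] > corr_thresh).any()]
    return df.drop(columns=drop)

rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((100, 5)), columns=list("abcde"))
demo["f"] = 1.0                                      # constant column
demo["g"] = demo["a"] * 2                            # perfectly correlated column
reduced = select_features(demo)

# 3. PCA to judge how much variance the remaining features explain
print(PCA().fit(reduced).explained_variance_ratio_)
```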
3.7 Training and Testing

After feature selection and extraction, the final dataset was divided into training and testing subsets in a 90:10 ratio for the KNN model. The large training share allows thorough training of the machine learning model, which increases its prediction accuracy; however, different training/testing ratios may also be evaluated.
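The 90:10 split in scikit-learn; stratification is our addition to keep the class balance identical in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200, dtype=float).reshape(100, 2)   # stand-in feature matrix
y = np.array([0, 1] * 50)                         # stand-in labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)                # (90, 2) (10, 2)
```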
4 Experimental Setup

The experiment is carried out on a desktop computer with an Intel Core i7 processor, 8 GB RAM, and a 500 GB hard disk, running Ubuntu Linux 18. The IoT use case of the Kitchen Area Network (KAN) has been created on the IoT-Flock emulator, a Linux build of the Mosquitto MQTT broker is installed on the same machine, and the Wireshark packet sniffer is used for traffic capture. The following subsections cover the setup in more detail. In the proposed methodology, we first generate real-time traffic using the IoT-Flock network flow emulator based on the MQTT broker. The traffic is captured with Wireshark and saved in PCAP format, and the flows are then given to the CICFlowMeter for feature generation. A total of 50k data flows have been captured; after clean-up with various Python libraries, approximately 30k data flows remain. The dataset has been encoded and normalized using a Min–Max scaler, with one-hot encoding used to convert object-valued columns into integer values. After processing, a total of 106 features have been generated; PCA-based dimensionality reduction has then been applied, and as per Table 1, 19 features have been retained for training and testing the KNN machine learning model.
4.1 IoT-Flock Framework

For real-time IoT traffic generation, we have used the IoT-Flock framework, which works as an IoT device traffic emulator [16]. It is a powerful emulator for creating any use-case scenario, such as a smart home, smart agriculture, or intelligent transport. IoT-Flock has been downloaded from its GitHub repository and configured on the Linux operating system, as it is Linux-only. In IoT-Flock, both normal and malicious nodes can be created for normal and malicious IoT network traffic generation [1]. The IoT-Flock framework is GUI-based for easy operation. In this
framework, an XML file of the use case is generated and then run on the IoT-Flock emulator.
4.2 MQTT Broker

The MQTT broker plays a vital role in the MQTT protocol: it acts as a broker among the various connected devices, actuators, and sensors of the IoT system. MQTT follows the publish/subscribe model and is designed for computationally constrained devices. With the Mosquitto MQTT broker, sensors act as publishers while actuators, smartphones, and computers act as subscribers for message exchange (and vice versa) among devices.
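These roles map directly onto the Eclipse Paho client; the sketch below (paho-mqtt 1.x style API) assumes a local Mosquitto instance, the topic name "Temp." follows Table 2, and the payload is made up.

```python
# Illustrative publisher/subscriber pair against a local Mosquitto broker.
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")

subscriber = mqtt.Client()
subscriber.on_message = on_message
subscriber.connect("localhost", 1883)
subscriber.subscribe("Temp.")
subscriber.loop_start()

publisher = mqtt.Client()
publisher.connect("localhost", 1883)
publisher.publish("Temp.", "42")    # one temperature reading, as a sensor would send
```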
4.3 IoT Network Traffic Sniffer

The IoT network used in the framework generates the traffic flows. To capture these flows, we have deployed Wireshark locally on the same Linux-powered desktop computer [17]. The PCAP file of the captured traffic has been converted into a CSV file with the help of the CICFlowMeter, which extracts valuable information from Wireshark's PCAP file; this information becomes the features used by the machine learning model for prediction and autonomous decision-making.
4.4 Kitchen Area Network (KAN) Use-Case Creation

For anomaly detection in the KAN, the use case of the proposed model has been created in the IoT-Flock emulator. In the KAN, various sensors, such as temperature, humidity, and exhaust fan sensors, have been considered, as listed in Table 2. The XML file of the use case has been generated and run in the IoT-Flock framework, which starts the Mosquitto MQTT broker for delivering messages among the connected sensors. Each sensor has a unique IP address, but messages are delivered through the broker to the subscribers of the specific topic; hence the Mosquitto MQTT broker plays an important role, and in the case of a broker failure the whole system may collapse. The sensors created in the IoT-Flock framework each have three properties: topic name, data profile, and time profile. Table 2 gives the details of the sensors deployed in the Kitchen Area Network (KAN). The number of deployed sensors may vary with different kitchen designs and the number of features required in a particular environment. Sensors have different topic names, data profiles, value ranges, and time profiles; each sensor's value range depends on the desired
operating values within which the whole system normally works. The sensors send their data to the MQTT broker, which is responsible for delivering each message issued by a publisher to the subscribers of the corresponding topic.
4.5 Normal Traffic Generation

Normal traffic has been generated using the various Kitchen Area Network sensors: the sensors publish data to the broker, and the messages are delivered to the subscribers. The MQTT broker avoids network congestion or system failure in this exchange. The emulated sensors are digital copies of physical sensors, so some real-world circumstances may not be reproduced in the IoT traffic emulator.
4.6 Malicious Traffic Generation

Malicious traffic has been generated by inserting a new sensor as an edge node into the Kitchen Area Network (KAN) with the IoT-Flock traffic emulator. An MQTT flood attack has been created by assigning an abnormal time profile, data profile, and topic name: the malicious node sends a considerable amount of data to the MQTT broker, which can cause an abnormal condition in the IoT network and shows up as abnormalities in the network traffic flows captured with Wireshark. Table 3 summarizes the normal and malicious (MQTT flood attack) flows.
5 Performance Evaluation

Performance evaluation is the final phase, judging the machine learning model's reliability, accuracy, and stability. Figure 4 shows the relationships among the extracted features as a heat map of their correlations. Figure 5 plots the error rate for different values of K; with this plot's help, the optimum value for maximum accuracy of the KNN machine learning model can be selected (i.e., k = 15). As per Fig. 6, the deployed KNN model achieves an accuracy of 94.37%, precision of 95.40%, recall of 94.31%, and F-1 score of 94.85%. Figure 7 presents the training and testing accuracy.
Table 3 Normal versus malicious traffic flows

Sr. No.  Types of flows                       No. of flows
1        Normal flows                         24,000
2        Abnormal flows (MQTT flood attack)   6,000
Fig. 4 Heat map of extracted features
Fig. 5 K-values versus error rate
The accuracy dips at n_neighbours = 2 and increases again for values greater than two; beyond n_neighbours = 7, the training and testing accuracies saturate. Figure 8 shows the accuracy score as a function of the number of training samples, generally known as a learning curve: initially the accuracy score increases, and after a certain number of training samples it saturates, indicating that the KNN model has finished learning. Finally, Fig. 9 presents the confusion matrix, according to which True Positives (TP) are 83%, True Negatives (TN) are 68%, False Positives (FP) are 4%, and False Negatives (FN) are 5%. Table 4 summarizes the results.
Fig. 6 Performance measures of the proposed model (precision 95.4%, recall 94.31%, F-1 score 94.85%, accuracy 94.37%)
Fig. 7 Training versus testing accuracy
6 Conclusion and Future Directions

In this paper, we have proposed an extended use case for the smart home, the Kitchen Area Network (KAN). The devices, sensors, and actuators installed in the kitchen play an essential and critical role in the smart home ecosystem. Anomaly detection in the proposed MQTT-based KAN has been carried out: normal and malicious traffic has been captured and converted into feature columns, and a KNN machine learning model has been deployed as a proof of concept for anomaly detection in a subdomain of an IoT system. Accuracy and other parameters such as precision, recall, and F-1 score have been measured to evaluate the ML model. Bifurcating an IoT system into small use cases in this way may yield significant advantages in accuracy, performance, and stability of the deployed system. In the future, implementations based on other ML models can be proposed for the KAN as well as for other IoT systems to improve accuracy.
Fig. 8 Learning curve
Fig. 9 Confusion matrix
Table 4 Results

Performance metric   Result (%)
Precision            95.40
Recall               94.31
F-1 score            94.85
Accuracy             94.37
References

1. Raj H, Kumar M, Kumar P, Singh A, Verma OP (2022) Issues and challenges related to privacy and security in healthcare using IoT, fog, and cloud computing. In: Advanced healthcare systems. Wiley, pp 21–32. https://doi.org/10.1002/9781119769293.ch2
2. Singh S, Kumar M, Verma OP, Kumar R, Gill SS (2023) An IIoT based secure and sustainable smart supply chain system using sensor networks. Trans Emerg Telecommun Technol 34(2). https://doi.org/10.1002/ett.4681
3. Wei Z, Wang F (2022) Detecting anomaly data for IoT sensor networks. Sci Program 2022(1):1–7. https://doi.org/10.1155/2022/4671381
4. Chen L, Li Y, Deng X, Liu Z, Lv M, Zhang H (2022) Dual auto-encoder GAN-based anomaly detection for industrial control system. Appl Sci 12(10):4986. https://doi.org/10.3390/app12104986
5. Ullah I, Mahmoud QH (2022) Design and development of RNN-based anomaly detection model for IoT networks. IEEE Access 1. https://doi.org/10.1109/access.2022.3176317
6. Wang Y, Du X, Lu Z, Duan Q, Wu J (2022) Improved LSTM-based time-series anomaly detection in rail transit operation environments. IEEE Trans Ind Inform 3203(c):1–11. https://doi.org/10.1109/TII.2022.3164087
7. de Melo PHAD, Miani RS, Rosa PF (2022) FamilyGuard: a security architecture for anomaly detection in home networks. Sensors (Basel, Switzerland) 22(8):2895. https://doi.org/10.3390/s22082895
8. Hu J, Kaur K, Lin H, Wang X, Hassan MM, Razzak I, Hammoudeh M (2023) Intelligent anomaly detection of trajectories for IoT empowered maritime transportation systems. IEEE Trans Intell Transp Syst 1–10. https://doi.org/10.1109/TITS.2022.3162491
9. Best L, Foo E A hybrid approach: utilising k-means clustering and Naive Bayes for IoT anomaly detection, pp 1–43
10. Bovenzi G, Foggia A, Santella S, Testa A, Persico V, Pescapé A (2022) Data poisoning attacks against autoencoder-based anomaly detection models: a robustness analysis
11. Feng X, Xiangyu S (2012) Fog computing based distributed forecasting of cyber-attacks in Internet of Things
12. Cook AA, Misirli G, Fan Z (2020) Anomaly detection for IoT time-series data: a survey. IEEE Internet Things J 7(7):6481–6494. https://doi.org/10.1109/JIOT.2019.2958185
13. Hasan M, Islam MM, Zarif MII, Hashem MMA (2019) Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches. Internet Things (Neth) 7:100059. https://doi.org/10.1016/j.iot.2019.100059
14. Aminanto ME, Purbomukti IR, Chandra H, Kim K (2022) Two-dimensional projection-based wireless intrusion classification using lightweight EfficientNet. Comput Mater Contin 72(3):5301–5314. https://doi.org/10.32604/cmc.2022.026749
15. Popoola SI, Adebisi B, Gui G, Hammoudeh M, Gacanin H, Dancey D (2022) Optimizing deep learning model hyperparameters for botnet attack detection in IoT networks. IEEE Internet Things J 0–16. https://doi.org/10.36227/techrxiv.19501885.v1
16. Hussain F, Abbas SG, Shah GA, Pires IM, Fayyaz UU, Shahzad F, Garcia NM, Zdravevski E (2021) A framework for malicious traffic detection in IoT healthcare environment. Sensors (Basel, Switzerland) 21(9):3025. https://doi.org/10.3390/s21093025
17. Moustafa N (2021) A new distributed architecture for evaluating AI-based security systems at the edge: network TON_IoT datasets. Sustain Cities Soc 72. https://doi.org/10.1016/j.scs.2021.102994
Character-Level Bidirectional Sign Language Translation Using Machine Learning Algorithms K. Rajeswari, N. Vivekanandan, Sushma Vispute, Shreya Bengle, Anushka Babar, Muskan Bhatia, and Sanket Annamwar
Abstract Sign language is an indispensable mode of communication for the hard of hearing and deaf population. However, there is still a substantial language barrier between users of sign language and those who do not use it. This paper presents a bidirectional character-level sign language translation system that uses various machine learning algorithms, including Support Vector Machines (SVM), Random Forest, Logistic Regression (LR), and K-Nearest Neighbors (KNN), as well as deep learning algorithm—Convolutional Neural Networks (CNN), to provide a solution to this communication issue. Keywords Sign language · Support Vector Machines (SVM) · Random forest · Logistic Regression (LR) · K-Nearest Neighbors (KNN) · Convolutional Neural Networks (CNN)
K. Rajeswari · S. Vispute · S. Bengle (B) · A. Babar · M. Bhatia · S. Annamwar Computer Engineering Department, Pimpri Chinchwad College of Engineering, SPPU, Pune, Maharashtra, India e-mail: [email protected] K. Rajeswari e-mail: [email protected] S. Vispute e-mail: [email protected] A. Babar e-mail: [email protected] M. Bhatia e-mail: [email protected] S. Annamwar e-mail: [email protected] N. Vivekanandan Mechanical Engineering Department, Pimpri Chinchwad College of Engineering, SPPU, Pune, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_18
1 Introduction

More than 70 million deaf people around the globe are estimated by the World Federation of the Deaf [1]. Among them, nearly 80% live in developing countries, which together use more than 300 different sign languages. Sign language [2] is vital for those with hearing impairments, conveying thoughts and emotions, and communication gaps arise when unfamiliar individuals interact with sign language users. Systems that automatically interpret hand signals into spoken or written language have been developed to bridge this gap. Due to their capacity for pattern learning and prediction, machine learning (ML) algorithms such as SVM, KNN, and LR, as well as deep learning (CNN), are frequently used. Their combination enables strong bidirectional sign language translation in a variety of contexts, including daily interactions, education, healthcare, and public services.
2 Literature Survey

In their work, Zheng et al. [3] tackle sign language translation challenges with a novel model that enhances long-sentence translation. They alter current models to take into account complications including efficient hand motions, expressive facial expressions, and context memory. In their two-stage technique, sign phrases are first broken down into smaller pieces to improve context awareness before being translated into spoken language. Through the use of attentional processes and visualizations, the model's interpretability is improved, promoting openness and trust. Significant accuracy increases are seen in testing on the RWTH-PHOENIX-Weather 2014T dataset, notably for longer sign statements. A real-time sign language translation system that translates gestures into spoken and written words is presented by Ojha et al. [4]. They recommend a CNN implementation to achieve accurate translation and robust, immediate communication between sign and non-sign users, using the CNN for real-time gesture processing, translation, and recognition. The CNN model learns the visual-textual mapping from large sign-motion datasets. The system comprises hand gesture detection, tracking, and identification phases: computer vision algorithms locate the user's hand and ensure continuous tracking, and the trained CNN classifies gestures into text, providing accurate sign language translations in real time. The results confirm the performance and real-time capability of the CNN-based technique. Kanvinde et al. [5] focus on bidirectional conversion between sign language and spoken language. The authors suggest machine learning techniques, notably deep learning models, to achieve bidirectional translation: to capture and analyze the visual and temporal features of sign language, they use a combination of CNNs and Recurrent Neural Networks (RNNs). The suggested system is divided into two parts, input processing and output generation. The system uses computer vision techniques to extract features from the input video of sign language or spoken language
during the input processing stage. The extracted features are then fed into deep learning models, which analyze the input and produce an intermediate representation. In the output generation stage, the intermediate representation is used to generate output in the specified language format: for spoken-to-sign-language translation, the system converts the intermediate representation into sign language motions, and vice versa. Another work, by Katoch et al. [6] on Indian Sign Language (ISL), captures a live video stream and transforms it into text and speech. For segmentation, the technique employs the Bag of Visual Words (BOVW) model with skin color and background subtraction. Speeded Up Robust Features (SURF) are extracted from the processed images, and histograms are constructed to map signs to matching labels. SVM and CNN are used for classification. The dataset for training the model was created by capturing the signs of characters (A–Z) and numbers (0–9) from three different people. The research presents a novel strategy for classifying and recognizing ISL alphabets (A–Z) and digits (0–9) using SVM and CNN.
3 Dataset and Methods

3.1 Dataset

The Sign Language MNIST dataset is used for classification. The training set consists of 27,455 tuples and the testing set of 7,172 tuples. Each tuple has 784 attributes representing the pixels of a 28 × 28 grid. The instances correspond to signs of the alphabet from A–Z (labels 0–25), shown by different people at different angles (there are no cases for 9 = J or 25 = Z because those gestures involve motion). Each file has a header row of label, pixel1, pixel2, …, pixel784, and each row represents a single 28 × 28 pixel image with grayscale values between 0 and 255.
3.2 Sign to Text Translation

Image Frames Acquisition. The initial phase of image frame acquisition involves capturing sign language gestures with a webcam. The video transmission captures the entire duration of the sign, from beginning to end. The captured video stream is then separated into frames that are processed as grayscale images. The implementation uses the OpenCV library for video processing and the OS library to create a directory in which the extracted frames are stored. This process continues until there are no more frames to read from the video; a current-frame counter tracks the number of frames extracted and is used to name each image. The system also checks whether a directory with the specified name exists and, if not, creates it (Fig. 1).
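A sketch of this acquisition loop follows; the directory and file-name pattern are placeholders.

```python
import os
import cv2

def extract_frames(video_path: str, out_dir: str = "frames") -> int:
    """Split a recorded sign video into grayscale frame images."""
    os.makedirs(out_dir, exist_ok=True)        # create the directory if absent
    capture = cv2.VideoCapture(video_path)
    current_frame = 0
    while True:
        ret, frame = capture.read()
        if not ret:                            # no more frames to read
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        cv2.imwrite(os.path.join(out_dir, f"frame{current_frame}.jpg"), gray)
        current_frame += 1
    capture.release()
    return current_frame
```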
Fig. 1 Flow diagram for sign to text translation
Image Frames Preprocessing. The image frames generated in the previous step are converted into grayscale images. Converting an image to grayscale [7] reduces it to a single channel in which each pixel represents the intensity, or brightness, of the image; this simplification makes image processing and analysis easier. Each grayscale image is then represented as a 2D array of pixel values, and this dataset is used for the subsequent tasks.

Classification of Images Using Various Algorithms. The optimal algorithm is chosen based on numerous considerations, including the dataset's size and complexity, the number of classes, the availability of labeled data, the required accuracy, and the available processing resources. The algorithms below are used to classify the hand gestures based on the extracted features.

Support Vector Machine (SVM) is a machine learning algorithm applied to regression and classification problems [8]. Its main goal is to locate a hyperplane that cleanly separates the classes of the target variable. SVM uses the kernel trick to transform the data: kernel methods apply an intricate transformation and then choose the best boundary among the possible outcomes based on the tags and results.

K-nearest neighbors (KNN) is a supervised machine learning algorithm that is very simple to use yet capable of fairly complex classification tasks [9]. It keeps all training data and defers computation until a new instance must be classified. It is a non-parametric approach that makes no assumptions about the underlying data.

Logistic regression is a statistical technique used to predict a categorical outcome in classification tasks [10]. It is a well-known approach in machine learning that models the probability of each class and works particularly well when the classes are approximately linearly separable in the feature space.
The random forest algorithm is an improvement over the decision tree, a supervised machine learning technique [11]. Decision trees use n-ary trees for classification and regression but tend to overfit; random forest addresses this by building a collection of decision trees, enhancing accuracy and reducing sensitivity to the training data. Random Forest (RF) classification models classify the input sign images using bagging (bootstrap aggregating) [12]: rows are sampled randomly from the dataset, and the predictions of the individual decision trees are aggregated, which reduces the bias introduced by outliers.

Convolutional Neural Networks (CNN) can classify hand gestures from raw image data [4]. In a web-based bidirectional sign language translation app, the CNN identifies gestures and translates them into text or audio. Its layers, including convolution, max pooling, flatten, dense, dropout, and fully connected layers, form a robust feature extraction mechanism: starting with low-level features, they progressively capture higher-level attributes. The implemented Sequential CNN model begins with a first convolutional layer over the grayscale image that recognizes low-level features such as lines; it applies 128 filters of size 5 × 5 with a stride of 1, each computing a dot product with the image patch, so the layer produces 128 feature maps [13]. ReLU activation replaces negative values with zero, and max pooling keeps the largest value in each 3 × 3 window to strengthen attribute recognition. A second convolutional layer identifies angles and curves using 3 × 3 filters, followed by further max pooling to narrow down the activations, and a third convolutional layer captures movements and forms, again followed by max pooling. Padding is set to "same" for all layers, preserving the spatial size during convolutions. The flatten layer converts the 2D maps into a linear vector, the dense layer expands it to a 512-element array, and dropout at a rate of 0.25 reduces overfitting. A final dense layer compresses the representation into a 24-element array, one entry per class, with "softmax" activation for multi-class classification; the highest softmax probability determines the target class.
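The described stack corresponds to the following Keras sketch. It follows our reading of the text: 128 filters of 5 × 5, 3 × 3 pooling, dense-512, dropout 0.25, and a 24-way softmax are stated; the filter counts of the second and third convolutional layers are not given in the paper and are assumptions here.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the Sequential CNN described above; second and third
# convolutional filter counts (64, 32) are assumed, not from the paper.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                        # grayscale input
    layers.Conv2D(128, 5, strides=1, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, padding="same"),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, padding="same"),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, padding="same"),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.25),
    layers.Dense(24, activation="softmax"),                # one unit per class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```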
3.3 Text to Sign Translation

A dataset of 25 tuples, each denoting a letter from A–Z numbered 0–25, has been used. The grayscale images were divided into 28 × 28 grids, giving 784 pixels in total, which are taken as the features. Whenever the user inputs some text, the string is traversed and each letter is mapped to the corresponding number in the dataset; the matching instance is selected, a grayscale image is generated, and the extracted image is displayed using Matplotlib.
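The lookup amounts to indexing the dataset row for each character and reshaping its 784 pixel values back into a 28 × 28 image; the CSV file and column names below are placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd

signs = pd.read_csv("sign_reference.csv")          # placeholder: one row per letter

def show_signs(text: str) -> None:
    for ch in text.upper():
        if not ch.isalpha():
            continue
        label = ord(ch) - ord("A")                 # 'A' -> 0 ... 'Z' -> 25
        row = signs[signs["label"] == label].iloc[0]
        image = row.drop("label").to_numpy(dtype=float).reshape(28, 28)
        plt.imshow(image, cmap="gray")             # display the sign gesture
        plt.title(ch)
        plt.show()

show_signs("HI")
```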
4 Results

From the quantitative analysis, the precision, recall, and F1-score have been derived for the various ML models. Figure 2 shows the performance metrics of the machine learning models employed: Support Vector Machine, K-nearest Neighbors, Gaussian Naïve Bayes, Multinomial Naïve Bayes, Logistic Regression, and Random Forest. From Fig. 2 it is clear that the Random Forest classifier gives the best score across all performance metrics, with precision, recall, and F1-score each at 80%. The accuracy and training time of these models were also measured. Table 1 shows that Random Forest has the highest accuracy, 81.48%, and takes 49 s to train; it gives the best results among the ML models across all performance metrics. The confusion matrix for the Random Forest classifier is shown in Fig. 3.
Fig. 2 Grouped bar graph displaying the precision, recall, and F1-score of various ML-models
Table 1 Accuracy of various ML-models

Model   Time (min)   Accuracy (%)
SVM     0.75         78.16
KNN     0.1670       76.33
GNB     0.0167       38.98
MNB     0.0167       47.03
LR      8            67.63
RF      0.8167       81.48

(SVM—Support Vector Machine, KNN—K-nearest neighbor, GNB—Gaussian Naïve Bayes, MNB—Multinomial Naïve Bayes, LR—Logistic Regression, RF—Random Forest)
Fig. 3 Confusion matrix of random forest classifier
Table 2 Accuracy of deep learning model (CNN)

Model   Time (min)   Accuracy (%)
CNN     75           98.89
After the various machine learning models, the dataset was also trained on a deep learning model, the Convolutional Neural Network (CNN) (Table 2). When applied to the dataset, the CNN gives an accuracy of 98.89%; however, it takes 1 h 15 min to train. Hence CNN outperforms all other models, giving the highest accuracy, with precision, recall and F1-score each at 99%. The confusion matrix for CNN is shown in Fig. 4.
Fig. 4 Confusion matrix of CNN
5 Conclusion This paper addresses the crucial topic of improving communication and accessibility for those who are hard of hearing or deaf by combining computer vision techniques with various machine learning and deep learning models. Through a comparative study of these models, valuable insights into their effectiveness in achieving accurate and efficient communication solutions were gained. The experiments revealed notable differences in performance among the above-mentioned models. The deep learning model, Convolutional Neural Network (CNN), outperformed all the machine learning algorithms with an accuracy of 98.89%. Despite its excellent accuracy, the CNN model required a training time of 1 h and 15 min. The Random Forest Classifier, on the other hand, yielded an accuracy of 81.48% with a far shorter training process, completed in 49 s. These results imply that the CNN model offers a good compromise for real-world deployment, where both accuracy and processing speed are crucial: it benefits from precise prediction while still executing at an acceptable pace. CNN is also more practical because pre-trained models for image classification, object detection, and related tasks are readily available, and fine-tuning them on sign language data can significantly speed up development. As a result, CNN forms the foundation of our modeling, providing a high level of accuracy while remaining relatively quick to process.
6 Future Scope The current system gives optimal output only when the frame consists of particular hand gestures, and it is specifically designed to work at the character level (i.e. alphabets A–Z) of the American Sign Language dataset. The system can be extended in the following directions. Feature extraction, i.e. extracting meaningful and relevant information from raw image data for use as input to machine learning algorithms or other image processing tasks, is needed to isolate the exact hand gesture from the rest of the image so that it can be processed further accurately. Word-level and sentence-level implementation can make the system faster and usable in real time. Finally, a dataset needs to be created specifically for Indian Sign Language (ISL).
References 1. Núñez-Marcos A, Perez-de-Viñaspre O, Labaka G (2022) A survey on sign language machine translation. Expert Syst Appl 118993 2. Khan R (2022) Sign language recognition from a webcam video stream 3. Zheng J et al (2020) An improved sign language translation model with explainable adaptations for processing long sign sentences. Comput Intell Neurosci 2020 4. Ojha A et al (2020) Sign language to text and speech translation in real time using convolutional neural network. Int J Eng Res Technol (IJERT) 8(15):191–196 5. Kanvinde A et al (2021) Bidirectional sign language translation. In: 2021 international conference on communication information and computing technology (ICCICT). IEEE 6. Katoch S, Singh V, Tiwary US (2022) Indian sign language recognition system using SURF with SVM and CNN. Array 14:100141 7. Saravanan C (2010) Color image to grayscale image conversion. In: 2010 second international conference on computer engineering and applications, vol 2, March. IEEE, pp 196–199 8. Yagin FH, Cicek İB, Alkhateeb A, Yagin B, Colak C, Azzeh M, Akbulut S (2023) Explainable artificial intelligence model for identifying COVID-19 gene biomarkers. Comput Biol Med 154:106619 9. Sharma M, Pal R, Sahoo AK (2014) Indian sign language recognition using neural networks and KNN classifiers. ARPN J Eng Appl Sci 9(8):1255–1259 10. McKee MM, Barnett SL, Block RC, Pearson TA (2011) Impact of communication on preventive services among deaf American Sign Language users. Am J Prev Med 41(1):75–79 11. Das S, Imtiaz MS, Neom NH, Siddique N, Wang H (2023) A hybrid approach for Bangla sign language recognition using deep transfer learning model with random forest classifier. Expert Syst Appl 213:118914 12. Ayyadevara VK (2018) Random forest. In: Pro machine learning algorithms: a hands-on approach to implementing algorithms in Python and R, pp 105–116 13. Patil R et al (2021) Indian sign language recognition using convolutional neural network. In: ITM web of conferences, vol 40. EDP Sciences
Enhancing Performance of Noise-Robust Gujarati Language ASR Utilizing the Hybrid Acoustic Model and Combined MFCC + GTCC Feature Bhavesh Bhagat and Mohit Dua
Abstract The study introduces an enhanced method for improving the accuracy and performance of End-to-End Automatic Speech Recognition (ASR) systems. This involves combining Gammatone Frequency Cepstral Coefficient (GTCC) and Mel Frequency Cepstral Coefficient (MFCC) features with a hybrid CNN-BiGRU model. MFCC and GTCC features capture temporal and spectral aspects of speech, while the hybrid architecture enables effective local and global context modelling. The proposed approach is evaluated using a low-resource Gujarati multi-person speech dataset, incorporating clean and noisy conditions via added white noise. Results demonstrate a 4.6% reduction in Word Error Rate (WER) for clean speech and a significant 7.83% reduction in WER for noisy speech, compared to baseline MFCC with greedy decoding. This method exhibits potential for enhancing ASR systems, making them more reliable and accurate for real-world applications necessitating precise speech-to-text conversion. Keywords ASR · MFCC · GTCC · CNN-BiGRU · WER
1 Introduction End-to-End (E2E) deep learning-based ASR models have gained significant attention in recent years, due to their ability to simplify the traditional ASR pipeline and achieve competitive performance [1]. Unlike conventional ASR systems that consist of separate components for acoustic modelling, pronunciation modelling, and language modelling, E2E models aim to directly map input audio signals to output text or character transcriptions in a single neural network architecture. Automatic Speech Recognition (ASR) systems face several challenges related to noise and robustness [2]. These include Background Noise, Variability in Noise Types, Signal-to-Noise
Ratio (SNR), and Speaker Variability. Background noise is caused by diverse sources such as traffic, machinery, crowds, and reverberation. Variability in noise types poses a unique challenge to ASR performance. Robust ASR models should be able to handle various speakers to ensure accurate transcription. This includes channel variability, reverberation, adverse acoustic conditions, and lack of training data. To address these challenges, advanced signal processing techniques, robust feature extraction methods, noise reduction algorithms, and the use of deep learning architectures are needed [3, 4]. Data augmentation techniques and transfer learning-based approaches can also be employed to enhance the system’s robustness in diverse conditions. Gammatone Frequency Cepstral Coefficients (GTCC) and Mel Frequency Cepstral Coefficients (MFCC) are two commonly utilized techniques for robust feature extraction in deep learning-based ASR systems [5]. GTCC is a feature extraction technique that combines the principles of Gammatone filterbank analysis and cepstral analysis. GTCC involves several steps: first, a bank of Gammatone filters is applied to the speech signal to capture its spectral information. Next, the filterbank outputs undergo a non-linear compression stage. Then, the logarithm of the compressed filterbank energies is taken, followed by the application of the Discrete Cosine Transform (DCT). Finally, a subset of the resulting cepstral coefficients is selected to represent the speech features. GTCC aims to capture important acoustic characteristics of speech signals, particularly in noisy environments, to enhance ASR performance. On the other hand, MFCC is a widely used technique in ASR. It involves several steps as well: pre-emphasis, frame segmentation, windowing, Fast Fourier Transform (FFT), Mel Filterbank, logarithm, DCT, and cepstral coefficient selection. MFCC features are known for their robustness to noise and their ability to capture important spectral information while discarding less relevant details. The Mel filterbank, used in the MFCC process, mimics the frequency resolution of the human ear and has been proven effective in various ASR applications. The proposed approach aims to enhance the accuracy and improve the performance of ASR by utilizing different features (MFCC, GTCC, and a fusion of MFCC + GTCC) combined with a hybrid CNN-BiGRU model that includes a CTC layer. The fusion approach combines the MFCC and GTCC coefficients [6], creating a combined feature representation that incorporates both temporal and spectral information. To enable end-to-end training and decoding, a Connectionist Temporal Classification (CTC) [7, 8] layer is added to the model. The CTC layer allows for the prediction of variable-length output sequences, which is essential in ASR tasks, where the transcription length may differ from the input speech sequence. It handles the alignment between input speech and output text sequences without requiring explicit alignments during training. Experimental results show that the proposed approach achieves a 7.83% decrease in WER over the baseline model of hybrid CNN-BiGRU with MFCC features only.
2 Literature Review Deep learning-based ASR systems have made significant progress in recent years and have found widespread applications in various domains. However, ASR systems still face limitations, particularly in handling noisy environments [4]. This is explored in an overview of existing ASR systems and their limitations in such conditions. Four main model families are used for ASR: Hidden Markov Model (HMM)-based Systems [1], Deep Neural Network (DNN)-based Systems, Hybrid Systems [9], and End-to-End Systems [10, 11]. HMM-based systems use Gaussian mixture models and dynamic time warping algorithms for recognition, but are susceptible to performance degradation in noisy environments. DNN-based systems use deep neural networks to model acoustic and language information, but struggle with noisy environments. Hybrid systems combine the strengths of HMMs and DNNs, but still face challenges with noise robustness. Joshi et al. [12] introduced DNN-based End-to-End systems which directly map acoustic input to written text output using deep learning-based architectures, i.e. Recurrent Neural Networks (RNNs) or transformer-based models. Another study, proposed by Maji et al. [9], aims to improve the overall performance of emotion recognition in Odia by leveraging deep learning architectures. They propose a parallelized CNN-BiGRU network for this task. The network incorporates feature selection techniques to identify the most relevant features for emotion recognition. ASR in noisy environments faces certain limitations, including Acoustic Variability, Insufficient Training Data, Difficulty in Noise Separation, Lack of Noise-Aware Modelling, and Degraded Feature Extraction. Ephrat et al. [13] present a paper which addresses the difficulty of noise separation in ASR systems by proposing a speaker-independent audio and visual-based model for speech separation. The authors investigate techniques to leverage publicly available data to improve ASR performance in such settings. The work focuses on the use of audio and text data pairs to develop more accurate and robust ASR systems [14]. Diwan et al. presented work which focuses on multi-lingual and code-switching ASR challenges in Indian languages, especially low-resource languages. The authors address the challenges that occur in building ASR systems for languages with limited resources and also investigate code-switching scenarios. The study explores techniques to improve ASR performance in such challenging linguistic contexts [15]. Dua et al. proposed a noise-robust continuous ASR system for the Hindi language [5]. The system utilizes GFCC as the feature representation, a variant of the conventional MFCC feature extraction technique known for its ability to capture both spectral and temporal information effectively. The authors focus on addressing the challenges of noisy environments in ASR. They propose a discriminative training approach to improve the robustness of the ASR system. They also focus on improving the performance of ASR systems by incorporating discriminative training techniques and refined Hidden Markov Model (HMM) modelling [6]. The authors proposed the use of integrated features that combine multiple noise-robust feature representations. These features are designed to capture important acoustic
characteristics while minimizing the impact of noise. In 2022, Gaudani and Patel presented a work comparing different robust feature extraction techniques for an ASR system for Hindi, a low-resource language [3]. The authors compare various feature extraction methods to identify the most effective approach for Hindi ASR. The study aims to improve ASR performance in the context of Hindi, which is a widely spoken language in India. Dubey and Shah [10] presented a deep learning-based ASR system tailored to the Indian-English accent. The study aims to improve ASR accuracy and performance in the context of Indian-English speech by leveraging deep speech recognition techniques. In 2021, Anoop and Ramakrishnan [11] presented a Connectionist Temporal Classification (CTC)-based E2E ASR system for the limited-resource Sanskrit language. They proposed the utilization of spectrogram augmentation techniques to enhance the performance of the ASR system. This work aims to address the challenges of ASR in low-resource languages and specifically focuses on Sanskrit. The suggested approach, which draws inspiration from the aforementioned efforts, notably [3, 5, 9], uses integrated features and a hybrid CNN-BiGRU architecture to enhance the performance of E2E ASR for a noisy Gujarati dataset.
3 Proposed Method To improve the accuracy and performance of ASR, the proposed approach combines different features (MFCC, GTCC, and MFCC + GTCC feature extraction techniques) with a hybrid CNN-BiGRU model. The MFCC and GTCC features are particularly effective in capturing temporal and spectral information from the speech signal, while the hybrid CNN-BiGRU architecture allows for effective modelling of both local and global context. A step-by-step outline of the proposed approach is shown in Fig. 1.
3.1 Dataset Preparation We utilized a low-resource, high-quality crowd-sourced multi-speaker Gujarati speech corpus, and created a noisy counterpart by adding background disturbance (white noise) to it. Each of the clean and noisy Gujarati datasets was separated into three independent parts for the proposed work: a training set containing 80% of the speech utterances, a validation set containing 10%, and a testing set containing 10%. The training set is used to train the model, the validation set aids in tracking the model’s performance throughout training, and the testing set is utilized for the final assessment.
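A minimal sketch of this 80/10/10 split is given below, assuming utterance paths and transcripts are held in parallel Python lists; scikit-learn's splitter is used purely for illustration.

```python
# 80/10/10 split as described above; `utterances` and `transcripts`
# are assumed parallel lists of audio paths and their texts.
from sklearn.model_selection import train_test_split

X_train, X_rest, y_train, y_rest = train_test_split(
    utterances, transcripts, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)  # 10% val, 10% test
```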
Fig. 1 Proposed E2E noise-robust ASR architecture
3.2 Speech Feature Extraction The proposed work combines the MFCC and GTCC techniques, which are used in speech and audio signal processing to enhance the representation of spectral features for improved performance in applications such as ASR and speaker recognition. MFCC is a widely used feature extraction method in ASR systems. It involves converting the audio signal’s short-term power spectrum into a set of coefficients that capture the spectral envelope. This is achieved by applying a series of operations, including the application of a Mel filterbank, logarithmic compression, and discrete cosine transform. On the other hand, GTCC is a feature extraction method inspired by the human auditory system. It uses a series of gammatone filters, which model the frequency selectivity of the cochlea, to analyse the power spectrum of the audio signal. The resulting coefficients represent the magnitude of the filtered signal, capturing both spectral and temporal characteristics. By combining MFCC and GTCC, the aim is to leverage the strengths of both methods and create a more comprehensive representation of the audio signal [5, 6]. This can potentially improve the discrimination and classification capabilities of the feature set. The combination of MFCC and GTCC can be done in different ways. One common approach is to concatenate the feature vectors derived from both methods, resulting in a longer feature vector that contains information from both spectral and temporal domains. This combined feature vector can then be used as input to a classification algorithm or fed into subsequent processing stages in an ASR system.
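The concatenation approach can be sketched as follows. librosa provides the MFCCs; since the paper does not name a GTCC library, gtcc_features() is an explicitly assumed placeholder for whichever gammatone cepstral implementation is used.

```python
# Feature-fusion sketch: MFCC via librosa, GTCC via an assumed helper.
import numpy as np
import librosa

def gtcc_features(y, sr, n_coeff):
    # Placeholder: substitute a real gammatone cepstral extractor here
    # (librosa itself does not provide one).
    raise NotImplementedError("plug in a gammatone cepstral extractor")

def fused_features(path, n_coeff=13):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_coeff)  # (n_coeff, T)
    gtcc = gtcc_features(y, sr, n_coeff)                     # (n_coeff, T)
    # Concatenating along the coefficient axis yields one longer feature
    # vector per frame, carrying both MFCC and GTCC information.
    return np.concatenate([mfcc, gtcc], axis=0)
```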
3.3 CNN-BiGRU Hybrid Acoustic Architecture The CNN-BiGRU ASR model with a CTC layer is an E2E deep learning architecture designed to perform speech recognition tasks [9]. This model combines Convolutional Neural Networks (CNNs) and Bidirectional Gated Recurrent Units (BiGRUs) to effectively capture both local and temporal dependencies in speech data. By incorporating a CTC layer, our model can immediately generate variable-length sequences as output, avoiding the need to match the lengths of the input and output sequences. Below is the breakdown of the architecture. Input Layer: Acoustic features extracted from the speech signals are sent to the model’s input layer; MFCC, GTCC, and MFCC + GTCC are the features used. These features are commonly expressed as a time-frequency matrix, where the columns stand in for frequency bins and the rows for time steps. Convolutional Neural Network (CNN): Several convolutional layers make up the CNN portion of the model. Each layer applies a set of learnable filters to extract local patterns from the input acoustic features. The filters capture different frequency and temporal characteristics. To incorporate non-linearity, the output of each convolutional layer is passed through a non-linear activation function, such as the Rectified Linear Unit (ReLU). ReLU was chosen over sigmoid and other activation functions because it alleviates the vanishing gradient problem, speeds training, and encourages better representation learning, resulting in improved model performance. After each convolutional layer, max pooling is applied to downsample the feature maps. Max pooling reduces the spatial dimensions, preserving the most relevant information while providing some degree of translation invariance. This helps the model focus on the most salient features. Bidirectional Gated Recurrent Units (BiGRUs): Following the CNN layers, the model utilizes a stack of five BiGRU layers. The BiGRU is a recurrent neural network (RNN) variant that enables modelling of temporal dependencies in sequential data. The BiGRUs capture information from past time steps and update their internal hidden states based on the current input and previous states. This allows the model to retain context and long-term dependencies. CTC Layer: The CTC layer [8] conducts the temporal alignment between the input and output sequences using the output of the BiGRU layers. The CTC layer applies a softmax activation over each output time step to compute the probability distribution over a set of output labels. The model is trained using the CTC loss function, which incorporates the concept of “blank” tokens to handle variable-length alignments. The CTC layer enables the model to output sequences directly without any requirement for explicit input and output alignment. Overall, the CNN-BiGRU ASR model with a CTC layer leverages the power of CNNs for local feature extraction, BiGRUs for capturing temporal dependencies, and the CTC layer for handling variable-length outputs. This architecture has been widely used in various ASR applications, attaining cutting-edge performance in ASR workloads.
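A hedged Keras sketch of such an acoustic model follows. The filter counts, GRU width, feature dimension, and label inventory are assumptions; the ReLU convolutions with max pooling, the stack of five BiGRUs, and the per-frame softmax follow the description above.

```python
# Sketch of a CNN-BiGRU acoustic model for CTC training; sizes are assumed.
from tensorflow.keras import layers, models

n_feats = 26    # assumed MFCC+GTCC feature dimension per frame
n_labels = 60   # assumed Gujarati character inventory

inp = layers.Input(shape=(None, n_feats))              # (time, features)
x = layers.Conv1D(64, 5, padding="same", activation="relu")(inp)
x = layers.MaxPooling1D(pool_size=2)(x)                # downsample in time
x = layers.Conv1D(128, 3, padding="same", activation="relu")(x)
for _ in range(5):                                     # five stacked BiGRUs
    x = layers.Bidirectional(layers.GRU(128, return_sequences=True))(x)
out = layers.Dense(n_labels + 1, activation="softmax")(x)  # +1: CTC blank
model = models.Model(inp, out)
```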
3.4 Text Decoding During inference, the output of the CTC layer is typically subjected to a decoding step to convert the probability distribution into the final recognized sequence. Decoding approaches like greedy search or prefix beam search are commonly used to find the most likely sequence of labels. Here, we used greedy search decoding [11], a simple and efficient technique used in ASR systems. It involves selecting the most likely unit at each output step based on the highest probability. Greedy decoding is computationally efficient and can operate in real-time scenarios. However, it may overlook global context and make suboptimal decisions, and it can struggle with rare or out-of-vocabulary words. We also employ prefix beam decoding to improve the performance of the system. Overall, both decoders are straightforward methods for ASR, though they may sacrifice accuracy and context awareness compared to more advanced techniques.
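A best-path (greedy) CTC decoder, as described above, is only a few lines: take the argmax at each frame, collapse consecutive repeats, and drop the blank token. This sketch assumes the blank index is 0.

```python
# Greedy (best-path) CTC decoding: argmax per frame, collapse repeats,
# remove blanks.
import numpy as np

def greedy_ctc_decode(probs, blank=0):
    """probs: array of shape (time, n_labels) with per-frame distributions."""
    best = np.argmax(probs, axis=1)
    decoded, prev = [], None
    for p in best:
        if p != prev and p != blank:   # collapse repeats, skip blanks
            decoded.append(int(p))
        prev = p
    return decoded
```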
4 Experiment Setup and Results We utilized the multi-voice, high-quality, crowd-sourced speech dataset for the Gujarati language to implement the proposed work. The sample has two kinds of speaker voices (male and female), for which this research suggested multi-person ASR models. The experiments ran on a machine with a Tesla T4 GPU having 12 GB of memory, under Windows 10 with 8 GB of system RAM. Using batches of 10 samples for the training and validation tasks, the multi-person model was trained for up to 30 epochs over a period of 24 h. Gradient descent was carried out via the Adam optimizer, and the loss was determined by the CTC function. Figures 2 and 3 below display the loss determined for each epoch; WER is the evaluation metric used.
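Continuing the model sketch above, the reported training configuration might look as follows in Keras; the CTC loss wrapper and the assumption that labels are unpadded (so their padded width can serve as the label length) are simplifications of a real pipeline.

```python
# Adam optimizer, CTC loss, 30 epochs, batches of 10, as reported above.
import tensorflow as tf

def ctc_loss(y_true, y_pred):
    # Built-in Keras CTC cost; assumes unpadded labels (a simplification).
    batch = tf.shape(y_pred)[0]
    input_len = tf.fill([batch, 1], tf.shape(y_pred)[1])
    label_len = tf.fill([batch, 1], tf.shape(y_true)[1])
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred,
                                           input_len, label_len)

model.compile(optimizer=tf.keras.optimizers.Adam(), loss=ctc_loss)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=30, batch_size=10)
```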
4.1 Multi-person ASR Performance Analysis A multi-person ASR system was created using 3,418 of the 4,272 Gujarati speech utterances in the dataset (23 h total) for training, 427 for model validation, and the remaining 427 for testing. From Table 1, we observed that in a clean environment the WER of the hybrid CNN-BiGRU ASR system is reduced by 4.6% from the initial WER of 62.72 for greedy decoding with MFCC. From Table 2, we observed that in noisy environments the WER of the hybrid CNN-BiGRU ASR system is reduced by 7.83% from the initial WER of 67.27 for MFCC features with greedy decoding.
Fig. 2 Loss plot for proposed architecture
Fig. 3 Loss plot for GTCC features
Table 1 Experimental results of ASR model on the clean dataset

System                                                                    WER (%) (greedy, prefix beam)
MFCC + hybrid CNN-BiGRU model + greedy decoding and prefix beam           62.72, 61.89
GTCC + hybrid CNN-BiGRU model + greedy decoding and prefix beam           65.31, 64.7
MFCC + GTCC + hybrid CNN-BiGRU model + greedy decoding and prefix beam    61.5, 58.12
Table 2 Experimental results of ASR model on the noisy dataset

System                                                                    WER (%) (greedy, prefix beam)
MFCC + hybrid CNN-BiGRU model + greedy decoding and prefix beam           67.27, 64.7
GTCC + hybrid CNN-BiGRU model + greedy decoding and prefix beam           66.73, 63.54
MFCC + GTCC + hybrid CNN-BiGRU model + greedy decoding and prefix beam    63.17, 59.44
4.2 Multi-person Model Loss Graph This section presents the training and validation loss of the hybrid CNN-BiGRU ASR systems with prefix beam decoding in clean and noisy environments. Figures 2 and 3 illustrate that, compared with ASR systems using either GTCC or MFCC features alone, the loss curve is lower for the multi-person ASR model with the integrated MFCC + GTCC features in noisy conditions.
4.3 Comparison of Proposed Methods with Existing Works The goal of this study is to enhance the performance of E2E ASR systems for Gujarati speech. The researchers propose using MFCC and GTCC feature extraction techniques, building upon previous methods that have utilized various features and models such as log-mel acoustic features, MFCC spectrogram features, the VTLN factor, and log-frequency spectrograms, with hybrid LSTM-CTC with attention mechanism, CNN-Bidirectional LSTM models, RNNs combined with CTC, and monolingual and multilingual models. Table 3 presents the results of the proposed model, which uses different feature extraction techniques and the greedy decoding method with the CNN-BiGRU architecture, for the noisy as well as the clean dataset. The combination of MFCC integrated with GTCC features, the hybrid acoustic architecture, and greedy decoding in clean and noisy environments outperforms previous combinations in these experiments. The observed results clearly indicate that the suggested approach significantly reduces the WER, by 7.83%, compared to existing work, which only achieved a reduction of 5.87% on a clean dataset. By comparing the results presented in Table 3, it becomes evident that using MFCC integrated with GTCC features with the hybrid acoustic CNN-BiGRU architecture and prefix decoding achieves superior performance compared to other models on both clean and noisy datasets.
Table 3 Comparison with existing work
(Research | Feature extraction method | Architecture | Decoding | Dataset | WER (%))

Bill [2] | VTLN factor and feature frame rate | LSTM-CTC, monolingual or multilingual training | NA | CMU-INDIC dataset (Gujarati) | 20.91, 21.44, 19.33, 19.30
Raval et al. [16] | MFCC | CNN, BiLSTM, CTC | Greedy or prefix decoding with LMs and spell corrector BERT | Microsoft Speech Corpus (Gujarati) | 70.65, 69.94, 64.78
Diwan and Joshi [17] | Log-mel acoustic feature | Hybrid LSTM-CTC, attention mechanism, Conformer-based model | Beam decoder | Microsoft speech corpus (Indian languages) | 39.6, 59.9
Proposed approach | MFCC + GTCC | CNN-BiGRU model | Greedy decoding and prefix beam | Multi-voice high-quality Gujarati dataset and created noisy Gujarati dataset | 61.5, 58.12, 63.17, 59.44
5 Conclusion The suggested strategy of merging MFCC, GTCC, and the hybrid CNN-BiGRU model has demonstrated promising results in improving the accuracy and performance of the Gujarati ASR system. By utilizing the temporal and spectral information captured by MFCC and GTCC features, along with the hybrid CNN-BiGRU architecture’s ability to model local and global contexts, the system achieved a significant decrease in WER compared to the baseline architecture. The experiments were conducted on a low-resource, high-quality, crowd-sourced multi-person Gujarati speech dataset, which was further augmented by adding white noise to simulate a noisy environment. From the results, it can be observed that in a clean environment the hybrid CNN-BiGRU ASR system achieved a 4.6% reduction in WER compared to the initial WER of 62.72% obtained by greedy decoding with MFCC. Similarly, in a noisy environment (as shown in Table 2), the hybrid CNN-BiGRU ASR system demonstrated a substantial improvement, with a 7.83% reduction in WER compared to the initial WER of 67.27% achieved by greedy decoding with MFCC.
Overall, the combination of MFCC, GTCC, and the hybrid CNN-BiGRU model holds promise for further enhancing ASR systems, potentially enabling their deployment in various real-world applications that require accurate and reliable speech-to-text conversion.
References 1. Deshmukh AM (2020) Comparison of hidden markov model and recurrent neural network in automatic speech recognition. Eur J Eng Technol Res 5(8):958–965 2. Billa J (2018) ISI ASR system for the low resource speech recognition challenge for Indian languages. Interspeech 3. Gaudani H, Patel NM (2022) Comparative study of robust feature extraction techniques for ASR for limited resource Hindi language. In: Proceedings of second international conference on sustainable expert systems (ICSES 2021). Springer Nature, Singapore 4. Lakshminarayanan V (2022) Impact of noise in automatic speech recognition for low-resourced languages. Rochester Institute of Technology 5. Dua M, Aggarwal RK, Biswas M (2019) GFCC based discriminatively trained noise robust continuous ASR system for Hindi language. J Ambient Intell Humaniz Comput 10:2301–2314 6. Dua M, Aggarwal RK, Biswas M (2018) Discriminative training using noise robust integrated features and refined HMM modeling. J Intell Syst 29(1):327–344 7. Graves A et al (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on machine learning 8. Bourlard HA, Morgan N (1994) Connectionist speech recognition: a hybrid approach, vol 247. Springer Science & Business Media 9. Maji B, Swain M, Panda R (2022) A feature selection based parallelized CNN-BiGRU network for speech emotion recognition in Odia language 10. Dubey P, Shah B (2022) Deep speech based end-to-end automated speech recognition (asr) for indian-english accents. Preprint at arXiv:2204.00977 11. Anoop CS, Ramakrishnan AG (2021) CTC-based end-to-end ASR for the low resource Sanskrit language with spectrogram augmentation. In: 2021 National conference on communications (NCC). IEEE 12. Joshi B et al (2022) A novel deep learning based Nepali speech recognition. In: International conference on electrical and electronics engineering. Springer, Singapore 13. Ephrat A et al (2018) Looking to listen at the cocktail party: a speaker-independent audio-visual model for speech separation. Preprint at arXiv:1804.03619 14. Bhogale K et al (2023) Effectiveness of mining audio and text pairs from public data for improving ASR systems for low-resource languages. In: ICASSP 2023-2023 IEEE International conference on acoustics, speech and signal processing (ICASSP). IEEE 15. Diwan A et al (2021) Multilingual and code-switching ASR challenges for low resource Indian languages. Preprint at arXiv:2104.00235 16. Raval D et al (2021) Improving deep learning based automatic speech recognition for Gujarati. Trans Asian Low-Resour Lang Inf Process 21(3):1–18 17. Diwan A, Jyothi P (2020) Reduce and reconstruct: ASR for low-resource phonetic languages. Preprint at arXiv:2010.09322
Random Forest (RF) Assisted and Support Vector Machine (SVM) Algorithms for Performance Evaluation of EDM Interpretation Vivek John, Ashulekha Gupta, Saurabh Aggarwal, Kawerinder Singh Sidhu, Kapil Joshi, and Omdeep Gupta
Abstract Hard and electrically conductive materials can be shaped and given detailed characteristics using the non-traditional machining technique known as electro-discharge machining (EDM). Artificial intelligence and machine learning are being utilised to predict high-precision operations in the intricate electro-discharge machining process. In the proposed work, the support vector machine (SVM), a supervised learning system developed using a novel artificial intelligence paradigm, is used to model the responses of an EDM operation from four input parameters: pulse time on, pulse time off, current, and servo feed. In comparison to linear and quadratic models, the SVM-based results demonstrate good agreement between the experimental and projected response values. Of the four parameters used in the EDM process, servo feed has been found to have the greatest impact on the outcomes under consideration. Based on the figures for accuracy, recall, precision, and F1 score, RF has higher ratios
across all four measurement scales, with respective values of 93.95, 92.62, 89.62, and 91.77%. RF therefore has the highest chance of determining the best machining condition and performance while maintaining high precision. Keywords Machine learning · Algorithms · EDM · Performance and machining
1 Introduction Electro-discharge machining is the most effective contactless method for high-quality machining, particularly when processing materials that are challenging to cut, such as alloys, composites, and ceramics. In wire EDM, a spark is generated between the workpiece and a slender wire electrode (typically below 0.5 mm in diameter), with deionized water employed as the dielectric medium, to erode the workpiece and build complex two- and three-dimensional structures [1–4]. It is challenging to produce intricately shaped components of varying hardness, or sharp-edged components, using conventional techniques [5, 6]. WEDM is a form of thermal machining that makes these possibilities feasible. EDM, which has advanced from a straightforward method for creating tools and dies, is the preferred technique for building micro-scale products where a high degree of precision is required [7]. WEDM is a non-conventional method of material removal that can be utilised to give components intricate forms and contours [8]. Extremely small corner radii are possible with WEDM, whose wire electrode, 0.05–0.3 mm in diameter, is kept continuously moving; the mechanical equipment that tensions the wire makes it less likely to produce erroneous pieces [9, 10]. Wire electro-discharge machining removes material from electrically conductive materials using a thermo-electric energy source: controlled erosion through repeatedly ignited sparks between the electrodes, i.e. the workpiece and the tool. Building on multiple optimisation techniques, an optimisation framework with a graphical user interface (GUI) has been developed that lets users obtain optimised cutting parameters for a required surface roughness (Ra); wire EDM failure prediction and process control based on sensor fusion has also been reported [11]. Machining failures cannot be totally eliminated, despite several attempts to optimise the procedure; hence, a model for offline classification has been proposed to forecast machining failures [12]. Data must be available for modern manufacturing: manufacturers make judgements based on data supplied by the process in real time, which is essential to a company’s ability to compete and adapt now and in the immediate future [13]. Whilst the latest machines include embedded technologies, many legacy machines still functioning today were not intended to be linked to the internet [14]. For an individual spark, the real-time status of the machined workpiece can be plotted utilising live data capture, data analysis, and numerous spark simulations; EDM data is processed and saved in a big data set that is then utilised for plotting [15]. Artificial intelligence and its subfields have assumed certain important responsibilities here. In particular, using structured datasets and machine
learning, many things can be detected automatically. Machine learning is a type of artificial intelligence that often makes use of a wide range of probabilistic, optimisation, and statistical techniques to increase effectiveness on the basis of prior knowledge and new data. Several machine learning methods have been broadly employed for computerised real-time monitoring and error prediction. Nowadays, with machine learning algorithms in common use, it is simpler to recognise input parameters and their consequences. The task of classification requires supervision and is challenging; numerous classification algorithms exist, including those used in [16]. Over the preceding few decades, artificial intelligence and machine learning algorithms have been used to develop forecasting models that improve decision-making. These machine learning algorithms work well to conduct research, uncover findings in a data set, and assess whether or not a complex design can be machined [17]. The primary objective of this research is to figure out exactly how successful ML is in finding poor machining capabilities. This research evaluates two prevalent machine learning algorithms using a machining input data set. These approaches are utilised to create complicated-form items, such as parts for aerospace, chemical, and biomedical applications. The analysis is concerned with the application of machine learning algorithms to categorise EDM, as well as with the most successful classification approach among them. This research will assist in determining the best-performing algorithms, particularly the Support Vector Machine and Random Forest algorithms, for identifying EDM processes. This input will be valuable, for instance for biomedical purposes, in making prompt decisions to avoid losses. This study includes a literature survey, methodology, outcomes, discussion of outcomes, and conclusions. In the SVM-RFE classifier procedure, recursive feature elimination (RFE) and SVM are merged [18]: RFE recursively selects features from a dataset by discarding the attribute with the smallest weight, so the least useful features are erased in each SVM-RFE cycle. The resulting data collection is commonly utilised for classification purposes, and the data set was evaluated using ML and DL algorithms. Besides ANN, other ML methods have tackled surface roughness forecasting for electrical discharge machining [19]. As stated by Yasui and Wang, the Random Forest (RF) strategy is based upon a recursive procedure in which each iteration picks up a random sample of size N from the data set, with no substitution among the predictors; the gathered information is then used to build a model. Based on previous research, this study assesses the effectiveness of SVM and RF algorithms in categorising EDM records. Researchers in the past have already offered a variety of approaches for forecasting the performance of ECM processes and examining the intricate interactions between the input parameters and responses. Artificial neural networks (ANN) and grey relation analysis (GRA) were used by Ashokan et al. [20] for modelling and multi-objective optimization of an ECM process; it was determined that ANN produces better response prediction with regard to percentage variation between training and testing datasets. Senthilkumar et al. used an ANN model to anticipate the flank wear and surface roughness results of their experimental study, which were then experimentally assessed [21–23].
Zhang et al. developed hybrid models based on SVM and the Gaussian kernel function to predict operating time
and electrode wear rates in a micro-electrical discharge machining process [24]. Chou et al. [25] proposed utilising SVM and Radial Basis Function Neural Network (RBFNN) methodologies for wafer quality forecasting in a semiconductor production process, claiming that the SVM strategy produces predictions with greater accuracy than the RBFNN method. Lu et al. [26] built forecasting algorithms for modelling SR features in various machining procedures using SVM. Optimised variables for auto-SARIMA models have been chosen in order to achieve an ideal fit between test data and forecast outcomes [27], and the XGB model has been determined to be more effective at forecasting than the FB Prophet model in terms of greater accuracy and better fitting [28]. As a result, it can be seen that methods based on artificial intelligence are the best means of creating forecasting algorithms for modelling the intricate material-removal behaviour of different machining operations. Due to the EDM’s stochastic character, random fluctuations in the responses are fairly visible. These unpredictable changes in testing data can be successfully absorbed, within a specific tolerance, by intelligent prediction. While anticipating the relevant response values, the implementation of SVM can be a clever option for handling the complex behaviour of EDM processes. Consequently, an attempt is made in this study to accurately model the EDM process response from four parameters, namely pulse time on, pulse time off, current, and servo feed. The expected results are subsequently compared to the actual trial outcomes, demonstrating the SVM algorithm’s high prediction accuracy.
2 Methodology The combination of input parameter ranges given in Table 1 is used in order to produce the best results. The Taguchi L-16 orthogonal array guided the selection of the two levels of the process variables. The whole experimental design and response values are displayed in Table 2. These EDM process parameters are used as the SVM algorithm’s inputs in order to accurately anticipate the corresponding response values. The creation of SVM-based models for accurate forecasting of the responses, utilising training data sets, is covered in this study. The ideal values of all four parameters determine the efficiency and preciseness of these models. Due to its widespread acceptance and suitability for higher-dimensional input spaces, the GRBF method is employed.

Table 1 Input performance parameters

Inputs    Pulse time on (Ton) (μs)   Pulse time off (Toff) (μs)   Current (amps)   Servo feed (mm/min)
Level 1   33                         9                            3.5              18
Level 2   35                         12                           4.5              12
Table 2 Experimental trial runs

S. No   Pulse time on (μs)   Pulse time off (μs)   Current (amps)   Servo feed (mm/min)   Surface roughness Ra (μm)
1       34                   8                     4.5              16                    1.96
2       34                   8                     4.5              10                    2.20
3       34                   8                     5.5              16                    2.17
4       34                   8                     5.5              10                    1.51
5       34                   13                    4.5              16                    1.20
6       34                   13                    4.5              10                    2.85
7       34                   13                    5.5              16                    2.01
8       34                   13                    5.5              10                    2.34
9       38                   10                    4.5              16                    1.12
10      38                   10                    4.5              10                    2.22
11      38                   10                    5.5              16                    1.85
12      38                   10                    5.5              10                    1.79
13      38                   13                    4.5              16                    1.42
14      38                   13                    4.5              10                    1.62
15      38                   13                    5.5              16                    1.31
16      38                   13                    5.5              10                    2.49
SVM and RF algorithms were employed in this study to categorise EDM performance. For the experimental analysis, machining data with 5 input parameter settings is engaged; a total of 27 experimental runs with 2 features for examining the machining data were set up as in [29]. The subsections below go through the two algorithms (see Fig. 1).
Fig. 1 Layout of study
2.1 Support Vector Machine (SVM) SVM is a supervised machine learning classification algorithm and is suitable for EDM process monitoring and prediction. SVM works by identifying the key training samples (the support vectors) and building a linear decision function that keeps the classes as far apart as possible. SVM can thus be said to map an input vector into a high-dimensional space in order to locate the best hyperplane for categorising the data. The purpose of selecting the relevant hyperplane is to maximise the margin between the decision hyperplane and the nearest data points of the linear classifier [30]. Figure 2 shows a two-class scatter plot, each class with two attributes. The goal of a linear hyperplane is to locate a, b, and c for the class 1 and class 2 relations as depicted in Eqs. 1 and 2 [31] (the class 1 inequality is reconstructed from the class 2 relation, as the original is garbled):

For class 1: a·x1 + b·x2 ≤ c (1)

For class 2: a·x1 + b·x2 > c (2)
SVM algorithms, as opposed to other approaches, rely on the support vectors, the data points that lie closest to the decision boundary. Consequently, eliminating a data point that is not a support vector has far less influence on the boundary than deleting a support vector. Fig. 2 SVM roadmap
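As a toy illustration of the decision rule in Eqs. (1) and (2), a point (x1, x2) is assigned to class 1 when a·x1 + b·x2 ≤ c and to class 2 otherwise; the coefficients below are arbitrary examples, not fitted values.

```python
# Toy linear decision rule from Eqs. (1)-(2); a, b, c are arbitrary here.
def classify(x1, x2, a=1.0, b=2.0, c=5.0):
    return 1 if a * x1 + b * x2 <= c else 2

print(classify(1.0, 1.0))  # -> 1 (on or below the hyperplane)
print(classify(3.0, 2.0))  # -> 2 (above the hyperplane)
```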
Fig. 3 RF workflow
2.2 Random Forest (RF) Noise in the input data set has little impact on RF. One of the primary reasons for using RF in determining the effect of machining input parameters is its capacity to handle minority classes in the data: even when only 15% of the incoming data set falls in one class, the roughness and material removal rate can still be classified as acceptable or unacceptable according to industry standards. In each iteration, the RF method draws a bootstrap sample of N observations from the N-sample training data, with predictors chosen without substitution, as shown in Fig. 3. The data is then partitioned after cleaning of external data, and the steps are iterated as needed, based on how many trees are required. Ultimately, the observations are separated across the ensemble of trees, and the decision trees then categorise cases based on a majority vote [32].
2.3 Proposed Approach The WEDM machining dataset is used in this study, and the values are evaluated before being tested. To successfully categorise surface roughness and material removal rate for performance evaluation, SVM and RF algorithms were employed and assessed using recall, accuracy, precision, and F1 score. Figure 4 depicts the planned methodology for this study.
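A hedged scikit-learn sketch of this pipeline is shown below; the file name, column names, and the Ra acceptability threshold are assumptions, since the paper does not publish them.

```python
# Train SVM and RF on WEDM parameter data and score them with accuracy,
# recall, precision and F1. Column names and threshold are assumed.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

df = pd.read_csv("wedm_runs.csv")                 # hypothetical file
X = df[["ton", "toff", "current", "servo_feed"]]  # assumed column names
y = (df["ra"] <= 2.0).astype(int)                 # assumed Ra threshold

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)
for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("RF", RandomForestClassifier(n_estimators=100))]:
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(name,
          accuracy_score(y_te, pred), recall_score(y_te, pred),
          precision_score(y_te, pred), f1_score(y_te, pred))
```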
Fig. 4 Planned workflow
3 Results This section discusses the parameters and presents the results for the two classifiers under evaluation. Table 4 shows that servo feed has the most significant impact on surface roughness, followed by MRR and current; it also shows that the turning operation’s ideal parameters were A2, B2, and C1. Regarding the surface roughness (Ra) results, Ra decreased whenever the cutting speed was highest. Higher cutting speeds give rise to the lowest surface roughness since the material being cut softens: friction between the workpiece and the tool raises the temperature in the machining area, which causes this softening.
3.1 Accuracy A classifier’s accuracy expresses how accurately instances are put into the suitable category [33]. It is computed by dividing the number of correct predictions by the total number of instances in the supplied dataset [34, 35]. It is important to note that accuracy varies depending on the analysed set, which considerably influences its value; as a result, while it can offer a general picture of performance per class, it is not on its own the best method for comparing different classifiers.
3.2 Recall Recall is the proportion of positive observations that are correctly predicted as positive [36]. This is an important parameter when machining high-level biomedical equipment for medicine, since it shows how many positive observations are correctly identified.
Table 3 SVM and RF assessment of performance

Algorithms   Accuracy (%)   Recall (%)   Precision (%)   F1 score (%)
SVM          93.31          91.06        85.56           88.23
RF           93.95          92.62        89.08           91.77
Table 4 ANOVA for mean variance

Level   Current (amps) (A)   Servo feed (mm/min) (B)   MRR (gm/min) (C)
1       −7.452               −7.672                    −7.166
2       −7.236               −7.099                    −7.609
Delta   0.215                0.578                     0.443
Rank    3                    1                         2
3.3 Precision Precision is the fraction of instances predicted as positive that are true positives [37]. It depicts the ability of the classifier to pick out positives while rejecting negatives present in the data [38].
3.4 F1 Score The F1 score is a measure of model accuracy that merges precision and recall into a single value. A higher F1 score indicates a lower number of false positives and false negatives, showing that significant hazards are accurately identified without being swamped by false alarms. Table 3 presents both algorithms’ accuracy, recall, precision and F1 scores. In this evaluation, an F1 score of unity indicates a faultless model, whereas an utter failure yields an F1 score of zero.
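For reference, the standard definitions behind these four measures, with TP, TN, FP, and FN denoting true/false positives/negatives, are:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
F1 = 2 × (Precision × Recall) / (Precision + Recall)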
4 Discussion and Conclusion The primary objective of this study is to investigate the development of appropriate SVM-based models for the efficient prediction of parameter combinations for EDM machining. The study examines the connections between current, servo feed, and pulse time on and off, using linear and quadratic regression models. SVM-based models are more accurate in predicting the responses, making them suitable for predicting the quality attributes of machining methods. The future focus of this
research may be the SVM algorithm’s performance predictions for the EDM process under consideration. In the context of the evaluation metrics, the results provided in Table 3 show that Random Forest (RF) performs best. The Support Vector Machine (SVM) algorithm, by contrast, also produces high-quality results but detects surface roughness and material removal rates slightly less accurately than RF. This indicates that RF has a higher likelihood of distinguishing between an excellent surface finish and an adequate rate of material removal. Machine learning techniques are commonly used in the technical sector as tools to assist professionals in evaluating data and developing high-quality mechanisms. This study applied RF and SVM, two of the most extensively used ML techniques, to surface roughness and material removal rates; the main characteristics and strategies of both ML algorithms were elaborated. The effectiveness of the explored techniques was analysed using original EDM machining data. The outcome shows slightly better values from RF, and better practice in classifying performance monitoring, than SVM.
References 1. Haddad MJ, Tehrani AF (2008) Material removal rate (MRR) study in the cylindrical wire electrical discharge turning (CWEDT) process. J Mater Process Technol 199(1–3):369–378 2. Tsai TC, Horng JT, Liu NM, Chou CC, Chiang KT (2008) The effect of heterogeneous second phase on the machinability evaluation of spheroidal graphite cast irons in the WEDM process. Mater Des 29(9):1762–1767 3. Mohammadi A, Tehrani AF, Emanian E, Karimi D (2008) Statistical analysis of wire electrical discharge turning on material removal rate. J Mater Process Technol 205(1–3):283–289 4. Yuan J, Wang K, Yu T, Fang M (2008) Reliable multi-objective optimization of high-speed WEDM process based on Gaussian process regression. Int J Mach Tools Manuf 48(1):47–60 5. Saha P, Singha A, Pal SK, Saha P (2008) Soft computing models based prediction of cutting speed and surface roughness in wire electro-discharge machining of tungsten carbide cobalt composite. Int J Adv Manuf Technol 39(1):74–84 6. Sarkar S, Sekh M, Mitra S, Bhattacharyya B (2008) Modeling and optimization of wire electrical discharge machining of γ-TiAl in trim cutting operation. J Mater Process Technol 205(1– 3):376–387 7. Mahapatra SS, Patnaik A (2007) Optimization of wire electrical discharge machining (WEDM) process parameters using Taguchi method. Int J Adv Manuf Technol 34(9):911–925 8. Puri AB, Bhattacharyya B (2005) Modeling and analysis of white layer depth in a wire-cut EDM process through response surface methodology. Int J Adv Manuf Technol 25(3):301–307 9. Bagal DK, Parida B, Barua A, Naik B, Jeet S, Singh SK, Pattanaik AK (2020) Mechanical characterization of hybrid polymer SiC nano composite using hybrid RSM-MOORA-whale optimization algorithm. In: IOP conference series: materials science and engineering, vol 970, no 1. IOP Publishing, p 012017 10. Naik S, Das SR, Dhupal D (2020) Analysis, predictive modelling and multi-response optimization in electrical discharge machining of Al-22%SiC metal matrix composite for minimization of surface roughness and hole overcut. Manuf Rev 7:20 11. Abhilash PM, Chakradhar D (2022) Wire EDM failure prediction and process control based on sensor fusion and pulse train analysis. Int J Adv Manuf Technol 118(5–6):1453–1467
12. Abhilash PM, Chakradhar D (2020) Prediction and analysis of process failures by ANN classification during wire-EDM of Inconel 718. Adv Manuf 8:519–536 13. Bi Z, Xu LD, Wang C (2014) Internet of things for enterprise systems of modern manufacturing. IEEE Trans Ind Inform 10:1537–1546. https://doi.org/10.1109/TII.2014.2300338 14. Schroder C (2015) The challenges of Industry 4.0 for small and medium-sized enterprises. Friedrich Ebert Found 2015:1–28 15. Jamunkar T (2022) Digital Twin modeling of surface roughness generated by the electrical discharge machining process. Doctoral dissertation, University of Cincinnati 16. Rokach L (2019) Ensemble learning: pattern classification using ensemble methods 17. Tambake NR, Deshmukh BB, Patange AD (2021) Data driven cutting tool fault diagnosis system using machine learning approach: a review. J Phys: Conf Ser 1969(1):012049 18. Rout JK, Rout M, Das H (eds) (2020) Machine learning for intelligent decision science. Springer, Singapore 19. Markopoulos AP, Manolakos DE, Vaxevanidis NM (2008) Artificial neural network models for the prediction of surface roughness in electrical discharge machining. J Intell Manuf 19:283– 292 20. Ashokan P, Ravi Kumar R, Jeyapaul R, Santhi M (2008) Development of multi objective optimization models for electro chemical machining process. Int J Adv Manuf Technol 39(1– 2):55–63 21. Gupta A, Parmar R, Suri P, Kumar R (2021) Determining accuracy rate of artificial intelligence models using Python and R-Studio. In: 2021 3rd international conference on advances in computing, communication control and networking (ICAC3N), Greater Noida, India, pp 889– 894. https://doi.org/10.1109/ICAC3N53548.2021.9725687 22. Gupta A et al (2022) Artificial intelligence and smart cities: a bibliometric analysis. In: 2022 international conference on machine learning, big data, cloud and parallel computing (COMIT-CON), Faridabad, India, pp 540–544. https://doi.org/10.1109/COM-IT-CON54601.2022. 9850656 23. John V, Aggarwal S, Arora RK, Oza A, Verma R (2022) Forecasting the output using ANN models and effect of input factors on machinability of Duplex Steel 2205 in dry-turning operation for high strength and anti-corrosive applications. Adv Mater Process Technol 1–12 24. Zhang L, Jia Z, Wang F, Liu W (2010) A hybrid model using supporting vector machine and multi-objective genetic algorithm for processing parameters optimization in micro-EDM. Int J Adv Manuf Technol 51(5–8):575–586 25. Chou PH, Wu MJ, Chen KK (2010) Integrating support vector machine and genetic algorithm to implement dynamic wafer quality prediction system. Expert Syst Appl 37(6):4413–4424 26. Lu J, Liao X, Li S, Ouyang H, Chen K, Huang B (2019) An effective ABC-SVM approach for surface roughness prediction in manufacturing processes. Complexity 2019:3094670. https:// doi.org/10.1155/2019/3094670 27. Gupta R, Yadav AK, Jha SK, Pathak PK (2023) Long term estimation of global horizontal irradiance using machine learning algorithms. Optik 283:170873 28. Gupta R, Yadav AK, Jha SK, Pathak PK (2022) Time series forecasting of solar power generation using Facebook prophet and XG boost. In: 2022 IEEE Delhi section conference (DELCON), February. IEEE, pp 1–5 29. Bisht YS, John V, Aggarwal S, Anandaram H, Rastogi N, Joshi SK (2022) Application of AI and RSM to optimize WEDM process parameters on D4 steel. In: 2022 2nd international conference on emerging smart technologies and applications (eSmarTA). IEEE, pp 1–5 30. Williams G (2011) Descriptive and predictive analytics. 
In: Data mining with Rattle and R: the art of excavating data for knowledge discovery (use R!), pp 193–203 31. Yasui Y, Wang X (2009) Statistical learning from a regression perspective. Springer 32. Shailaja K, Seetharamulu B, Jabbar MA (2018) Machine learning in healthcare: a review. In: Second international conference on electronics, communication and aerospace technology, pp 910–914 33. Sutariya K, Vishal Gupta M, Lal B, Rahim Alatba S, Sriramakrishnan GV, Tripathi V (2023) Rabble based autonomous assistance using machine learning algorithms. In: 2023 3rd
244
34.
35.
36.
37.
38.
V. John et al. international conference on advance computing and innovative technologies in engineering (ICACITE), Greater Noida, India, pp 484–486. https://doi.org/10.1109/ICACITE57410.2023. 10182830 Shah SK, Joshi K, Khantwal S, Bisht YS, Chander H, Gupta A (2022) IoT and WSN integration for Data Acquisition and Supervisory Control. In: 2022 IEEE world conference on applied intelligence and computing (AIC), Sonbhadra, India, pp 513–516. https://doi.org/10.1109/AIC 55036.2022.9848933 Negi SS, Memoria M, Kumar R, Joshi K, Pandey SD, Gupta A (2022) Machine learning based hybrid technique for heart disease prediction. In: 2022 international conference on advances in computing, communication and materials (ICACCM), Dehradun, India, pp 1–6. https://doi. org/10.1109/ICACCM56405.2022.10009219 Verma S, Raj T, Joshi K, Raturi P, Anandaram H, Gupta A (2022) Indoor real-time location system for efficient location tracking using IoT. In: 2022 IEEE world conference on applied intelligence and computing (AIC), Sonbhadra, India, pp 517–523. https://doi.org/10.1109/AIC 55036.2022.9848912 Jain A, Somwanshi D, Joshi K, Bhatt SS (2022) A review: data mining classification techniques. In: 2022 3rd international conference on intelligent engineering and management (ICIEM), April. IEEE, pp 636–642 Sharma S, Diwakar M, Joshi K, Singh P, Akram SV, Gehlot A (2022) A critical review on sentiment analysis techniques. In: 2022 3rd international conference on intelligent engineering and management (ICIEM), April. IEEE, pp 741–746
COVID-19 Classification of CT Lung Images Using Intelligent Wolf Optimization Based Deep Convolutional Neural Network

Om Ramakisan Varma and Mala Kalra
Abstract Chest computed tomography (CT) imaging is highly reliable and practical for diagnosing and analyzing COVID-19, particularly in the infectious region, compared with reverse-transcription polymerase chain reaction (RT-PCR). In this research, an intelligent wolf optimization-deep convolutional neural network (deep CNN) classifier is proposed to classify COVID-19 from CT images. The texture features are acquired from three different regions of the CT chest images, namely the lung region, area, and contour, using the texture descriptors local binary pattern (LBP) and local optimal oriented pattern (LOOP), together with ResNet-101-based features. These texture features form the input to the proposed intelligent wolf-based deep CNN classifier, which performs the COVID-19 classification. The proposed classifier achieves accuracy, sensitivity, and specificity of 85.32%, 85.74%, and 87.57%, respectively, for the training–testing ratio of 80–20, and 88.37%, 89.47%, and 91.29%, respectively, for the K-fold value of 10.

Keywords Intelligent wolf optimization · COVID-19 classification · Computed tomography images · Deep learning · Texture descriptors · Deep CNN
1 Introduction

In December 2019, Wuhan, China, experienced the emergence of the new coronavirus disease named COVID-19. The novel coronavirus spread quickly around the world due to its high infection rate. Millions of people were infected, and the fatality rate was significant [1, 2]. Though vaccination and medication are available for COVID-19,
identifying infected patients at an early stage is essential so that they can be treated immediately and prevented from spreading the virus to others. COVID-19 is an infectious disease, so infected people may spread it to healthy people [3]. COVID-19 patients can be detected easily, quickly, and effectively with a chest CT scan [4–9]. However, diagnosing COVID-19 from a CT scan requires expert radiologists and consumes considerable time. Machine learning (ML) and deep learning (DL) techniques can be utilized for classifying COVID-19 patients using CT scans [3, 10].

Numerous researchers have developed ML techniques for COVID-19 classification utilizing CT-scan images. For instance, some researchers have used a linear regression technique to predict COVID-19 patients using medical data [11]. Tang et al. [12] utilized a random forest approach to classify COVID-19 based on quantitative features. Additionally, various deep learning-based methods for identifying COVID-19 have been presented [13].

Shankar et al. [1] presented a fusion-based feature extraction model with a convolutional neural network (FM-CNN) for the automatic detection and classification of COVID-19. The features are retrieved using a fusion-based method that uses the grey level run length matrix (GLRLM), local binary patterns (LBP), and grey level co-occurrence matrix (GLCM). The most challenging problem is the enormous number of image classifications, leading to node classes in the grading system. Singh et al. [3] utilized a CNN to classify patients' test samples as positive or negative. The hyperparameters of the CNN classifier are tuned by multi-objective differential evolution (MODE). For modeling and testing the provided test samples, the CNN utilizes several layers, enhancing the classification accuracy; however, a large amount of training data is needed for COVID-19 classification. Sun et al. [13] introduced an adaptive feature selection-based deep forest (AFS-DF) technique for classifying CT lung images to identify COVID-19. Initially, the location-based features are extracted, and a high-level description of the extracted features is collected from the CT images with limited data. In addition, the relevant features are selected by the trained deep forest method. Even though the deep forest can acquire high-level features, the learned features remain challenging to categorize, and the deep learning method is not employed for obtaining the manually created features. Pathak et al. [10] utilized a deep transfer learning method to classify CT images and identify the bilateral change in the sample images; however, the optimal selection of hyperparameters is not considered for enhancing the classification performance. Li et al. [14] presented an ensemble deep learning method with a diagnosis algorithm, which ensembles the visual geometry group (VGG16) network and the stacked generalization technique. The ensemble method enables rapid disease classification. However, only 2D images are considered for testing and training, and 3D spatial data is not utilized. The dataset employed in this technique is not in the authentic DICOM data format, and the images are damaged during data processing by the ensemble deep learning method.
Redie et al. [15] developed an improved DarkCovidNet CNN model to detect COVID-19 cases using X-ray images for binary and multiclass classification. Their model achieved 99.53% and 94.18% accuracy in the binary and multiclass scenarios, respectively.
1.1 Challenges

The challenges that arise during the classification of the coronavirus by deep learning techniques are summarized in this section. Support vector machines (SVMs) have been built on features obtained from a CNN structure and classified using a traditional machine learning (ML) method; although these techniques produce efficient results, healthcare outcomes remain unsatisfactory [1]. Even though the deep forest can acquire high-level features, the learned features remain challenging to categorize, and the deep learning method is not employed for obtaining manually created features [13]. Three-dimensional deep neural networks for identifying COVID-19 suffer from several issues, including a lengthy training period, a small test set size, and limited accuracy [14]. Only the VGG model is employed, certain novel network approaches are not used, and the classification model is trained and tested only on 2D images, not on the rich spatial information retained in 3D structures [14]. Furthermore, the publicly accessible dataset employed in this technique is not in the authentic DICOM format, and the images are damaged during data processing by the ensembled deep learning method [14].

This paper aims to develop an intelligent wolf optimization-based deep CNN to classify COVID-19 from CT-scan images. Significant texture features from three regions of the CT-scan images (lung region, area, and contour) are obtained using LBP, LOOP, and ResNet-101-based features. The primary contributions of this research work are as follows.

• Intelligent wolf optimization enhances the Grey wolf optimizer (GWO) [16] by incorporating the wolf's previous hunting experience into the initial three best solutions, resulting in improved hunting behavior and better solutions.
• The intelligent wolf optimization is utilized for tuning the deep CNN classifier's hyperparameters to enhance the classification performance.

This paper is arranged as follows: Sect. 2 elaborates on the proposed method for COVID-19 disease classification along with the mathematical model of the optimization. Section 3 presents the experimental setup and results. Finally, Sect. 4 concludes the paper.
2 Proposed Methodology

The block diagram of the proposed COVID-19 classification is shown in Fig. 1. The COVID-19 CT scans dataset [17] is provided as input for evaluating the performance of the optimized deep CNN classifier. A pre-processing step is performed on the input images to eliminate the background noise and extract the region of interest. Features, such as descriptor-based, statistical, and shape- and texture-based descriptors, are then extracted from the pre-processed images and provided as input to the proposed intelligent wolf-based deep CNN classifier. The intelligent wolf optimization [16, 18] is proposed and utilized for tuning the hyperparameters of the classifier, which enhances the performance of COVID-19 disease classification.
2.1 Data Collection

The COVID-19 CT scan dataset was collected from Kaggle for this study [17]. This dataset contains CT scan images of COVID-19 patients along with segmented lung and infection regions.
Fig. 1 Block diagram of COVID-19 classification
2.2 Image Pre-processing

Although the information provided by the original data is of limited quality, recent developments in CNN models have demonstrated remarkably fast identification of hidden features in visual data. Practical data preparation approaches are needed to further enhance image quality for the recognition and classification of COVID-19 by providing the model with only the interesting portions of an image and discarding the noise. Pre-processing involves removing noise and unnecessary data from the CT images without losing important information. The lungs in the CT image are isolated from the background noise by image processing, which provides the information required by the CNN model.
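A minimal illustration of this masking step is sketched below in C; the binary lung mask, the 8-bit image buffer, and the function name apply_lung_mask are assumptions made for the example, not the authors' actual implementation.

#include <stddef.h>

/* Keep pixels inside a binary lung mask and zero out the background,
   so that only the region of interest is passed on to feature
   extraction. The mask and buffer layout are illustrative. */
void apply_lung_mask(const unsigned char *ct, const unsigned char *mask,
                     unsigned char *out, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        /* mask[i] != 0 marks a lung pixel; everything else is background */
        out[i] = mask[i] ? ct[i] : 0;
    }
}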
2.3 Feature Extraction

Texture features extracted from CT images improve classification accuracy; the texture descriptors used for feature extraction are described below. The required region of interest (ROI) is extracted from the CT image during feature extraction. Each pixel's low-level features are converted into high-level features, since the high-level features carry the valuable information. LBP, LOOP, and ResNet-101 provide the significant texture features involved in the feature extraction technique. The ROI is frequently double the pattern size or even larger, so the ROI images are fed to ResNet-101, a pre-trained residual network model, to retrieve the 1000 features of its fully connected layer.
2.3.1 ResNet-101
The ResNet-101 architecture introduced the principle of residual blocks to address the loss of functionality in very deep networks. Batch normalization is a basis of ResNet: it normalizes the inputs of each layer to improve network performance. The residual blocks reduce the problem of parameter shift in the network and improve its performance.
2.3.2 Texture-Based Features
LBP and LOOP are used as texture-based descriptors. The grey-scale invariant texture primitive known as the LBP operator has become very popular for characterizing an image's texture. It labels each pixel of an image by thresholding the grey values of its P neighbors against the center value and converting the outcome into a binary integer.
$$\mathrm{LBP}_{f,r}(m_d, n_d) = \sum_{f=0}^{f-1} e\big(b_f - b_d\big)\, 2^f \tag{1}$$

$$e(m) = \begin{cases} 1, & m \ge 0 \\ 0, & m < 0 \end{cases} \tag{2}$$
where, for the center pixel $(m_d, n_d)$, the grey value is denoted as $b_d$; for the $f$ uniformly spaced pixels on a circle of radius $r$, the corresponding grey values are denoted as $b_f$; and $e$ denotes the pattern present in the image. LOOP is the non-linear incorporation of features such as LBP and LTP, and the LOOP code for the pixel $(m_d, n_d)$ is formulated as

$$\mathrm{LOOP}(m_d, n_d) = \sum_{u=0}^{7} e(p_u - p_d) \cdot 2^{s_u} \tag{3}$$

$$e(m) = \begin{cases} 1 & \text{if } m \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$
where $p_d$ is the intensity of the center pixel of the image and $p_u$ are the intensities of the eight neighborhood pixels of $(m_d, n_d)$, excluding the center pixel intensity $p_d$. Finally, the texture features from ResNet-101, LBP, and LOOP are combined to form the feature vector, which is then fed into the intelligent wolf-based deep CNN classifier.
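For illustration, a minimal C sketch of the 8-neighbor LBP code of Eqs. (1) and (2) is given below; the row-major 8-bit grayscale layout, the neighbor ordering, and the function name lbp8 are assumptions made for the example, and border handling is omitted.

/* 8-neighbour LBP code for an interior pixel (row, col): threshold each
   neighbour's grey value b_f against the centre value b_d (Eq. (2)) and
   pack the eight resulting bits into one byte (Eq. (1)). */
unsigned char lbp8(const unsigned char *img, int width, int row, int col)
{
    const int dr[8] = { -1, -1, -1,  0,  1,  1,  1,  0 };  /* clockwise  */
    const int dc[8] = { -1,  0,  1,  1,  1,  0, -1, -1 };  /* neighbours */
    unsigned char center = img[row * width + col];
    unsigned char code = 0;

    for (int f = 0; f < 8; f++) {
        unsigned char bf = img[(row + dr[f]) * width + (col + dc[f])];
        if (bf >= center)                 /* e(b_f - b_d) = 1 when >= 0 */
            code |= (unsigned char)(1u << f);
    }
    return code;
}

The LOOP code of Eq. (3) has the same structure, except that each neighbor contributes $2^{s_u}$ instead of $2^f$, with the exponent $s_u$ determined adaptively per pixel.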
2.4 Intelligent Wolf Optimization Based Deep CNN Classifier for COVID-19 Classification Using the Texture CT Features

The features attained from the chest images are provided as input for the classification of COVID-19 using the convolution, pooling, and fully connected layers. The deep CNN classifier effectively classifies the disease, and its performance is enhanced through the proposed optimization, which is developed by integrating the intelligent hunting characteristics of the wolf. As a result, the intelligent wolf optimization-based deep CNN effectively classifies the COVID-19 disease from the patient's CT images.
2.4.1 Intelligent Wolf Optimization
Tuning the hyperparameters improves the deep CNN classifier's performance. If the hyperparameters are not properly tuned to reduce the loss function, the accuracy may be lower and the confusion matrix worse. The values of the hyperparameters are obtained
by employing the proposed intelligent wolf optimization, which improves model performance with a minimal loss function.

Apex predators sit at the top of the hunting hierarchy, and intelligent wolves are among them [16]; most intelligent wolves prefer to live in packs. The leaders, called alphas, may be female or male; the alpha is primarily responsible for decisions on foraging, where to sleep, when to wake up, and other issues. The pack must follow the alpha's directives; however, an alpha has also been seen to behave democratically by deferring to the pack's other wolves. The entire pack acknowledges the alpha during assemblies by lowering their tails. Since the pack should obey the intelligent alpha wolf's commands, this predator is also called the dominating wolf. Only the alpha wolves are permitted to mate, and it is interesting to note that the alpha is not always the most physically powerful member of the pack but rather the best at leading it. This demonstrates that a pack's structure and discipline are far more crucial than its physical power. Beta is the next position in the intelligent wolf hierarchy; betas are the wolves below the alpha that assist in making decisions and participate in other pack activities. Omega is the intelligent wolf with the lowest ranking; thus, the omega serves as the scapegoat. Omega wolves must continuously yield to all other dominant wolves, and they are the last wolves permitted to eat. Although the omega may appear to be a relatively unimportant member of the group, it is noted that when the omega is lost, the entire pack experiences internal conflicts and issues. A delta wolf is referred to as a subordinate, and the delta wolves dominate the omega.
2.5 Mathematical Modeling of the Intelligent Wolf Optimization

The mathematical modeling of the proposed intelligent wolf optimization is discussed in this section in five phases: communal ordering, prey encircling, foraging, grabbing the prey, and searching.
2.5.1 Communal Ordering
In the communal ordering of intelligent wolves, the alpha is considered the top predator depending on the fitness solution. The subsequent solutions are the intelligent beta wolf and the intelligent delta wolf. The remaining intelligent wolves are considered omegas, following the rules governed by the pack's top alpha, beta, and delta wolves.
2.5.2 Prey Encircling
The intelligent wolves surround the prey during foraging, which is mathematically expressed using the following equations:

$$\vec{W} = \left| \vec{S} \cdot \vec{Z}_g(T) - \vec{Z}(T) \right| \tag{5}$$

$$\vec{Z}(T+1) = \vec{Z}_g(T) - \vec{P} \cdot \vec{W} \tag{6}$$

where the current iteration is denoted as $T$, the encircling behavior as $\vec{W}$, the coefficient vectors as $\vec{P}$ and $\vec{S}$, the hunting position of an intelligent wolf as $\vec{Z}$, and the position of the prey during hunting as $\vec{Z}_g$. Equation (6) is rewritten in a standardized form as follows:

$$\vec{Z}(T+1) - \vec{Z}(T) = -\vec{P} \cdot \vec{W} \tag{7}$$
According to Eqs. (2) and (3) in [18], the position update of the intelligent wolf is represented in Eqs. (8)–(10) as follows:

$$D^{\gamma}\!\left[\vec{Z}(T+1) - \vec{Z}(T)\right] = -\vec{P} \cdot \vec{W} \tag{8}$$

$$\vec{Z}(T+1) - \gamma \vec{Z}(T) - \frac{1}{2}\gamma \vec{Z}(T-1) - \frac{1}{6}\gamma(1-\gamma)\vec{Z}(T-2) - \frac{1}{24}\gamma(1-\gamma)(2-\gamma)\vec{Z}(T-3) = -\vec{P} \cdot \vec{W} \tag{9}$$

$$\vec{Z}(T+1) = \gamma \vec{Z}(T) + \frac{1}{2}\gamma \vec{Z}(T-1) + \frac{1}{6}\gamma(1-\gamma)\vec{Z}(T-2) + \frac{1}{24}\gamma(1-\gamma)(2-\gamma)\vec{Z}(T-3) - \vec{P} \cdot \vec{W} \tag{10}$$
Equation (10) represents the present position of the intelligent wolf during foraging, for which the various previous positions are considered for effective foraging depending on the fitness solution. The coefficient vectors are evaluated using Eqs. (11) and (12) as follows:

$$\vec{P} = 2\vec{h} \cdot l_1 - \vec{h} \tag{11}$$

$$\vec{S} = 2 \cdot l_2 \tag{12}$$

where the value of $h$ is gradually reduced from 2 to 0 over the iterations, and the random numbers $l_1$ and $l_2$ lie in the range [0, 1].
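A small C sketch of the coefficient computation in Eqs. (11) and (12) follows; the linear decay of h (as commonly used in GWO) and the helper rand01() are assumptions made for the example.

#include <stdlib.h>

/* Uniform random number in [0, 1] (illustrative helper). */
static double rand01(void) { return (double)rand() / (double)RAND_MAX; }

/* Coefficients of Eqs. (11) and (12); h decays from 2 to 0 as the
   iteration t approaches t_max (a linear schedule is assumed here). */
void coefficients(double *P, double *S, int t, int t_max)
{
    double h = 2.0 * (1.0 - (double)t / (double)t_max);  /* 2 -> 0 */
    *P = 2.0 * h * rand01() - h;   /* Eq. (11) */
    *S = 2.0 * rand01();           /* Eq. (12) */
}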
2.5.3 Foraging Phase
The initial positions of the alpha, beta, and delta intelligent wolves are represented in Eq. (13); the positions are updated depending on these positions:

$$\vec{Z}(T+1) = \frac{\vec{Z}_1 + \vec{Z}_2 + \vec{Z}_3}{3} \tag{13}$$

The position change of the intelligent wolf is also rewritten in a standardized form as follows:

$$\vec{Z}(T+1) - \vec{Z}(T) = \frac{\vec{Z}_1 + \vec{Z}_2 + \vec{Z}_3}{3} - \vec{Z}(T) \tag{14}$$
According to Eqs. (2) and (3) in [18], the position update of the intelligent wolf is represented in Eqs. (15)–(17) as follows:

$$D^{\gamma}\!\left[\vec{Z}(T+1) - \vec{Z}(T)\right] = \frac{\vec{Z}_1 + \vec{Z}_2 + \vec{Z}_3}{3} - \vec{Z}(T) \tag{15}$$

$$\vec{Z}(T+1) - \gamma \vec{Z}(T) - \frac{1}{2}\gamma \vec{Z}(T-1) - \frac{1}{6}\gamma(1-\gamma)\vec{Z}(T-2) - \frac{1}{24}\gamma(1-\gamma)(2-\gamma)\vec{Z}(T-3) = \frac{\vec{Z}_1 + \vec{Z}_2 + \vec{Z}_3}{3} - \vec{Z}(T) \tag{16}$$

$$\vec{Z}(T+1) = \frac{\vec{Z}_1 + \vec{Z}_2 + \vec{Z}_3}{3} + \vec{Z}(T)(\gamma - 1) + \frac{1}{2}\gamma \vec{Z}(T-1) + \frac{1}{6}\gamma(1-\gamma)\vec{Z}(T-2) + \frac{1}{24}\gamma(1-\gamma)(2-\gamma)\vec{Z}(T-3) \tag{17}$$
Generally, the position update of the wolf depends on the first three best solutions $\vec{Z}_1$, $\vec{Z}_2$, and $\vec{Z}_3$, but in the proposed Eq. (17), the previous hunting experience of the wolf is integrated to enhance the hunting performance and attain the best possible solutions. The previous hunting experience is represented by the terms $\vec{Z}(T-1)$, $\vec{Z}(T-2)$, and $\vec{Z}(T-3)$ in Eq. (17). The optimization parameter $\gamma$ lies in the range 0 to 1. The final position vector of the current individual is calculated by Eq. (18), and Eq. (19) denotes the updated distances with respect to the $\alpha$, $\beta$, and $\delta$ wolves:

$$\vec{Z}_1 = \vec{Z}_\alpha - \vec{P}_1 \cdot \vec{W}_\alpha, \quad \vec{Z}_2 = \vec{Z}_\beta - \vec{P}_2 \cdot \vec{W}_\beta, \quad \vec{Z}_3 = \vec{Z}_\delta - \vec{P}_3 \cdot \vec{W}_\delta \tag{18}$$

$$\vec{W}_\alpha = \left|\vec{S}_1 \cdot \vec{Z}_\alpha - \vec{Z}\right|, \quad \vec{W}_\beta = \left|\vec{S}_2 \cdot \vec{Z}_\beta - \vec{Z}\right|, \quad \vec{W}_\delta = \left|\vec{S}_3 \cdot \vec{Z}_\delta - \vec{Z}\right| \tag{19}$$
2.5.4 Grabbing the Prey
The wolf stops pursuing once it has grabbed the prey and the prey is still and no longer moving. In the mathematical modeling, the value of $h$ is decreased from 2 to 0, and $\vec{P}$ drops together with it. The intelligent wolf can move between its current location and the prey position based on the random value of $\vec{P}$ in the range [−1, 1]. The placement of the individual alpha, beta, and delta wolves is crucial for capturing the prey in this phase. This behavior favors a local solution, so additional operators are necessary to expand the exploration.
2.5.5 Searching Phase
The random values of $\vec{P}$ are allowed to range beyond [−1, 1] to expand the search phase globally: the wolves are instructed to leave the encircled prey and look for better prey when $|\vec{P}| > 1$. When $|\vec{P}| < 1$, the predator moves in the direction of the intended prey, and termination occurs once the best possible solution has been found. The pseudocode of the proposed intelligent wolf optimization is given in Algorithm 1.

Algorithm 1 Intelligent wolf optimization
1. Initialize the grey wolf population
2. Initialize the input value for $\vec{Z}(T)$
3. Compute the output for $\vec{Z}_1$, $\vec{Z}_2$, $\vec{Z}_3$
4. Initialize $\vec{Z}(T)$, $\vec{P}$, $\vec{S}_1$, $\vec{S}_2$, $\vec{S}_3$
5. Calculate $\vec{Z}(T) = \vec{Z}_1(T), \vec{Z}_2(T), \ldots, \vec{Z}_{nn}(T)$, where $\vec{Z}_{nn}(T)$ is the position of the $nn$-th wolf
6. Evaluate the fitness value; while $t < t_{max}$
7. Update $\vec{Z}(T)$ for each wolf
8. Declare $\vec{Z}_\alpha$, $\vec{Z}_\beta$, and $\vec{Z}_\delta$
9. Determine $\vec{Z}(T+1)$
10. Calculate $t = t + 1$
11. End while
12. Return $\vec{Z}_1$, $\vec{Z}_2$, $\vec{Z}_3$
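A condensed C sketch of the position update of Eq. (17) is given below; the problem dimensionality, array names, and history layout are placeholders, and fitness evaluation and leader selection are assumed to happen outside this function.

#define DIM 4      /* number of decision variables (illustrative) */
#define MEMORY 4   /* stored positions Z(T), Z(T-1), Z(T-2), Z(T-3) */

/* One intelligent-wolf position update following Eq. (17): the mean of
   the three leader-guided positions Z1, Z2, Z3 is combined with the
   fractional-order memory of the wolf's last four positions, where
   hist[0] = Z(T), hist[1] = Z(T-1), ... and gamma lies in (0, 1). */
void iwo_update(double znew[DIM], double hist[MEMORY][DIM],
                const double z1[DIM], const double z2[DIM],
                const double z3[DIM], double gamma)
{
    for (int d = 0; d < DIM; d++) {
        double leaders = (z1[d] + z2[d] + z3[d]) / 3.0;
        znew[d] = leaders
                + (gamma - 1.0) * hist[0][d]
                + 0.5 * gamma * hist[1][d]
                + (1.0 / 6.0)  * gamma * (1.0 - gamma) * hist[2][d]
                + (1.0 / 24.0) * gamma * (1.0 - gamma) * (2.0 - gamma) * hist[3][d];
    }
    /* slide the history window so Z(T+1) becomes the new Z(T) */
    for (int k = MEMORY - 1; k > 0; k--)
        for (int d = 0; d < DIM; d++)
            hist[k][d] = hist[k - 1][d];
    for (int d = 0; d < DIM; d++)
        hist[0][d] = znew[d];
}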
3 Experimental Setup and Results

The proposed intelligent wolf optimization-deep CNN classifier for COVID-19 classification is implemented using MATLAB R2021 on Windows 10 with 8 GB RAM. The performance of the proposed technique is validated in terms of accuracy, sensitivity, and specificity by comparing it with various existing methods. Figure 2 shows the experimental results for the COVID-19 images with the significant features of the three layers, and Fig. 3 illustrates the corresponding results for the non-COVID-19 images. The features include LBP and LOOP computed on the input CT image's three layers: lung area, region, and contour.
3.1 Comparative Analysis

The existing methods considered for comparing the performance of the intelligent wolf optimization-based deep CNN classifier are the fusion-based feature extraction model with a convolutional neural network (FM-CNN) [1], convolutional neural network (CNN) [3], adaptive feature selection-based deep forest (AFS-DF) [13], ensemble deep learning classifier [14], and deep transfer learning network [10]. The dataset is summarized in Table 1. The performance analysis of the intelligent wolf optimization-deep CNN classifier and the other reviewed methods is shown in Table 2, considering the training–testing ratio of 80–20 and the K-fold value of 10.
Fig. 2 Experimental results of COVID-19 with significant features (panels: chest CT image, lung area, lung region, lung contour, LBP, LOOP)
Fig. 3 Experimental results of non-COVID-19 with significant features (panels: chest CT image, lung area, lung region, lung contour, LBP, LOOP)
Table 1 Overview of dataset

Name of dataset | Description | Data repository and source
COVID-19 CT scans | 20 CT scans and expert segmentations of patients with COVID-19 | Kaggle, https://www.kaggle.com/datasets/andrewmvd/covid19-ct-scans?select=lung_mask

Table 2 Performance analysis of the intelligent wolf optimization-deep CNN classifier on the COVID-19 CT scans dataset

Methods | Accuracy % (80–20) | Sensitivity % (80–20) | Specificity % (80–20) | Accuracy % (K-fold 10) | Sensitivity % (K-fold 10) | Specificity % (K-fold 10)
FM-CNN | 77.13 | 73.49 | 80.90 | 80.74 | 77.29 | 83.89
CNN | 77.39 | 74.11 | 82.60 | 81.24 | 77.46 | 85.61
AFS-DF | 78.53 | 74.64 | 83.46 | 82.23 | 78.53 | 86.47
Ensemble deep learning classifier | 79.03 | 75.17 | 84.31 | 84.65 | 78.88 | 86.85
Deep transfer learning network | 79.57 | 75.71 | 85.16 | 87.31 | 82.65 | 90.49
Proposed intelligent wolf optimization-deep CNN | 85.32 | 85.74 | 87.57 | 88.37 | 89.47 | 91.29
Figure 4 shows the accuracy of the intelligent wolf optimization-deep CNN and the existing methods for varying training percentages. The accuracy of the intelligent wolf optimization-deep CNN for the training percentages of 40, 60, and 80 is 78.74%, 81.76%, and 85.32%, respectively. The improvement in accuracy of the intelligent wolf optimization-deep CNN over FM-CNN, CNN, AFS-DF, the ensemble deep learning classifier, and the deep transfer learning network is 10.08%, 9.73%, 8.28%, 7.65%, and 6.98%, respectively, for the training percentage of 80.

Figure 5 depicts the sensitivity of the intelligent wolf optimization-deep CNN and the existing methods for varying training percentages. The sensitivity of the intelligent wolf optimization-deep CNN at the training percentages of 40, 60, and 80 is 78.18%, 81.58%, and 85.74%, respectively. The sensitivity of the intelligent wolf optimization-deep CNN is 15.38%, 14.54%, 13.84%, 13.13%, and 12.43% better than FM-CNN, CNN, AFS-DF, the ensemble deep learning classifier, and the deep transfer learning network, respectively, for the training percentage of 80.

Figure 6 shows the specificity of the intelligent wolf optimization-deep CNN and the existing methods for varying training percentages. The specificity of the intelligent wolf optimization-deep CNN at the training percentages of 40, 60, and 80 is 80.94%, 84.07%, and 87.57%, respectively. The performance improvement in specificity of the intelligent wolf optimization-deep CNN over FM-CNN, CNN, AFS-DF, the ensemble
Fig. 4 Performance analysis with respect to accuracy varying training–testing ratio
Fig. 5 Performance analysis with respect to sensitivity varying training–testing ratio
deep learning classifier, and the deep transfer learning network is 7.92%, 5.84%, 4.81%, 3.79%, and 2.79%, respectively, for the training percentage of 80.

Figure 7 shows the accuracy of the intelligent wolf optimization-deep CNN and the existing methods for K-fold values of 4, 6, 8, and 10. The accuracy of the intelligent wolf optimization-deep CNN at the K-fold values of 4, 6, 8, and 10 is 82.69%, 84.69%, 86.46%, and 88.37%, respectively. The improvement in accuracy of the intelligent wolf optimization-deep CNN over FM-CNN, CNN, AFS-DF, the ensemble deep learning classifier, and the deep transfer learning network is 9.03%, 8.41%, 7.20%, 4.30%, and 1.20%, respectively, for the K-fold value of 10.
Fig. 6 Performance analysis with respect to specificity varying training–testing ratio
Fig. 7 Performance analysis with respect to accuracy taking various K-fold values
Figure 8 represents the sensitivity of the intelligent wolf optimization-deep CNN and the existing methods for various K-fold values. The sensitivity of the intelligent wolf optimization-deep CNN at the K-fold values of 4, 6, 8, and 10 is 82.91%, 85.09%, 87.28%, and 89.47%, respectively. The sensitivity of the intelligent wolf optimization-deep CNN is 14.61%, 14.39%, 13.01%, 12.57%, and 7.92% better than FM-CNN, CNN, AFS-DF, the ensemble deep learning classifier, and the deep transfer learning network, respectively, for the K-fold value of 10. Figure 9 depicts the specificity of the intelligent wolf optimization-deep CNN and the existing methods for various K-fold values. The specificity of the intelligent
Fig. 8 Performance analysis with respect to sensitivity taking various K-fold values
wolf optimization-deep CNN at the K-fold values of 4, 6, 8, and 10 is 84.17%, 86.78%, 88.35%, and 91.29%, respectively. The performance improvement in specificity of the intelligent wolf optimization-deep CNN over FM-CNN, CNN, AFS-DF, the ensemble deep learning classifier, and the deep transfer learning network is 8.44%, 6.41%, 5.42%, 5.39%, and 0.88%, respectively, for the K-fold value of 10.
Fig. 9 Performance analysis with respect to specificity taking various K-fold values
4 Conclusion

In this research, a deep CNN classifier integrated with intelligent wolf optimization is proposed for the classification of COVID-19 from CT images. The texture features are obtained using texture descriptors from three different regions of the CT chest images, namely the lung region, area, and contour. The proposed classifier uses the texture features to perform the COVID-19 classification. The proposed intelligent wolf optimization-deep CNN shows performance improvements of 6.98–10.08%, 12.43–15.38%, and 2.79–7.92% in accuracy, sensitivity, and specificity, respectively, compared to the existing classifiers for the training–testing ratio of 80–20. The proposed algorithm also achieves 1.20–9.03%, 7.92–14.61%, and 0.88–8.44% higher accuracy, sensitivity, and specificity for the K-fold value of 10. In the future, we will implement multi-class classification for bacterial pneumonia, viral pneumonia, and COVID-19 using deep learning.
References

1. Shankar K, Mohanty SN, Yadav K, Gopalakrishnan T, Elmisery AM (2021) Automated COVID-19 diagnosis and classification using convolutional neural network with fusion based feature extraction model. Cogn Neurodyn. https://doi.org/10.1007/s11571-021-09712-y
2. Mahase E (2020) Coronavirus covid-19 has killed more people than SARS and MERS combined, despite lower case fatality rate. BMJ 368:m641. https://doi.org/10.1136/bmj.m641
3. Singh D, Kumar V, Vaishali, Kaur M (2020) Classification of COVID-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks. Eur J Clin Microbiol Infect Dis 39(7):1379–1389. https://doi.org/10.1007/s10096-020-03901-z
4. Wu J, Wu X, Zeng W, Guo D, Fang Z, Chen L, Huang H, Li C (2020) Chest CT findings in patients with coronavirus disease 2019 and its relationship with clinical features. Invest Radiol 55(5):257–261. https://doi.org/10.1097/RLI.0000000000000670
5. Fang Y, Pang P (2020) Sensitivity of chest CT for COVID-19: comparison to RT-PCR. Radiology 296:15–17
6. Li Y, Xia L (2020) Coronavirus disease 2019 (COVID-19): role of chest CT in diagnosis and management. Am J Roentgenol 214(6):1280–1286. https://doi.org/10.2214/AJR.20.22954
7. Chung M, Bernheim A, Mei X, Zhang N, Huang M, Zeng X, Cui J, Xu W, Yang Y, Fahad Z, Jacobi A, Li K, Li S, Shan H (2020) CT imaging features of 2019 novel coronavirus (2019-nCoV). Radiology 295(1):202–207. https://doi.org/10.1148/radiol.2020200230
8. Li M, Lei P, Zeng B, Li Z, Yu P, Fan B, Wang C, Li Z, Zhou J, Hu S, Liu H (2020) Coronavirus disease (COVID-19): spectrum of CT findings and temporal progression of the disease. Acad Radiol 27(5):603–608. https://doi.org/10.1016/j.acra.2020.03.003
9. Long C, Xu H, Shen Q, Zhang X, Fan B, Wang C, Li Z, Zhou J, Hu S, Liu H (2020) Diagnosis of the coronavirus disease (COVID-19): rRT-PCR or CT? Eur J Radiol 126:108961. https://doi.org/10.1016/j.ejrad.2020.108961
10. Pathak Y, Shukla PK, Tiwari A, Stalin S, Singh S (2022) Deep transfer learning based classification model for COVID-19 disease. IRBM 43(2):87–92. https://doi.org/10.1016/j.irbm.2020.05.003
11. Ghosal S, Sengupta S, Majumder M, Sinha B (2020) Linear regression analysis to predict the number of deaths in India due to SARS-CoV-2 at 6 weeks from day 0 (100 cases - March 14th 2020). Diabetes Metab Syndr Clin Res Rev 14(4):311–315. https://doi.org/10.1016/j.dsx.2020.03.017
12. Tang Z, Zhao W, Xie X, Zhong Z, Shi F, Ma T, Liu J, Shen D (2020) Severity assessment of coronavirus disease 2019 (COVID-19) using quantitative features from chest CT images, pp 1–18. http://arxiv.org/abs/2003.11988
13. Sun L, Mo Z, Yan F, Xia L, Shan F, Ding Z, Song B, Gao W, Shao W, Shi F, Yuan H, Jiang H, Wu D, Wei Y, Gao Y, Sui H, Zhang D, Shen D (2020) Adaptive feature selection guided deep forest for COVID-19 classification with chest CT. IEEE J Biomed Health Inform 24(10):2798–2805. https://doi.org/10.1109/JBHI.2020.3019505
14. Li X, Tan W, Liu P, Zhou Q, Yang J (2021) Classification of COVID-19 chest CT images based on ensemble deep learning. J Healthc Eng 2021. https://doi.org/10.1155/2021/5528441
15. Redie DK, Sirko AE, Demissie TM, Teferi SS, Shrivastava VK, Verma OP, Sharma TK (2023) Diagnosis of COVID-19 using chest X-ray images based on modified DarkCovidNet model. Evol Intell 16(3):729–738. https://doi.org/10.1007/s12065-021-00679-7
16. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61. https://doi.org/10.1016/j.advengsoft.2013.12.007
17. COVID-19 CT scans. Kaggle. https://www.kaggle.com/datasets/andrewmvd/covid19-ct-scans. Accessed 28 Aug 2022
18. Zhang X, Duan H (2014) Comments on 'particle swarm optimization with fractional-order velocity'. Nonlinear Dyn 77(1–2):427–429. https://doi.org/10.1007/s11071-014-1288-2
Parallelization of Molecular Dynamics Simulations Using Verlet Algorithm and OpenMP

Preksha Mathur, Hiteshwar Kumar Azad, Sai Harsha Varma Sangaraju, and Ekansh Agrawal
Abstract Molecular Dynamics (MD) simulations provide qualitative insights into the dynamics of liquids, solids, and liquid–solid interfaces under varying temperature and pressure conditions. Accurate force calculations for ion cores are crucial for the success of classical MD simulations. This research explores the application of MD simulations to study large-scale systems and understand atomic structure and interactions. MD simulations can successfully examine protein dynamics, such as folding and unfolding, contributing to a better understanding of their behavior. Integrating MD simulations with experimental data enables a holistic examination of atomic-level properties and their impact on cellular behavior. Additionally, MD simulations offer valuable insights into protein–ligand interactions and facilitate drug development. This study introduces an optimization technique that uses the Verlet algorithm with OpenMP to parallelize MD simulations, resulting in significant reductions in the computing time of force and energy evaluations. This optimization methodology shows promise in various domains, including protein–ligand interactions and complex system investigations. The results show that, compared to the serial simulation, the proposed approach handles computational load-balancing challenges more effectively while also reducing computing time.

Keywords Molecular dynamics · Verlet algorithm · Kinetic energy · Potential energy · Optimization technique
1 Introduction

In recent years, Molecular Dynamics (MD) simulations [10] have played an increasingly important role in molecular biology and drug discovery [17]. These simulations capture the behavior of each atom in a protein or other molecular system as it travels over time based on inter-atomic interactions [8]. An atomic-level structure is extremely useful and often yields significant information about how the biomolecule
functions.

Fig. 1 Application of molecular dynamics

Traditionally, dealing with computational load imbalance while keeping the computation time in check has been quite challenging. Recent hardware advancements have enabled the analysis of systems and timescales that were previously unfeasible. These advancements have also permitted the introduction of new algorithms, and lengthier simulations have enabled more precise calibration against experimental data [13]. Using cutting-edge methods and effective parallel implementations on advanced computing hardware can boost simulation performance while decreasing computational load imbalance, albeit at a higher cost. MD simulations have a wide range of applications [5] in the study of biomolecular systems, including membrane structure and permeability and lipid–protein, lipid–drug, and protein–ligand interactions. Figure 1 depicts the application of MD simulations to determine the structure of a protein.

This study employs MD techniques to acquire a better understanding of surface disorder and pre-melting [4] of molecules while also reducing computing time. Specifically, it investigates how the structure and vibration dynamics change with temperature at the surfaces of face-centered cubic (FCC) metals, notably Ag, Cu, and Ni. Furthermore, it involves the exchange of results from different theoretical and experimental approaches. While MD simulations have been extensively applied to study a wide range of surfaces, including semiconductors, insulators, alloys, glasses, and liquids, the primary focus of this investigation is on metal surfaces. However, conducting a comprehensive analysis of the technique's benefits and limitations in relation to these intriguing systems is beyond the scope of this research.

The objective of this article is to parallelize the sequential molecular dynamics simulation, aiming to reduce the computational load imbalance and minimize the computing time. The primary focus is on parallelizing the compute and update functions, which are responsible for calculating the potential and kinetic energies of atoms. By parallelizing these functions, we anticipate a significant reduction in the overall computation time, leading to improved efficiency of the MD simulation.
1.1 Problem Statement

Time-scale simulations in molecular dynamics require small time increments for stability, with time steps on the order of femtoseconds (10⁻¹⁵ s). These simulations cover nanoseconds (10⁻⁹ s) to microseconds (10⁻⁶ s), milliseconds (10⁻³ s), and beyond, capturing vital protein structural changes. Events in the picosecond (10⁻¹² s) to femtosecond (10⁻¹⁵ s) range demand countless sequential time steps, posing significant computational demands. While recent computing advancements enable microsecond simulations, extending timescales necessitates algorithmic enhancements and parallel computing, often using specialized hardware like GPUs.

Force fields [3] are pivotal in molecular mechanics, but challenges endure even with notable progress. Lower-scoring force fields align better with simulations and experimental data; however, even top-performing force fields exhibit deficiencies. Confidence in simulations grows through repeated trials and a profound understanding of covalent bond formation and breakage. Covalent bonds in proteins typically remain stable once formed, but specific bonds like cysteine disulfide bonds and proton transfers at extreme pH undergo more frequent changes.
1.2 Organization

This article follows a structured approach. Section 2 presents a comprehensive literature survey, examining various parallelization techniques in Molecular Dynamics (MD) simulations. In Sect. 3, we introduce our proposed research on parallelizing molecular dynamics simulations using the Verlet algorithm and leveraging the power of OpenMP for efficient computation. Section 4 discusses the experimental results and their in-depth analysis, showcasing the benefits of the parallelization approach. Finally, in Sect. 5, we draw conclusions from the findings, emphasizing the significance of our research in enhancing the efficiency and performance of MD simulations through parallel computing.
2 Literature Review

Molecular dynamics simulations have a lengthy history in the scientific literature. However, it was not until recently that molecular dynamics was able to achieve temporal scales consistent with biological processes. It has recently received increased attention as a result of enhanced computing hardware and efficient parallel implementations. To overcome the inadequacies of existing MD implementations, researchers have offered a number of solutions and improvements. In this context, Yin et al. [21] proposed a parallelization and optimization framework for MD simulations on Many Integrated
Core (MIC) Architectures, which represents a co-processor technology enhancing CPU-based systems. By implementing OpenMP multi-threading and vectorization techniques, they achieved substantial speedup and efficiency compared to CPU-only implementation. Addressing the challenge of load imbalance in spatial domain decomposition, Morillo et al. [12] explored the benefits of using a hybrid MPI+OpenMP model. Through extended OpenMP implementation and optimization in the widely used MD software LAMMPS, they demonstrated the superiority of their hybrid approach over other balance mechanisms in terms of performance and scalability. These findings highlight the effectiveness of parallel computing in addressing computational challenges in MD simulations.

Turning the focus towards GPU clusters, Wu et al. [20] presented a parallelization and optimization framework for MD simulations. Their work encompassed modeling and testing on four promising MD accelerator platforms, including FPGA-only systems and FPGA-GPU hybrid systems, revealing significant speedup and efficiency gains compared to CPU-based systems. To exploit the performance potential of Intel Xeon Phi coprocessors, Harode et al. [7] investigated the optimization of MD simulations. By utilizing OpenMP multithreading and SIMD instructions, they achieved competitive results against CPU and GPU implementations. Their study exemplifies the diverse hardware platforms that can benefit from parallelization. Taking another approach, Meyer et al. [11] proposed a parallelization of MD simulation with a cell lists algorithm on multi-core systems. By implementing multi-threading using OpenMP and Pthreads, they achieved significant speedup and efficiency, particularly for complex systems with high levels of inhomogeneity.

Continuing the focus on MD simulations for biomolecular research, Sedova et al. [15] discussed the optimization of GROMACS for biological systems. They addressed various aspects such as force calculation, integration algorithm, neighbor searching, communication optimization, and load balancing. Their work showcases the significance of parallelization in enabling efficient simulations of complex biomolecular systems. Rovigatti et al. [14] extended the parallelization frontier to heterogeneous systems, presenting an adaptive cell lists algorithm. Leveraging OpenCL for device kernels and MPI for inter-device communication, they demonstrated performance improvements for dynamic systems. This highlights the potential of parallelization in harnessing the capabilities of diverse hardware devices. Shen et al. [16] explored the optimization of MD simulations with the quantum mechanics/molecular mechanics method, offering advantages in terms of speed and accuracy over pure quantum mechanical methods. Their findings showcase the potential of parallelization in enhancing the precision of quantum simulations. Recently, Jung et al. [9] developed a new domain decomposition parallelization technique with effective load balancing to meet computational parallelization challenges.
3 Proposed Research

Molecular dynamics (MD) simulation is a well-established numerical method that uses an appropriate algorithm to solve the classical equations of motion for atoms with a known interatomic potential. The ability to grasp surface structure and behavior qualitatively makes MD simulation techniques ideal for researching surface phenomena. To enhance computational efficiency and expedite the process, parallelization of the serial molecular dynamics simulation is implemented. The computation of the potential and kinetic energy of atoms is primarily carried out by the compute and update functions; therefore, these functions are parallelized to accelerate the computation and achieve the main objective of this study. Additionally, the simulation is expected to reduce the energy error between theoretical predictions and actual observations. While the complete elimination of errors is not feasible, rigorous efforts are made to minimize them by avoiding excessive approximation in the numerical calculations throughout the simulation.
3.1 Working Methodology

The Verlet algorithm [6] is used in this study's code to determine the kinetic and potential energies of atoms in a molecular dynamics simulation. The program makes it possible to track each atom's motion properties at every time step, including its velocity, position, and acceleration. It checks that the system's total energy, that is, the kinetic and potential energies added together, remains constant. By calculating the difference between the total energy and the initial energy, the code also provides error analysis, displaying the percentage of computation error. For the purpose of analyzing atomic-level behavior, molecular dynamics simulations frequently employ the Verlet algorithm and GROMACS [1]. Long-term simulations benefit from the high accuracy and good energy conservation of the Verlet algorithm, a symplectic integrator; however, for complicated or large-scale simulations, the Verlet algorithm might not be as computationally efficient as GROMACS. Several other algorithms are commonly used in parallelizing molecular dynamics simulations besides the Verlet algorithm, such as the Leapfrog integrator, Langevin dynamics, the multiple time-stepping algorithm, cell lists and spatial decomposition, the fast multipole method, parallel replica exchange molecular dynamics, tree-based algorithms, and molecular mechanics with generalized Born and surface area. The selection between these alternatives is driven by factors such as study objectives, system complexity, and available computing resources. To enhance the understanding of the MD simulation, the flowchart of the simulation code is presented in Fig. 2.
– Initialize: Set up the initial conditions of the simulation, such as the positions and velocities of the atoms in the system.
– Forces: Calculate the forces acting on each atom in the system. This step involves computing the potential energy of the system, which depends on the positions of the atoms and their interactions with each other.
– Motion: Use Newton's equations of motion to update atom positions and velocities, solving a set of differential equations describing their time-dependent changes.
– Analysis: Analyze simulation results, including atom trajectories and thermodynamic properties. Compute various quantities such as temperature, pressure, and energy to describe the system's behavior.
– Summarize: Summarize the results of the simulation in a clear and concise manner. This step involves presenting the results in a way that is easy to understand and interpret, such as by creating graphs or tables.
Fig. 2 Flowchart of MD Simulation
– Compute: The COMPUTE function calculates the forces and energies exerted on atoms.
– Initialize: The INITIALIZE function sets the initial positions, velocities, and accelerations of atoms.
– Timestamp: The TIMESTAMP function generates a time stamp in the format YMDHMS, representing the current date and time.
– Update: The UPDATE function modifies the positions, velocities, and accelerations of atoms during each time step.
– Displacement: The DISPLACEMENT function determines the displacement between two particles.
– CPU Time: The CPU_TIME function provides information about the elapsed CPU time during the simulation.

The compute and update functions are the most time-consuming functions in the application code, as they carry out the bulk of the computation and solve the governing mathematical equations. Therefore, they are targeted for parallelization using OpenMP constructs such as #pragma omp parallel, #pragma omp for, and the reduction, shared, and private clauses. The time difference between the serial code and the parallel code is the major factor in evaluating the effectiveness of the parallelization in reducing the time taken for molecular dynamics simulations. Domain decomposition methods, which involve solving independent
problems on subdomains, are identified as suitable for parallel computing to further optimize the performance of the simulations; a sketch of such a parallelized loop follows.
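As a minimal illustration, the following C sketch parallelizes a pairwise potential-energy loop with the directives and clauses named above; the pair potential pair_pe(), the array layout, and the function name compute_potential are simplified stand-ins assumed for this example, not the application's verbatim compute routine.

#include <math.h>
#include <omp.h>

/* Illustrative pair potential; the real compute function may use a
   different functional form. */
static double pair_pe(double r) { return 0.5 * sin(r) * sin(r); }

/* Accumulate the total potential energy over all atom pairs, with the
   outer loop distributed across threads and the sum combined through a
   reduction clause. pos holds the 3D coordinates of n atoms. */
double compute_potential(int n, const double pos[][3])
{
    double pe = 0.0;
#pragma omp parallel for reduction(+:pe) schedule(static)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            double dx = pos[i][0] - pos[j][0];
            double dy = pos[i][1] - pos[j][1];
            double dz = pos[i][2] - pos[j][2];
            double r  = sqrt(dx * dx + dy * dy + dz * dz);
            pe += 0.5 * pair_pe(r);  /* half: each pair is visited twice */
        }
    }
    return pe;
}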
3.2 Computation of Atomic Forces

A molecular mechanics force field is used to calculate the forces acting on each atom at each discrete time step, which must be no more than a few femtoseconds (10⁻¹⁵ s) in length. Using Newton's equations of motion, we update each atom's position and velocity whenever its position changes. Newton's second law:

$$F = ma \tag{1}$$

where $F$ is the force on an atom, $m$ is the mass of the atom, and $a$ is the atom's acceleration. Recall that

$$F(x) = -\nabla U(x) \tag{2}$$

where $U$ is the potential energy function and $x$ corresponds to the coordinates of every atom. Acceleration is the derivative of velocity $v$, which is the derivative of position:

$$\frac{dx}{dt} = v \quad \text{and} \quad \frac{dv}{dt} = \frac{F(x)}{m} \tag{3}$$

A set of ordinary differential equations makes up this system. We have $3n$ velocity coordinates and $3n$ position coordinates for $n$ atoms. Numerical solutions are simple, while analytical (algebraic) solutions are challenging. The explicit numerical solution using a time step of $\delta t$ is given by:

$$x_{i+1} = x_i + \delta t \cdot v_i \tag{4}$$

$$v_{i+1} = v_i + \delta t \cdot \frac{F(x_i)}{m} \tag{5}$$
In practice, "time-symmetric" integration techniques like "leapfrog Verlet" are used:

$$x_{i+1} = x_i + \delta t \cdot v_{i+1/2} \tag{6}$$

$$v_{i+1/2} = v_{i-1/2} + \delta t \cdot \frac{F(x_i)}{m} \tag{7}$$

where the on-step velocity can be recovered as $v_i = (v_{i+1/2} + v_{i-1/2})/2$.
(6) (7)
4 Experimental Results and Analysis

MD simulations are a widely used tool in computational chemistry and materials science to investigate the dynamics and behavior of molecules and materials. They involve the calculation of forces and energies on each atom in the system at every time step, which can be computationally expensive and time-consuming. To address this computational challenge, parallelization techniques have been developed to take advantage of the parallel processing power of modern computer systems. Parallelization involves dividing the MD simulation into smaller tasks that can be executed concurrently on multiple processors or cores. The OpenMP parallel programming model is a popular technique for parallelizing MD simulations.

In this study, the performance benefits of parallelizing MD simulations using OpenMP were explored. A series of experiments were conducted to compare the execution times of serial computation and parallel computation using different numbers of processors. The experiments were performed on a computer system with four physical cores and eight logical cores. A time-comparison analysis of the parallel and serial code implementations of the molecular dynamics simulation is presented in this work. As shown in Table 1, the results emphasize the substantial speedup gained by using parallel processing, revealing its potential to dramatically accelerate molecular dynamics simulations and improve computing efficiency in bioinformatics research.

The results demonstrated that parallelization can significantly reduce the overall evaluation time of MD simulations. The execution time was reduced by 84.88% when comparing serial computation to parallel computation. These findings highlight the potential of parallelization techniques to enhance the efficiency of MD simulations, enabling the exploration of larger and more complex systems. Table 1 presents the performance outcomes of parallelizing molecular dynamics simulations across varying processor counts. The "Processors" column indicates processor numbers and the "Time" column denotes execution times in seconds. Enhanced efficiency accompanies higher processor counts, as evidenced by decreased execution times. The "Performance" column displays the speedup relative to single-processor serial computation, with values greater than 100% indicating superior parallel performance. With 8 processors, the speedup reached 183.1%, rendering the simulation about 2.83 times faster. Moreover, multi-threading improved parallel computation: four threads yielded the quickest execution (13.360 s), a significant gain over the 79 s taken in serial computation.
Table 1 System performance based on time (in sec) and number of processors

No. of processors | Serial time | Parallel time | Performance
1 | 79.036 | 57.061 | 127.8% ↑
2 | 79.036 | 33.698 | 157.4% ↑
4 | 79.036 | 25.056 | 168.3% ↑
8 | 79.036 | 13.360 | 183.1% ↑
Fig. 3 System performance based on time and number of processors
Fig. 4 Comparative analysis of parallel and serial computation
Execution times dwindled from 57.061 s (one thread) to 25.056 s (three threads). Figures 3 and 4 depict system performance in parallel computation (with varying processor counts) versus serial computation. These insights hold significance for computational chemistry and materials science. Figure 5 depicts, for each processor count, the share of the parallel processing computation time. By enabling more efficient MD simulations, parallelization techniques empower researchers to investigate larger and more complex systems, leading to new insights into molecular and material behavior. Such advancements have significant applications in areas like drug discovery, materials design, and other scientific and technological domains.
Fig. 5 Parallel processing computation time (sec): P1 44.2% (57.061 s), P2 26.1% (33.698 s), P4 19.4% (25.056 s), P8 10.3% (13.360 s)
Fig. 6 Comparative analysis of the proposed approach with state-of-the-art methods (time in seconds: Buchholz et al. [2] 39.52, Morillo et al. [12] 35.92, Watanabe et al. [18] 26.87, Meyer et al. [11] 15.81, Watanabe et al. [19] 14.22, proposed approach 13.72)
We have compared the best case of our proposed approach (8 processors) with cutting-edge works such as Meyer et al. [11], Morillo et al. [12], Buchholz et al. [2], and Watanabe et al. [18, 19]. Figure 6 shows the comparative analysis of the proposed approach with these state-of-the-art approaches.
5 Conclusion

This article focused on the application of molecular dynamics (MD) simulations to study the dynamics and behavior of molecules and materials. The study demonstrated the effectiveness of MD simulations in providing valuable qualitative insights into
surface phenomena, protein dynamics, and protein-ligand interactions. The accuracy of force calculations and energy evaluations in MD simulations significantly influences their success in capturing atomistic properties and behaviors. To address the computational challenges associated with MD simulations, the research explored the benefits of parallelizing the simulations using the OpenMP parallel programming model. The experimental results showcased the remarkable performance improvements achieved through parallelization. By distributing the computational workload across multiple threads, the evaluation time of MD simulations was significantly reduced, enabling the exploration of larger and more complex systems. The findings highlighted the potential of parallelization techniques to enhance the efficiency and effectiveness of MD simulations. These advancements have important implications for computational chemistry and materials science, offering researchers the ability to investigate molecular and material behavior on a larger scale and with greater accuracy. The findings demonstrate that our proposed approach effectively addresses computational load balancing challenges, leading to reduced computing time compared to the serial simulation.
References 1. Berendsen HJ, van der Spoel D, van Drunen R (1995) Gromacs: a message-passing parallel molecular dynamics implementation. Comput Phys Commun 91(1–3):43–56 2. Buchholz M, Bungartz HJ, Vrabec J (2011) Software design for a highly parallel molecular dynamics simulation framework in chemical engineering. J Comput Sci 2(2):124–129 3. Chmiela S, Vassilev-Galindo V, Unke OT, Kabylda A, Sauceda HE, Tkatchenko A, Müller KR (2023) Accurate global machine learning force fields for molecules with hundreds of atoms. Sci Adv 9(2):eadf0873 4. Fan X, Pan D, Li M (2020) Rethinking lindemann criterion: a molecular dynamics simulation of surface mediated melting. Acta Mater 193:280–290 5. Filipe HA, Loura LM (2022) Molecular dynamics simulations: advances and applications. Molecules 27(7):2105 6. Grubmüller H, Heller H, Windemuth A, Schulten K (1991) Generalized Verlet algorithm for efficient molecular dynamics simulations with long-range interactions. Mol Simul 6(1–3):121–142 7. Harode A, Gupta A, Mathew B, Rai N (2014) Optimization of molecular dynamics application for intel xeon phi coprocessor. In: 2014 International conference on high performance computing and applications (ICHPCA), pp 1–6. IEEE 8. Hollingsworth SA, Dror RO (2018) Molecular dynamics simulation for all. Neuron 99(6):1129–1143 9. Jung J, Tan C, Kobayashi C, Ugarte D, Sugita Y (2023) Acceleration of residue-level coarse-grained molecular dynamics by efficient parallelization. Biophys J 122(3):425a 10. Karplus M, Petsko GA (1990) Molecular dynamics simulations in biology. Nature 347(6294):631–639 11. Meyer R (2013) Efficient parallelization of short-range molecular dynamics simulations on many-core systems. Phys Rev E 88(5):053309 12. Morillo J, Vassaux M, Coveney PV, Garcia-Gasulla M (2022) Hybrid parallelization of molecular dynamics simulations to reduce load imbalance. J Supercomput:1–32 13. Páll S, Zhmurov A, Bauer P, Abraham M, Lundborg M, Gray A, Hess B, Lindahl E (2020) Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS. J Chem Phys 153(13)
14. Rovigatti L, Šulc P, Reguly IZ, Romano F (2015) A comparison between parallelization approaches in molecular dynamics simulations on GPUs. J Comput Chem 36(1):1–8 15. Sedova A, Eblen JD, Budiardja R, Tharrington A, Smith JC (2018) High-performance molecular dynamics simulation for biological and materials sciences: challenges of performance portability. In: 2018 IEEE/ACM international workshop on performance, portability and productivity in HPC (P3HPC), pp 1–13. IEEE 16. Shen L, Yang W (2018) Molecular dynamics simulations with quantum mechanics/molecular mechanics and adaptive neural networks. J Chem Theory Comput 14(3):1442–1455 17. Vuillemot R, Mirzaei A, Harastani M, Hamitouche I, Fréchin L, Klaholz BP, Miyashita O, Tama F, Rouiller I, Jonic S (2023) Mdspace: extracting continuous conformational landscapes from cryo-EM single particle datasets using 3D-to-2D flexible fitting based on molecular dynamics simulation. J Mol Biol 435(9):167951 18. Watanabe H, Suzuki M, Ito N (2011) Efficient implementations of molecular dynamics simulations for lennard-jones systems. Progress Theoret Phys 126(2):203–235 19. Watanabe H, Suzuki M, Ito N (2013) Huge-scale molecular dynamics simulation of multibubble nuclei. Comput Phys Commun 184(12):2775–2784 20. Wu C, Bandara S, Geng T, Sachdeva V, Sherman W, Herbordt M (2021) System-level modeling of GPU/FPGA clusters for molecular dynamics simulations. In: 2021 IEEE high performance extreme computing conference (HPEC), pp 1–8. IEEE 21. Yin Q, Luo R, Guo P (2012) Parallelization and optimization of molecular dynamics simulation on many integrated core. In: 2012 Eighth international conference on computational intelligence and security, pp 209–213. IEEE
Prediction of HDFC Bank Stock Price Using Machine Learning Techniques Yogesh Gupta
Abstract Stock market prediction is one of the research areas which demands more accuracy because of the involvement of money. A more accurate prediction of a stock’s price increases the gain of investors. In this paper, HDFC bank’s stock prices are predicted using machine learning techniques. The main target of this work is to predict the next day’s opening price based on open price (the price at which the stock opened on a specific day), high price (the highest price of the stock on a specific day), low price (the lowest price of the stock on a specific day), close price (the price at which the stock closed on that specific day), volume (number of transactions that occurred for the company, i.e. HDFC bank on a specific day), 5 DMA (5 days moving average of the opening price), 10 DMA (10 days moving average of the opening price), 20 DMA (20 days moving average of the opening price), and 50 DMA (50 days moving average of the opening price). Furthermore, a comparative study is presented to ascertain which moving average yields improved accuracy. Keywords Prediction · Machine learning · Accuracy · Average price · Regression
1 Introduction

Stock price prediction has been a serious research topic for years because of the involvement of money. It is difficult to predict any stock price due to its random-walk behavior. Stock prices are affected by many factors, such as positive or negative company news, public or investor sentiment, economic factors, and activities in related markets; hence, it makes sense to apply machine learning models. It follows that a successful prediction of a stock's future price would boost an investor's gains and generate a sizable profit. Researchers have recently applied Artificial Neural Networks (ANNs) [1, 2], Support Vector Machine (SVM) [3], and other machine learning approaches to forecast forthcoming stock prices. These methods typically have an over-fitting issue.

Y. Gupta (B), School of Engineering and Technology, BML Munjal University, Gurugram, India. e-mail: [email protected]
The vast number of parameters that need to be adjusted and the scant prior knowledge about the relevance of inputs are the major causes of this issue [4]. A few researchers have also used evolutionary algorithms (EAs) with machine learning techniques for predicting stock prices [5–7]. Guo et al. [8] used Particle Swarm Optimization (PSO) with SVM for time series forecasting. Xie [9] proposed an optimized SVM to predict share prices. Huang et al. [10] presented a new wavelet kernel SVM for financial time series prediction. A few researchers have also used computational intelligence approaches to predict stock prices [11, 12]. Bernardo [13] proposed a type-2 Fuzzy Logic-based approach for forecasting stock prices. Ochotorena [14] presented a stock price prediction method based on Decision Trees. Fazel et al. [15] presented a new hybrid fuzzy intelligent agent-based approach to predict stock prices. Hu et al. [16] proposed enhanced sine and cosine approaches to improve neural networks with optimized weights for stock market predictions. Torres et al. [17] predicted the forthcoming stock price of Apple Inc. using different machine learning techniques; they used random trees and multilayer perceptron techniques for prediction. Pathak et al. [18] presented a combination of multiple machine learning approaches for predicting the Indian stock market. Akbar [19] analyzed the performance of machine learning techniques for stock market prediction. Pahwa et al. [20] presented a review of machine learning techniques used in stock market prediction. Gupta et al. [21] presented a machine learning and statistics-based study of vaccine hesitancy. Gupta et al. [22] demonstrated diabetes prediction using deep learning techniques and also compared quantum-based machine learning with deep learning in their work. Gupta et al. [23] developed machine learning techniques to select important features and predict wine quality, performing all experiments on red and white wine datasets. Gupta et al. [24] applied machine and deep learning techniques to determine the impact of various weather factors on COVID-19 spread.

In this paper, we predict the next-day opening price of HDFC Bank stock using machine learning techniques. This paper also includes an analysis to ascertain which moving average yields improved accuracy. With the help of this study, stock market analysts no longer need to spend as much time predicting how the stock price will change in the near future. The movement of the trend can also help to analyze whether the stock prices will go up or down. This research also analyzes the number of days to be used to compute the moving average so that greater accuracy is achieved.
2 Dataset Description

To perform all the experiments, we collected the data from Yahoo Finance [16] to obtain authentic and accurate data. The data is collected for HDFC Bank only. The different days' moving averages and the opening price for the next day are computed programmatically. The dataset encompasses various attributes: date, the highest and lowest prices of the stock, the opening and closing prices of the stock, the trading volume for the day, moving averages of the opening price over different windows (5, 10, 20, and 50 days), and the opening price of the stock for the next day.
Table 1 Ten sample records of the used dataset

Date | Open | High | Low | Close | Volume | 5 DMA | 10 DMA | 20 DMA | 50 DMA | Next day open
2018/01/01 | 1873 | 1881 | 1851 | 1855 | 1,645,129 | 1870 | 1872 | 1866 | 1847 | 1859
2018/01/02 | 1859 | 1875 | 1859 | 1872 | 1,194,079 | 1870 | 1870 | 1869 | 1848 | 1875
2018/01/03 | 1875 | 1878 | 1851 | 1853 | 1,132,822 | 1869 | 1871 | 1873 | 1850 | 1853
2018/01/04 | 1853 | 1866 | 1853 | 1860 | 593,444 | 1865 | 1868 | 1872 | 1852 | 1863
2018/01/05 | 1863 | 1868 | 1856 | 1864 | 717,717 | 1865 | 1866 | 1872 | 1853 | 1865
2018/01/06 | 1865 | 1871 | 1858 | 1861 | 1,142,577 | 1860 | 1865 | 1872 | 1854 | 1862
2018/01/07 | 1862 | 1870 | 1855 | 1864 | 1,326,382 | 1863 | 1864 | 1869 | 1855 | 1865
2018/01/08 | 1865 | 1868 | 1857 | 1864 | 1,153,972 | 1864 | 1864 | 1868 | 1856 | 1864
2018/01/09 | 1864 | 1876 | 1856 | 1873 | 1,014,774 | 1864 | 1863 | 1867 | 1856 | 1873
2018/01/10 | 1873 | 1878 | 1860 | 1865 | 1,084,177 | 1866 | 1965 | 1867 | 1857 | 1870
The last attribute, i.e. the opening price of the stock for the next day, is used as the target variable in this work, and all the remaining attributes are considered as predictors or input variables. Different days' moving averages are used as features in different cases. Table 1 shows 10 sample records of the dataset.
3 Experiments and Result Analysis

As mentioned above, HDFC Bank's stock prices are considered in this paper. The goal is to forecast the opening price for the following day using various factors: the stock's opening price on a specific day, the highest and lowest prices it reached on that day, the stock's closing price on the same day, the volume of transactions conducted for the company (i.e. HDFC Bank) on that day, as well as the 5-day, 10-day, 20-day, and 50-day moving averages of the opening price. The data for prediction is collected through Python version 3.5.3 using the Pandas and NumPy libraries. The scikit-learn and Keras libraries are used to implement the machine learning techniques, and Pandas and Matplotlib are used for data analysis. It is important to note that stock price prediction
is not completely dependent on the historical dataset alone; as already discussed, various factors play direct and indirect roles in influencing the price of a stock. The dataset used in this work is stored in a MySQL database to ease the querying and data-preparation steps. All the moving averages are computed using a MySQL query, a sketch of which is given below. Linear Regression is applied to forecast the next-day opening price by considering different days' moving averages one at a time. A comparative study is also presented to ascertain which moving average yields improved accuracy; graphs for this comparative analysis can be found below. To ensure that the data is authentic and accurate, it is collected from Yahoo Finance. The data prepared using MySQL queries is stored in a separate csv file. The visualizations for different parts of the project are produced using Matplotlib in Python 3.
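The exact query used in this work is not reproduced here. As a hedged sketch, assuming MySQL 8 window functions and a hypothetical table hdfc_prices(trade_date, open_price, ...), the moving averages and next-day open could be computed as follows, issued here through the MySQL Connector/Python package (credentials are placeholders):

```python
import mysql.connector  # assumes the mysql-connector-python package

# Window functions compute each DMA over a trailing row frame;
# LEAD() provides the next trading day's opening price as the target.
MOVING_AVERAGE_QUERY = """
SELECT trade_date,
       open_price,
       AVG(open_price) OVER w5  AS dma_5,
       AVG(open_price) OVER w10 AS dma_10,
       AVG(open_price) OVER w20 AS dma_20,
       AVG(open_price) OVER w50 AS dma_50,
       LEAD(open_price) OVER (ORDER BY trade_date) AS next_day_open
FROM hdfc_prices
WINDOW w5  AS (ORDER BY trade_date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW),
       w10 AS (ORDER BY trade_date ROWS BETWEEN 9 PRECEDING AND CURRENT ROW),
       w20 AS (ORDER BY trade_date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW),
       w50 AS (ORDER BY trade_date ROWS BETWEEN 49 PRECEDING AND CURRENT ROW)
ORDER BY trade_date;
"""

conn = mysql.connector.connect(user="user", password="pass", database="stocks")
cur = conn.cursor()
cur.execute(MOVING_AVERAGE_QUERY)
rows = cur.fetchall()   # these rows can then be written out to the csv file
conn.close()
```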
3.1 System Architecture

The work done in this paper can be divided into four steps, as shown in Fig. 1. The figure shows that the data is collected in the initial phase and then sent through the pre-processing stage, where it is split into training and testing data. The ratio is taken as 70:30, which means that 70% of the data is used for training and the remaining 30% for testing. This pre-processed data is then passed to the application layer, where the algorithms are applied, and the results are predicted and displayed in the form of various plots.
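As a minimal sketch of this pipeline (the csv file name and column names are assumptions based on Table 1, and the reported "accuracy" is taken here to be the regression score on the test split):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("hdfc_prices.csv")   # hypothetical export of the MySQL table

# one moving average at a time, as described in the text; here the 5 DMA
features = ["Open", "High", "Low", "Close", "Volume", "5 DMA"]
X, y = df[features], df["Next day open"]

# 70:30 split as described above; shuffle=False preserves chronological order
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,
                                                    shuffle=False)

model = LinearRegression().fit(X_train, y_train)
print("R^2 score on the test data:", model.score(X_test, y_test))
```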
3.2 Visualizations

The visualization is performed using the Japanese Candlestick Pattern [19] together with a line chart of the 5-day moving average price. The first plot covers the entire data since 2015, as shown in Fig. 2; it plots the OHLC prices and the 5 DMA. Since this full-history graph is difficult to read, a second candlestick chart, also plotting the OHLC prices and 5 DMA, is drawn for the latest
Fig. 1 The system architecture used in this work
Fig. 2 Japanese Candlestick Pattern for entire dataset
100 days, as shown in Fig. 3. A positive day is represented by the color green, while a negative day is denoted by the color red.

Procedural Steps The following steps are used for the analysis: 1. Predict the opening price for the next day by linear regression. 2. Compute the value of the next-day opening on the basis of the 5 DMA, 10 DMA, 20 DMA, and 50 DMA, respectively (the 5 DMA is also commonly used on its own as a method to predict the next-day open).
Fig. 3 Japanese candlestick pattern for last 100 days
3. Determine the actual opening price for the next day.

We have determined the opening price of HDFC Bank for the next day using different criteria, i.e. different days' moving averages, as shown in Figs. 4, 5, 6, and 7. Figure 4 presents the actual values and the values predicted using the 5-day moving average. This figure clearly depicts that prediction using the 5 DMA obtains better
Fig. 4 Result of the linear regression model using the 5-day moving average. The graph plots the actual price, the 5-day moving average, and the price predicted by the linear regression model. An accuracy of 88% is achieved
Fig. 5 Result of the linear regression model using the 10-day moving average. The graph plots the actual price, the 10-day moving average, and the price predicted by the linear regression model. An accuracy of 87% is achieved
Fig. 6 Result of the linear regression model using the 20-day moving average. The graph plots the actual price, the 20-day moving average, and the price predicted by the linear regression model. An accuracy of 85% is achieved
Fig. 7 Result of the linear regression model using the 50-day moving average. The graph plots the actual price, the 50-day moving average, and the price predicted by the linear regression model. An accuracy of 84% is achieved
accuracy as compared to the others. Similarly, Figs. 5, 6, and 7 show the results for the 10-, 20-, and 50-day moving averages, respectively.
Fig. 8 Result of the linear regression models using different days' moving averages of the stock. The graph plots the actual price, the 5-day moving average, and the prices predicted by all the linear regression models
Table 2 Accuracies of the linear regression model with respect to different days' moving averages

Moving average used | Accuracy (%)
5 days | 88
10 days | 87
20 days | 85
50 days | 84
Figure 8 presents the comparison of all the results and shows that the 5 DMA obtains the best results. Table 2 tabulates the accuracy for all conditions using the Linear Regression model; it shows that the 5-day moving average achieves an accuracy of 88%, which is the highest among all.
4 Conclusion

In this paper, the next-day stock price of HDFC Bank is predicted using linear regression with different days' moving averages. Though it is practically impossible to predict stock prices with an accuracy of 100%, we tried to obtain better results using various factors like Open, Close, High, Low, and moving averages. From the analysis, it is clear that the best accuracy achieved by the developed model is 88%, using the 5-day moving average. In future, more features may be added to the historical stock prices, which may increase the efficiency of the prediction model.
References 1. Olivier C (2007) Neural network modeling for stock movement prediction, state of art. Blaise Pascal University, Thesis 2. Cherkassky V, Ma Y (2004) Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw 17:113–126 3. Vapnik V (1999) The nature of statistical learning. 2nd edition, Springer 4. Leng X, Miller HG (2004) Input dimension reduction for load forecasting based on support vector machines. In: IEEE international conference on electric utility deregulation, restructuring and power technologies 5. Carlos A, Gary BL, David A (2007) Evolutionary algorithms for solving multi-objective problems. Springer 6. Wilke DN (2005) Analysis of the particle swarm optimization algorithm. Master’s Dissertation. University of Pretoria 7. Khalil AS (2001) An investigation into optimization strategies of genetic algorithms and swarm intelligence. Artificial Life 8. Guo Z, Wang H, Liu Q (2012) Financial time series forecasting using LPP and SVM optimized by PSO. Soft computing methodologies and applications 9. Xie G (2011) The optimization of share price prediction model based on support vector machine. International conference on control, automation and systems engineering (CASE), 1–4 10. Huang C, Huang L, Han T (2012) Financial time series forecasting based on wavelet kernel support vector machine. Eighth international conference on natural computation (ICNC) 11. Jui Y (2012) Computational intelligence approaches for stock price forecasting. IEEE international symposium on computer, consumer and control (IS3C), 52–55 12. Ling J, Chih C, Chi L, Chih H (2012) A hybrid approach by integrating wavelet-based feature extraction with MARS and SVR for stock index forecasting. Decision support system 13. Bernardo D (2012) An interval type-2 fuzzy logic based system for model generation and summarization of arbitrage opportunities in stock markets. In: 12th IEEE computational intelligence workshop, 1–7 14. Ochotorena CN (2012) Robust stock trading using fuzzy decision trees. In: IEEE conference on computational intelligence for financial engineering & economics (CIFE), 1–8 15. Fazel Z, Esmaeil H, Turksen B (2012) A hybrid fuzzy intelligent agent-based system for stock price prediction. Int J Intell Syst 27(11):947–969 16. Hu H, Li T, Zhang S, Wang H (2018) Predicting the direction of stock markets using optimized neural networks with Google Trends. J Neuro Comput 285(12):188–195 17. Torres EP, Alvarez MH, Torres E, Yoo SG (2019) Stock market data prediction using machine learning techniques. In: proceedings of ICITS 2019, 539–547 18. Pathak A, Shetty NP (2018) Indian stock market prediction using machine learning and sentiment analysis. Computational intelligence in data mining, 595–603 19. Akbar SI (2019) Analysis on Stock Market Prediction Using Machine Learning Techniques. Int J Creative Innov Res All Studies 1(8):30–34 20. Pahwa NS, Khalfay N, Soni V, Vora D (2017) Stock prediction using machine learning a review paper. Int J Comput Appl 163(5):36–43 21. Gupta H, Verma OP (2023) Vaccine hesitancy in the post-vaccination COVID-19 era: a machine learning and statistical analysis driven study. Evol Intel 16(3):739–757 22. Gupta H, Varshney H, Sharma TK, Pachauri N (2022) Comparative performance analysis of quantum machine learning with deep learning for diabetes prediction. Complex Intell Syst 8(4):3073–3087 23. Gupta Y (2018) Selection of important features and predicting wine quality using machine learning techniques. Procedia Comput Sci 125:305–312 24. 
Gupta Y, Raghuwanshi G, Ahmadini A, Sharma U, Mishra AK, Mashwani WK, Goktas P, Alshqaq S, Balogun O (2021) Impact of weather predictions on covid-19 infection rate by using deep learning models. Complexity 5520663:1–11
Real-Time Applicability Analysis of Lightweight Models on Jetson Nano Using TensorFlow-Lite Kamath Vidya, A. Renuka, and J. Vanajakshi
Abstract Deep learning models have recently acquired prominence due to their adaptability to constrained devices. Because of this possibility, a significant number of studies in the fields of IoT and Robotics are being carried out with the goal of deploying deep learning models on resource-constrained applications. A variety of lightweight models are now available that can perform computer vision tasks on constrained devices, including the Jetson Nano. However, several enhancements are still needed if this field of research is to prosper in the future. This study was carried out with the aim of comparing and contrasting the lightweight models provided by TensorFlow, in order to assess them and ascertain how close they are to practical applicability. The conclusions not only present the observed outcomes but also provide insight into the models, attempting to identify potential improvements. Keywords Computer vision · Deep learning · Jetson nano · Lightweight · Real-time · Resource-constrained
1 Introduction

Deep learning models have become immensely popular in the modern world due to the rise in the number of models being proposed as well as their enhanced application potential. These models have several prospective applications in the fields of IoT (Internet of Things) and robotics, mostly for computer vision tasks. However, these domains happen to be where devices with constrained resources have to be employed [1]. Because of the availability of pretrained models on popular publicly available datasets that can be used in any application via transfer learning, the initial problem of these models being too large to deploy and requiring a huge amount of data for training has been considerably resolved [2].

K. Vidya · A. Renuka (B) · J. Vanajakshi, Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India. e-mail: [email protected]
Many lightweight models pre-trained on the Imagenet [3] dataset are available for use. These models are provided in TensorFlow Lite (.tflite) format, which can be executed on a TensorFlow Lite runtime interpreter. This interpreter is intended to work on constrained devices such as Android/iOS smartphones, Raspberry Pi, and Jetson Nano boards. However, how accurate these models are for immediate application without re-training is unclear. This assumes that the application's target classes are among the available Imagenet class labels, which avoids the need for transfer learning. Several factors must be considered while assessing the practical applicability of these pretrained models. When using constrained devices in real time, the model's speed in frames per second becomes the most important factor. The ease of deploying models by simply downloading them from the cloud and using a runtime interpreter appears almost too simple, and it attracts a significant number of researchers who desire to work on constrained devices. When attempting to achieve higher speeds, the accuracy of these models is frequently compromised. As a result, analysis of these models in practical situations can be quite valuable for those who want to try this sort of model deployment. The trade-off is an essential drawback for these models when the target applications demand a high level of accuracy as well as real-time performance in terms of speed. While some researchers are determined to create new models in addition to the open-source ones [4], they may find it difficult to match these pretrained models due to the library support provided with them. Developing new models may be difficult without innovative concepts in model designs that focus on overcoming the challenge of constrained-device deployment. The purpose of this work is to analyze the practical applicability of the lightweight models available in the TensorFlow model hub [11] on a Jetson Nano board. A summary of each model's architecture in terms of depth and peak memory is also provided, which may be useful for those seeking to develop similar models. The background on the framework is given in Sect. 2, followed by the methodology in Sect. 3. The results and discussion are covered in Sect. 4, followed by the conclusion and future scope in Sect. 5.
2 Background Several works have been published that attempt to analyze the solutions proposed by pretrained models [1, 4–7]. Real-time image recognition using Jetson Nano is possible using several frameworks [2, 8] which include TensorFlow, PyTorch, Caffe, and TinyML to name a few. Use of TensorRT has become increasingly popular due to the ability to optimize models in this format [7]. The model can be translated into ONNX for use on the TensorRT runtime engine when using PyTorch [2]. NVIDIA’s Jetson Inference API and the use of stored models in PyTorch in ‘.pth’ format are two other possibilities. Because of the change in operators, different input preprocessing,
different rounding modes, or different operator implementations, converting a model from one type to another can usually result in accuracy loss [9, 10]. When using TensorFlow, the model can be downloaded in ‘.pb’ format if retraining is essential and the downloaded ‘.pb’ model can be converted to a ‘.tflite’ model for deployment on constrained devices [5]. However, TensorFlow also offers the pretrained models on Imagenet [3], directly in ‘.tflite’ format to be deployed in applications that do not require further training. This is the approach adopted in this paper.
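As a brief illustration of that '.pb'-to-'.tflite' conversion path (the SavedModel path is a placeholder; this is a generic TensorFlow 2 API sketch, not a command taken from the cited works):

```python
import tensorflow as tf

# Convert a SavedModel (a '.pb' graph plus variables) into a '.tflite' flat buffer
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```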
3 Methodology

This work attempts to answer the questions posed in a pragmatic way by deploying existing pretrained models in '.tflite' format on a Jetson Nano board. The overall methodology followed is shown in Fig. 1. A host system is used to store the models, which are then run through the TensorFlow Lite Model Analyzer; this helps to study the architecture of these models by providing details of the operations performed at each layer, as well as insight into the peak memory used by the model during execution. The models are deployed directly on the device without any retraining for image recognition, the simplest task in computer vision. The Jetson Nano is used along with a Raspberry Pi v2 camera module (as shown in Fig. 2) to capture real-time streaming using GStreamer. The precision of these Imagenet-pretrained models is used to determine the top-5 probabilities, which decide the final image recognized. The captured frames are fed into the interpreter, which performs the recognition task and displays the top-5 accuracy along with the class names. Correct predictions are determined by the class with the highest accuracy. The results are then documented in terms of FPS speed as well
Fig. 1 Overall methodology followed to analyze the models for constrained usage
Fig. 2 Jetson Nano developer board with Raspberry Pi V2 camera module
as the number of correct and incorrect identifications made by these models throughout testing. Figure 3 shows a sample view of the output screen during testing. Several objects are placed in front of the camera in each trial with each model to find to what extent these models are able to detect the objects correctly and with what accuracy. This is required to enable a cross-comparison among the models, giving insight into their reliability and accuracy. The recorded outcomes are then used to determine the advantages and drawbacks of these models by exploring how the structure of their architecture relates to their on-device performance, in order to uncover any relationship between the two. This can prove to be beneficial knowledge for researchers seeking to design new lightweight models and balance trade-offs. The models chosen from the TensorFlow model hub [11] were those available in .tflite format with
Fig. 3 A view of output screen from Inceptionv1 during real-time testing showing the top-5 matches
int/uint operations, which include the Mobilenet models, the Efficientnet models, and the Inception model. Models using float operations, including the Mnasnet and Squeezenet models, were excluded from this study. Also, models offering less than 2 FPS were eliminated from the analysis; this included the Efficientnet4 model, which gave 1.2 FPS during the experiments.
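A minimal sketch of the per-frame recognition loop described above, assuming the tflite_runtime package, a uint8-quantized classifier, an Imagenet label file, and an OpenCV capture (the model path, label file, and camera source are placeholders):

```python
import cv2
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="mobilenet_v1.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
h, w = inp["shape"][1], inp["shape"][2]    # model's expected input size

with open("imagenet_labels.txt") as f:
    labels = [line.strip() for line in f]

cap = cv2.VideoCapture(0)  # or a GStreamer pipeline string for the Pi camera

while True:
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.resize(frame, (w, h))
    interpreter.set_tensor(inp["index"], np.expand_dims(img, 0).astype(np.uint8))
    interpreter.invoke()
    scores = interpreter.get_tensor(out["index"])[0]
    top5 = scores.argsort()[-5:][::-1]     # indices of the top-5 classes
    print([(labels[i], int(scores[i])) for i in top5])

cap.release()
```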
4 Results and Discussion

This section provides the results of the experiments, compares the models among themselves, and relates each model's output to its corresponding architecture. The memory usage plots of the architectures of the Mobilenet models v1 [12], v2 [13], and v3 [14] are provided in Fig. 4, those of the Efficientnet models 0–3 [15] in Fig. 5, and that of Inceptionv1 [16] in Fig. 6. The peak memory usage and the depth of the models obtained from the TFLite Model Analyzer are given in Table 1, along with their on-device speed in FPS and accuracy on the Imagenet dataset. The relationship between the models' depth, FPS, size, and peak memory can be clearly visualized from the plots given in Fig. 7. It is apparent that peak memory utilization is directly related to model size, and models with higher peak memory usage tend to be larger in size (Fig. 7a). Mobilenetv3, despite having the highest depth, has the lowest peak memory and the smallest size due to quantization, and hence gives the fastest speed (Fig. 7c, d). However, during the experiments, this model did not perform well in terms of accuracy and mostly detected the objects incorrectly (Fig. 8).
Fig. 4 Memory usage versus operators plot of Mobilenet models obtained from the TensorFlow Lite model analyzer
The Inceptionv1 model, despite its 6.5 MB size, has a peak memory use of only 1003 KB. In comparison to Efficientnet1 and Efficientnet2, the Inceptionv1 model has lower peak memory and exhibits a noticeable architectural difference, as it performs additional operations in the network's later stages without affecting peak memory usage. The speed it offers is also comparable with the aforementioned Efficientnet models.
Fig. 5 Memory usage versus operators plot of Efficientnet models obtained from the TensorFlow Lite model analyzer
Fig. 6 Memory usage versus operators plot of Inceptionv1 model obtained from the TensorFlow Lite model analyzer
Table 1 Summary of model comparisons

Model | Size (MB) | Peak memory (KB) | Accuracy (Imagenet) | On-device speed (FPS) | Layer depth
Mobilenetv1 [12] | 4.177 | 1204.224 | 70.89 | 8.1 | 31
Mobilenetv2 [13] | 3.494 | 1505.280 | 72.83 | 10.2 | 65
Mobilenetv3 [14] | 2.582 | 401.408 | 67.40 | 21.5 | 119
Efficientnet0 [15] | 5.308 | 1505.280 | 74.83 | 8.0 | 64
Efficientnet1 [15] | 6.265 | 1728.000 | 76.67 | 5.6 | 84
Efficientnet2 [15] | 6.998 | 2028.000 | 77.48 | 4.2 | 84
Efficientnet3 [15] | 9.373 | 3528.000 | 79.83 | 2.9 | 96
Inceptionv1 [16] | 6.531 | 1003.520 | 68.70 | 4.1 | 83
Fig. 7 Graphs showing the relationship between the model variables. a Peak memory usage versus size, b FPS versus depth of the models, c FPS versus peak memory usage, d FPS versus size
Figures 8 and 9 show some examples of model outputs during the real-time testing on the Jetson Nano board. The Mobilenetv3(small) model was the fastest in speed; however, the model failed to correctly identify the objects in most cases, and the accuracy in case of correct identifications was also too low. The Mobilenetv1 model and the Inceptionv1 model are the best candidates for correct identification. Even though its speed is less than that of the Mobilenetv3 model, Mobilenetv1, which is faster and smaller than Inceptionv1, may be the ideal choice.
Fig. 8 Sample snapshots-1 of model outputs during real-time testing on Jetson Nano
5 Conclusion and Future Scope

Deep learning models are renowned for being quite large and are frequently not considered a viable option when employing constrained devices. However, due to recent advances in the field, a number of frameworks now offer lightweight models pretrained on publicly available datasets. This has prompted many researchers to focus on employing deep learning models for computer vision applications on devices with constrained resources. In this work, the applicability of these models in practical scenarios was investigated using the lightweight models available in the TensorFlow Model library, and experiments were carried out to assess the performance of these models in terms of speed, model size, accuracy, peak memory usage, and model architecture depth. An NVIDIA Jetson Nano board was utilized for the experiments conducted. The
Fig. 9 Sample snapshots-2 of model outputs during real-time testing on Jetson Nano
Mobilenetv1 model was found to be an ideal choice due to the accuracy with which it performed in real time, followed by Inceptionv1. Models providing the fastest speeds, such as Mobilenetv3-small, did not function satisfactorily in terms of real-time precision; thus, future work must focus on preserving accuracy while making the models lighter. Furthermore, it was seen that increasing the peak memory usage of a model can result in an increase in model size. Keeping the peak memory low while utilizing the depth of the layers would not only help contain the model size but would also aid in achieving improved accuracy. Models built for constrained devices must focus on maintaining moderate peak memory usage while attempting to improve the architecture in the later layers. This can be an optimal trade-off for achieving architectures that are adaptable to constrained devices.
Acknowledgements The authors would like to thank the Department of Mechatronics, MIT, MAHE, Manipal, for providing the Jetson Nano Developer Board required for the experiments conducted. This work was supported by the MAHE Dr. T.M.A Pai Research Scholarship under Research Registration No: 200900143-2021.
References 1. Kamath V, Renuka A (2023) Deep learning based object detection for resource constrained devices: systematic review, future trends and challenges ahead. Neurocomputing 531:34–60. https://doi.org/10.1016/j.neucom.2023.02.006 2. Huu PN, Ngoc TP, Hai TLT (2022) Developing real-time recognition algorithms on Jetson Nano hardware. In: Intelligent systems and networks. LNNS, vol 471. Springer, Singapore. https://doi.org/10.1007/978-981-19-3394-3_6 3. Krizhevsky A, Sutskever I, Hinton G (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386 4. Murthy CB, Hashmi MF, Keskar AG (2021) Optimized MobileNet+SSD: a real-time pedestrian detection on a low-end edge device. Int J Multimed Info Retr 10:171–184. https://doi.org/10.1007/s13735-021-00212-7 5. Badhe NB, Bharadi VA (2021) Real time multiple object detection on low-constrained devices using lightweight deep neural network. In: Intelligent computing and networking. LNNS, vol 146. Springer, Singapore. https://doi.org/10.1007/978-981-15-7421-4_12 6. Kamath V, Renuka A (2021) Performance analysis of the pretrained EfficientDet for real-time object detection on Raspberry Pi. In: CCUBE-2021, IEEE, Bangalore, India, pp 1–6. https://doi.org/10.1109/CCUBE53681.2021.9702741 7. Shin DJ, Kim JJ (2022) A deep learning framework performance evaluation to use YOLO in Nvidia Jetson platform. Appl Sci 12(8):3734. https://doi.org/10.3390/app12083734 8. Wang C, Carlson B, Han Q (2023) Object recognition offloading in augmented reality assisted UAV-UGV systems. In: Proceedings of the ninth workshop on micro aerial vehicle networks, systems, and applications (DroNet ’23), pp 33–38. ACM, NY, USA. https://doi.org/10.1145/3597060.3597240 9. Richard B (2021) PyTorch to TensorFlow Lite for deploying on Arm Ethos-U55 and U65. Website: arm community. https://community.arm.com/arm-community-blogs/b/ai-and-ml-blog/posts/pytorch-to-tensorflow-lite-for-deploying-on-arm-ethos-u55-and-u65 10. Website: PyTorch, exporting a model from PyTorch to ONNX and running it using ONNX runtime. https://pytorch.org/tutorials/advanced/superresolution_with_onnxruntime.html 11. Website: TensorFlow model hub. https://tfhub.dev/s?dataset=imagenet-ilsvrc-2012-cls_deployment-format=lite_subtype=module,placeholder 12. Howard AG et al (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. Preprint at arxiv:1704.04861 13. Sandler M, Howard A et al (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF conference on CVPR, Salt Lake City, UT, USA, pp 4510–4520. https://doi.org/10.1109/CVPR.2018.00474 14. Howard A (2019) Searching for MobileNetV3. In: IEEE/CVF ICCV, Seoul Korea (South), pp 1314–1324. https://doi.org/10.1109/ICCV.2019.00140 15. Tan M, Le QV (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: ICML, pp 10691–10700. Preprint at arxiv:1905.11946 16. Szegedy C et al (2015) Going deeper with convolutions. In: 2015 IEEE conference on CVPR, Boston, MA, USA, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
An Efficient Fog Computing Platform Through Genetic Algorithm-Based Scheduling Shivam Chauhan, Chinmaya Kumar Swain, and Lalatendu Behera
Abstract The Internet of Things (IoT) has led to the adoption of fog computing for data-intensive applications that require low-latency processing. Fog computing uses distributed nodes near the network edge to reduce delays and save bandwidth. To optimize resource usage and achieve goals in fog computing, task scheduling plays a crucial role. To tackle this challenge, we propose an approach based on genetic algorithms, inspired by genetics and natural selection. We evaluate the genetic algorithm on its ability to complete tasks and prioritize them effectively. The results show that the genetic algorithm performs exceptionally well, completing more tasks and handling priorities efficiently. Its adaptability to changing situations enables optimal resource usage and task completion. Our findings underscore the significance of genetic algorithms in fog computing systems, and the insights gained from this study contribute to the advancement of such systems, benefiting IoT applications and services. Keywords FCFS · NP-EDF · RTA · TGR · PGR · IoT · Fog · Genetic algorithm · Task scheduling
1 Introduction

The expansion of communication and hardware technology has made IoT devices vital to many real-time, latency-sensitive applications. To host an IoT application, the cloud can be used as an infrastructure. However, it can be difficult for cloud applications to support a large number of geographically dispersed IoT devices.

S. Chauhan (B) · L. Behera, Department of Computer Science and Engineering, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India. e-mail: [email protected]
C. K. Swain, Department of Computer Science and Engineering, SRM University, Amaravati, Andhra Pradesh, India. e-mail: [email protected]
Due to this, the network becomes congested and experiences high latency. To overcome these issues, fog computing was introduced [1]. Fog computing is a model for hierarchically distributed computing which acts as a bridge between the cloud and IoT devices, as shown in Fig. 1. The fog computing ecosystem provides a platform and architecture for a variety of software services. It improves QoS by decreasing service delivery delays and freeing up network resources. Networking devices like routers, gateway routers, hubs, and proxy servers have computing capabilities and are commonly used as fog nodes [1]. Fog nodes typically have a diverse range of resource capacities and environments for running applications. Due to their fundamental physical structure, most fog nodes, unlike cloud centers, are severely resource-constrained and can be distributed around the edge. To align large applications with such fog computing environments, they are represented as a set of lightweight, independent application modules. An application module carries out an operation on its input to produce an appropriate output; the result is then provided to another module as input, based on the dependencies. Each module needs a specific amount of resources, such as computational power and memory, to perform its operations on the input in a predetermined amount of time.
Fig. 1 Hierarchy of fog computing
2 System Model

In fog computing, there is a hierarchy of nodes where lower-level nodes are close to the IoT devices and higher-level nodes are farther away, communicating with each other through designated protocols. Lower-level nodes have fewer resources but process data faster, as they are closer to the IoT devices; higher-level nodes have more resources but incur longer delays [2]. Nodes also form clusters so that there is no delay in processing the data. The fog infrastructure provides a special networking technique to ensure proper communication between the nodes without any delays or interruptions (Fig. 2).

Fog Node Architecture: For managing applications running on fog nodes, we consider fog nodes to consist of three main components: the Controller Component, the Computational Component, and the Communication Component [1] (a small illustrative sketch follows this list).
• Computational Component: This is where the fog node does its actual work. It is further divided into Micro Computing Instances (MCIs), individual parts of the fog node each of which can run an application module.
• Communication Component: If there are no active MCIs, the Computational Component is turned off and the node only forwards data packets through its Communication Component.
• Controller Component: This component is responsible for managing the operations of the Computational and Communication Components. If the load on the fog node's applications grows, the Computational Component can be restarted to manage the additional work.
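To make the three components just described concrete, a small illustrative sketch follows; the class names, the load threshold, and the scaling rule are our own assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MicroComputingInstance:
    module: str            # application module hosted by this MCI
    cpu_share: float       # fraction of the node's compute it consumes

@dataclass
class FogNode:
    # Computational Component: the set of MCIs currently running
    mcis: List[MicroComputingInstance] = field(default_factory=list)

    def forward(self, packet):
        # Communication Component: with no active MCIs the node only forwards packets
        return packet

    def control(self, load, threshold=0.8):
        # Controller Component: (re)start the Computational Component under load
        if load > threshold:
            self.mcis.append(MicroComputingInstance("extra-module", 0.1))
```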
2.1 System Environment

The system environment is the fog computing system that consists of multiple fog nodes and IoT devices. The fog nodes are organized into clusters based on their proximity to the IoT devices, and each cluster has a Master Node that coordinates
Fig. 2 Organization of fog
the processing of data among the fog nodes in the cluster [3]. The fog nodes provide resources, such as computing, storage, and networking capabilities, to execute Application Modules, which are programs that process the data from the IoT devices. The fog nodes communicate with each other and with the IoT devices using communication protocols. The system environment also includes a Controller Component, which is responsible for monitoring and managing the operations of the Computational and Communication Components of the fog nodes. The Controller Component ensures that the fog nodes operate efficiently and that the data is processed on time [1].
2.2 Application Environment In the fog computing system, the application environment involves data processing from IoT devices. Fog computing, which extends the capabilities of cloud computing by placing computing resources closer to the data source, is utilized [4, 5]. In this application environment, data signals generated by the IoT devices are transmitted to fog nodes for processing [6]. However, IoT devices cannot carry out the necessary processing themselves due to resource and energy limitations. As a result, fog computing serves as an intermediary layer between the Internet of Things-connected devices and the cloud. The fog nodes are organized into clusters based on their proximity to the IoT devices, and each cluster is overseen by a Master Node responsible for coordinating data processing among the fog nodes within the cluster [2]. The fog nodes offer computing, storage, and networking capabilities to execute Application Modules, which consist of programs responsible for processing the data.
2.3 Optimization Goal

The main focus is to minimize the latency (delay) in processing data from IoT devices in a fog computing environment. This paper proposes a scheduling strategy for managing Application Modules on fog nodes in a way that minimizes the time it takes for the data to be processed; this is achieved by assigning the data to the fog nodes that can process it with the lowest latency. A mathematical model is proposed to optimize the allocation of data to fog nodes, taking into account the processing time, communication time, and latency requirements of the Application Modules. Cutting down the time it takes to process data from IoT devices is important for real-time applications like self-driving cars and industrial control systems, and it enhances the performance and efficiency of fog computing environments [5].
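The concrete model is not reproduced in this section. As a hedged illustration of the kind of objective such a model involves (the notation below is ours, not the paper's), the latency of assigning data item i to fog node j can be written as a communication term plus a processing term, and the assignment chosen to minimize the total:

\[
\min_{x_{ij}} \; \sum_{i}\sum_{j} x_{ij}\left(\frac{s_i}{b_j} + \frac{c_i}{f_j}\right)
\quad \text{subject to} \quad \sum_{j} x_{ij} = 1 \;\; \forall i, \qquad x_{ij} \in \{0, 1\},
\]

where \(s_i\) is the data size, \(b_j\) the bandwidth to node j, \(c_i\) the required computation, \(f_j\) the node's processing speed, and \(x_{ij}\) indicates that item i is assigned to node j; a per-module deadline constraint can be added on top of this.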
An Efficient Fog Computing Platform Through Genetic …
299
3 Methodology

When processing tasks need to be distributed among multiple fog nodes, as they are in fog computing, effective scheduling strategies are crucial to ensuring optimal system performance. FCFS, NP-EDF, and Priority-based Scheduling are the three methods we selected for comparison with the Genetic Algorithm-based scheduler; these algorithms are evaluated on their ability to complete tasks and prioritize them effectively.
3.1 Genetic Algorithm-Based Scheduler

Initiate Population: The "initiate population" step creates the genetic algorithm's first population. It generates a set of individuals, also called chromosomes or solutions, that represent candidate solutions to the problem at hand. Each individual is typically encoded as a sequence of genes or parameters.

Evaluate: The evaluate function measures the fitness of each solution in the population. The fitness value shows how well a solution solves the problem or how close it is to the best possible answer. During evaluation, each solution is scored against predefined criteria or objectives (Fig. 3).

Selection: Selection in genetic algorithms means choosing chromosomes from the present population to be the parents of the next generation. Chromosomes with higher fitness values are more likely to be chosen, since they are expected to pass on good traits to their offspring.

Crossover: The crossover operator joins the genes of two parent solutions to make child solutions for the next generation, mimicking the recombination of genes in natural reproduction. Crossover typically involves the exchange of genetic material (genes or segments) between two parent solutions, so that the child solutions carry a mix of their parents' traits.

Mutation: Mutation is another genetic operator; it explores new regions of the search space by introducing random, minor changes to chromosomes. Through mutation, chromosome segments can undergo random modification. The mutation rate determines the probability that a particular gene or chromosome will be mutated.
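A compact, self-contained sketch of these operators applied to task-to-processor assignment follows; the chromosome encoding (one processor index per task), the load-based fitness function, and all numbers here are illustrative assumptions, not the exact implementation of Algorithm 1.

```python
import random

NUM_TASKS, NUM_PROCS = 20, 3
POP_SIZE, GENERATIONS = 50, 100
CROSSOVER_RATE, MUTATION_RATE, ELITISM = 0.8, 0.1, 0.2

# toy taskset: (execution_time, priority)
tasks = [(random.randint(1, 10), random.randint(1, 20)) for _ in range(NUM_TASKS)]

def init_population():
    # each gene is the processor assigned to the corresponding task
    return [[random.randrange(NUM_PROCS) for _ in range(NUM_TASKS)]
            for _ in range(POP_SIZE)]

def fitness(chrom):
    # lower makespan (maximum processor load) yields higher fitness
    loads = [0] * NUM_PROCS
    for gene, (exe, _prio) in zip(chrom, tasks):
        loads[gene] += exe
    return 1.0 / (1 + max(loads))

def select(pop):
    # tournament selection: the fitter of two random chromosomes
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    if random.random() > CROSSOVER_RATE:
        return p1[:], p2[:]
    cut = random.randrange(1, NUM_TASKS)   # single-point crossover
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chrom):
    for i in range(NUM_TASKS):
        if random.random() < MUTATION_RATE:
            chrom[i] = random.randrange(NUM_PROCS)
    return chrom

pop = init_population()
for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    nxt = pop[:int(ELITISM * POP_SIZE)]    # carry over the elite chromosomes
    while len(nxt) < POP_SIZE:
        c1, c2 = crossover(select(pop), select(pop))
        nxt += [mutate(c1), mutate(c2)]
    pop = nxt[:POP_SIZE]

best = max(pop, key=fitness)
print("best assignment:", best, "fitness:", fitness(best))
```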
Fig. 3 Flowchart of genetic algorithm
Algorithm 1 Genetic Algorithm for Task Scheduling
Input: taskset, num processors, scheduling algorithm, population size = 100, num generations = 100, elitism ratio = 0.2, crossover rate = 0.8, mutation rate = 0.1
Output: Scheduled tasks and blocked tasks
1: population = Initialize(taskset, num processors, population size, "genetic")
2: Set best fitness to negative infinity
3: Set best chromosome to None
4: Set blocked tasks to None
5: for generation in range(num generations) do
6:   elite = chromosomes(population, taskset, num processors, elitism ratio)
7:   Create next generation and copy elite chromosomes into it
8:   while length of next generation < population size: select parents, apply crossover and mutation, and add the children to the next generation; then evaluate the new population and update the best fitness and best chromosome

The tasksets were generated such that each task's deadline satisfies di > ai + ei, and we used a priority range from 1 to 20 as well. This process was performed in Algorithm 1. To evaluate the performance of the different scheduling algorithms, we measured task completion when jobs were scheduled to run concurrently on different virtual processors. We found that the Genetic algorithm outperformed the other methods in terms of Task Guarantee Ratio (TGR) [7, 8] and Priority Guarantee Ratio (PGR) [7, 8]. The TGR and PGR achieved by the Genetic algorithm were around 97% and 98%, respectively, higher than those of the other approaches.
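For reference, the two metrics can be computed as follows; this is a sketch assuming each completed task records its finish time, deadline, and priority, and taking PGR to weight tasks by their priority.

```python
def tgr(tasks):
    # Task Guarantee Ratio: fraction of tasks finished before their deadline
    met = sum(1 for t in tasks if t["finish"] <= t["deadline"])
    return met / len(tasks)

def pgr(tasks):
    # Priority Guarantee Ratio: priority-weighted share of deadline-met tasks
    met = sum(t["priority"] for t in tasks if t["finish"] <= t["deadline"])
    return met / sum(t["priority"] for t in tasks)
```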
4.2.1 Task Guarantee Ratio
Figure 4 presents the Task Guarantee Ratio results of the scheduling algorithms for a set of 200 tasks and 3 processors. In our study, we examined the TGR values, as shown in Fig. 4, to determine the effectiveness of four task scheduling algorithms: FCFS, Priority, NP-EDF, and Genetic. The TGR indicates how many tasks were completed before their deadline.
Fig. 4 Task guarantee ratio
The FCFS algorithm follows a simple approach of executing tasks in the order they arrive. However, this approach does not consider deadlines or priorities, leading to a relatively lower TGR. As the number of tasks increases, the TGR improves slightly, but it remains relatively low compared to other algorithms. The Priority algorithm assigns tasks based on their priority, giving higher priority tasks precedence. This approach significantly improves the TGR compared to FCFS by 39.28%, as critical tasks are prioritized for timely completion. As the number of tasks increases, a higher TGR is achieved, and the algorithm’s ability to meet deadlines remains relatively consistent. NP-EDF assigns tasks based on their deadlines, ensuring that tasks with earlier deadlines are executed first. This optimization based on deadlines improves the TGR significantly compared to both FCFS and Priority. NP-EDF showed a slight advantage over Priority by approximately 0.98%. As the number of tasks increases, NP-EDF continues to perform well, meeting deadlines efficiently and leading to a higher TGR. The Genetic Algorithm employs an evolutionary optimization approach, allowing it to adapt and optimize task schedules over generations. This flexibility and adaptability make it highly effective in meeting deadlines and minimizing task completion times. As the number of tasks increases, the Genetic Algorithm consistently outperforms other algorithms, indicating its robustness and ability to handle varying workloads effectively. The Genetic Algorithm outperformed all other algorithms by a significant margin, with an average improvement of about 42.98% over FCFS, Priority, and NP-EDF algorithms.
Fig. 5 Priority guarantee ratio
4.2.2 Priority Guarantee Ratio
The PGR metric measures the algorithms' effectiveness in meeting deadlines, as shown in Fig. 5. The analysis of the PGR graph (Fig. 5) covers the four task scheduling algorithms: FCFS, Priority, NP-EDF, and the Genetic Algorithm. PGR calculates the percentage of all priorities that were completed before their due dates. This metric reveals how well each algorithm performs in terms of meeting deadlines [7]. Among the four scheduling algorithms evaluated in the study, the Genetic Algorithm emerged as the top performer. It achieved an impressive ratio of 0.974, indicating that approximately 97.4% of the jobs were completed before their due dates. This outstanding performance can be attributed to the Genetic Algorithm's adaptive and evolutionary nature. The other three algorithms, namely FCFS, Priority, and NP-EDF, also demonstrated respectable performance in meeting task deadlines but fell short compared to the Genetic Algorithm, achieving values of 0.377, 0.8985, and 0.9, respectively. The Priority algorithm prioritizes high-priority tasks for timely completion, and the NP-EDF algorithm optimizes resource use by focusing on task deadlines. These characteristics allowed them to outperform the simple FCFS algorithm, which schedules tasks based on their arrival time. However, their performance was still not as robust as the Genetic Algorithm's.
4.2.3 Total Time Spent
In our research, the total time spent is measured as the total amount of time between when a task is submitted and when it is completed. The results showed significant differences between the algorithms. The FCFS algorithm, which schedules
Fig. 6 Total time spent
tasks in the order of their arrival, had a total time spent of 15,093 units. The Priority algorithm, which assigns tasks based on priority, took 15,230 units, slightly longer than FCFS; while it outperformed FCFS on the guarantee ratios, it did not perform as well on this metric. The NP-EDF algorithm, which distributes tasks by earliest deadline, took 15,291 units; while it optimizes resource use based on deadlines, it may not always be the most suitable approach for minimizing total time spent. The Genetic Algorithm, with a total time spent of 11,639 units, proved to be the most efficient scheduler, as shown in Fig. 6. Inspired by natural selection, its adaptive and flexible nature allowed for iterative optimization of task schedules in dynamic and unpredictable fog computing environments. As a result, the Genetic Algorithm efficiently assigned jobs to processors, leading to improved efficiency and completion times.
5 Conclusion and Future Work

This study considered four scheduling algorithms: FCFS, NP-EDF, Priority-based scheduling, and the Genetic Algorithm. We used the Task Guarantee Ratio and the Priority Guarantee Ratio to assess how well these algorithms worked. On both of these measures, the results showed that the Genetic Algorithm did better than the other methods: it completed more tasks and met priority requirements better than FCFS, NP-EDF, and Priority-based scheduling. One of the main strengths of the Genetic Algorithm is that it can adapt and improve scheduling results as the system's needs and limits change. Using an evolutionary process, the algorithm can find near-optimal solutions, which makes it very effective in difficult scheduling situations. While this study indicated that the
Genetic algorithm achieved some promising results, there is still much to learn and improve in the field of scheduling algorithms. Here are some ideas for further research: (a) Performance optimization: While the Genetic algorithm did better at balancing jobs and priorities, researchers can improve its overall performance by fine-tuning its parameters and experimenting with different methods. Investigating alternative genetic operators, fitness functions, and selection techniques may help to enhance scheduling algorithms. (b) Hybrid approaches: It might be interesting to look into hybrid scheduling approaches that combine the best characteristics of multiple algorithms. Adding machine learning approaches to the Genetic algorithm and integrating it with other well-known scheduling algorithms could produce improved results. (c) Real-world case studies and implementation: These would aid in confirming the study's conclusions. Implementing the Genetic algorithm or other improved scheduling algorithms in real-world systems and evaluating how well they function under different conditions can yield practical insights and demonstrate how well they perform.
References 1. Intharawijitr K, Iida K, Koga H (2016) Analysis of fog model considering computing and communication latency in 5G cellular networks. 2016 IEEE international conference on pervasive computing and communication workshops (PerCom workshops). IEEE 2. Takouna I et al (2013) Communication-aware and energy-efficient scheduling for parallel applications in virtualized data centers. 2013 IEEE/ACM 6th international conference on utility and cloud computing. IEEE 3. Selvarani S, Sudha Sadhasivam G (2010) Improved cost-based algorithm for task scheduling in cloud computing. 2010 IEEE international conference on computational intelligence and computing research. IEEE 4. Swain CK, Sahu A (2022) Reliability-ensured efficient scheduling with replication in cloud environment. IEEE Syst J 16(2):2729–2740 5. Kang Y, Zheng Z, Lyu MR (2012) A latency-aware co-deployment mechanism for cloud-based services. 2012 IEEE fifth international conference on cloud computing. IEEE 6. Cheng L et al (2022) Cost-aware real-time job scheduling for hybrid cloud using deep reinforcement learning. Neural Comput Appl 34(21):18579–18593 7. Swain CK, Sahu A (2022) Interference aware workload scheduling for latency sensitive tasks in cloud environment. Computing 104(4):925–950 8. Swain CK, Sahu A (2018) Interference aware scheduling of real time tasks in cloud environment. 2018 IEEE 20th international conference on high performance computing and communications, Exeter, UK, pp 974–979 9. Schwiegelshohn U, Yahyapour R (1998) Analysis of first-come-first-serve parallel job scheduling. SODA 98 10. Panda SK, Nanda SS, Bhoi SK (2022) A pair-based task scheduling algorithm for cloud computing environment. J King Saud Univ Comput Inf Sci 34(1):1434–1445 11. Andersson B, Baruah S, Jonsson J (2001) Static-priority scheduling on multiprocessors. Proceedings 22nd IEEE real-time systems symposium (RTSS 2001) (Cat. No. 01PR1420). IEEE
12. Lehoczky JP (1990) Fixed priority scheduling of periodic task sets with arbitrary deadlines. [1990] Proceedings 11th real-time systems symposium. IEEE 13. Lee J, Shin KG (2012) Preempt a job or not in EDF scheduling of uniprocessor systems. IEEE Trans Comput 63(5):1197–1206 14. Guevara JC, da Fonseca NLS (2021) Task scheduling in cloud-fog computing systems. Peer-to-peer networking and applications 14(2):962–977 15. Soltani N, Soleimani B, Barekatain B (2017) Heuristic algorithms for task scheduling in cloud computing: a survey. Int J Comput Netw Inf Security 11(8):16 16. Gupta I, Kumar MS, Jana PK (2016) Transfer time-aware workflow scheduling for multi-cloud environment. 2016 international conference on computing, communication and automation (ICCCA). IEEE 17. Aazam M, Huh E-N (2015) Dynamic resource provisioning through fog micro datacenter. 2015 IEEE international conference on pervasive computing and communication workshops (PerCom workshops). IEEE 18. Fan Y et al (2018) Energy-efficient and latency-aware data placement for geo-distributed cloud data centers. Communications and Networking: 11th EAI international conference, ChinaCom 2016, Chongqing, China, September 24–26, 2016, Proceedings, Part II 11. Springer International Publishing
Development of a Pixhawk-Based Quadcopter: A Bottom-Up Approach Anuranjan Mishra, Varun Chitransh, Jitendra Kumar, and Navneet Tiwari
Abstract In this paper, the step-by-step design procedure for the development of a Pixhawk-based quadcopter drone is presented. The purpose of this article is to provide detailed information about building a quadcopter drone in a systematic manner. This article covers the working principle of quadcopter drones along with the operating procedure of the Pixhawk-Cube flight controller. The essential steps involved in drone development, such as firmware updating, radio communication, motor testing, and calibration, are discussed in detail. To confirm the effectiveness of these development steps, a Pixhawk-based quadcopter drone is developed and its performance is tested by plotting a current-to-throttle graph. All the steps and procedures presented here are drawn from the authors' own experience of drone development. Keywords Unmanned aerial vehicle (UAV) · Pixhawk flight controller · Bottom-up approach development
1 Introduction An Unmanned Aerial Vehicle (UAV) is a type of aerial vehicle without any onboard pilot. In recent times, many different categories of UAVs have been developed such as Miniature Air Vehicles (MAV), Nano Air Vehicles (NAV), Pico Air Vehicles (PAV), and Smart Dust (SD), each with their own applications and design challenges [1]. These days, we can see a lot of deliveries being done using drones [2]. Due to the availability of open-source software such as RflySim, building a drone from the ground up has never been this easy [3]. Furthermore, works in the field of UAV path optimization using algorithms like gray wolf optimization (GWO), Archimedes optimization algorithm (AOA), particle swarm optimization (PSO), or their combined algorithms A. Mishra · J. Kumar · N. Tiwari Center for Advance Studies, Lucknow, India V. Chitransh (B) IIT(BHU), Varanasi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_26
which enable the UAVs to fulfill their assigned civilian or military missions have also been developed in the recent past [4–6]. The Pixhawk "Cube Black" (previously known as the Pixhawk 2.1, part of the Cube family of Pixhawks) was chosen for this build as it is an open-source, platform-based flight controller. It has a community of thousands of enthusiasts who are constantly helping users as well as developing new support systems for it. The Pixhawk "Cube Black" flight controller has several advantages: its built-in sensors suffer minimal interference from motor vibrations, and its three redundant IMUs and two barometers improve its accuracy; additionally, it has a heated element that allows it to work in cold climates [7]. One can try to develop one's own Pixhawk-based quadcopter with help from sources like YouTube and the Mission Planner site [8, 9], but the amount of information on these platforms can overwhelm a newcomer and cause them to lose interest in developing UAVs. Some work has also been done on UAV development [10, 11], but these works go into considerable depth and do not cater to a layperson. Therefore, with this paper, we would like to share our UAV development experience so that entering this field becomes easier for newcomers. Statistics indicate that drone manufacturing potential may reach US $4.2 billion by 2025, which will ultimately support India's goal of achieving a US $5 trillion economy. The Drone Shakti initiative, rolled out during Budget 2022, aimed to promote drones through start-ups; this, along with the Government's Production-Linked Incentive (PLI) scheme for drones, is expected to spur drone development in the domestic market [12]. Therefore, this paper aims to provide basic information to any person who wishes to enter the field of drone development. It presents a detailed step-by-step procedure to build a quadcopter using the Pixhawk Cube 'Black' flight controller. Section 2 describes the basic working principle of a quadcopter, followed in Sect. 3 by a list of the basic components selected to build a quadcopter along with a block diagram showing the connections of the different components. Section 4 describes the setup of the quadcopter in a detailed step-by-step manner. Section 5 presents results showing the pattern of current drawn by individual motors as well as the total current drawn by the quadcopter while performing basic maneuvers. Finally, a conclusion is drawn in Sect. 6.
2 Working Principle of a Quadcopter The term quadcopter means that it uses four propellers to fly. The quadcopter works on Newton’s Third Law of Motion. The propellers, when they spin, push the air downward which in turn creates an upward thrust that pushes the quadcopter in an upward direction. When this upward thrust exceeds the force of gravity, the quadcopter lifts and begins to move up. Figure 1 shows the application of Newton’s Third Law of
Fig. 1 Newton’s Third Law explaining the upward motion
Motion on the quadcopter. The lift produced by the propellers can be considered as an action while the upward movement of the quadcopter is the reaction. The quadcopter can show 3 types of movements.
2.1 Vertical Movement This is the upward and downward movement shown by the quadcopter. For upward movement, the total lift force F must be greater than the weight W. The vertical position of the drone is called its altitude.
2.2 Lateral Movement It is the movement of the drone in the horizontal plane, at an altitude. For this to happen, the two propellers on any one side will provide less lift than the other two
propellers. The drone moves toward the side whose pair of propellers produces less lift. In this way, a drone can move forward, backward, left, and right.
2.3 Rotational Movement This is the movement produced when the net torque acting on the drone becomes nonzero, i.e., when the torques of the clockwise and anti-clockwise propellers are unequal in magnitude. Propellers 1 and 3 rotate in the clockwise direction, while Propellers 2 and 4 rotate in the anti-clockwise direction, so that in steady flight the net torque due to all the propellers is zero. Clockwise rotation of the drone happens when Propellers 1 and 3 produce more torque than Propellers 2 and 4, and vice versa. This movement is also known as the yaw movement of the quadcopter. Figure 2 shows the direction of rotation of Motors 1, 2, 3, and 4: Motors 1 and 3 rotate in the clockwise direction, while Motors 2 and 4 rotate in the anti-clockwise direction. Hence, we can say that a quadcopter has six degrees of freedom, meaning that six variables are needed to express its position and orientation in space (x, y, z, ϕ, θ, and ψ). The governing equations for the operation of a quadcopter drone are given as

(L1 + L2 + L3 + L4) > W, Upward Movement (1)
(L1 + L2 + L3 + L4) < W, Downward Movement (2)
(L1 + L4) > (L2 + L3), Rightward Movement (3)
(L1 + L4) < (L2 + L3), Leftward Movement (4)
(L1 + L2) > (L4 + L3), Backward Movement (5)
(L1 + L2) < (L4 + L3), Forward Movement (6)
(T1 + T3) > (T2 + T4), Clockwise Rotation (7)
(T1 + T3) < (T2 + T4), Anti-clockwise Rotation (8)

Fig. 2 Propellers spin in different directions to cancel out the torque

where L1, L2, L3, and L4 are the lifts produced by Propellers 1, 2, 3, and 4, respectively; W is the weight of the drone; and T1, T2, T3, and T4 are the torques produced by Propellers 1, 2, 3, and 4, respectively.
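A small sketch can make these inequalities concrete. The code below evaluates Eqs. (1)–(8) for given per-propeller lifts and torques; the numeric values are made up for illustration.

```python
# Illustrative evaluation of governing Eqs. (1)-(8). L and T hold the
# per-propeller lifts and torques [L1..L4], [T1..T4]; W is the drone weight.
def vertical(L, W):
    total = sum(L)                            # Eqs. (1)/(2)
    return "up" if total > W else "down" if total < W else "hover"

def lateral(L):
    right = (L[0] + L[3]) - (L[1] + L[2])     # Eqs. (3)/(4)
    back = (L[0] + L[1]) - (L[2] + L[3])      # Eqs. (5)/(6)
    return right, back   # positive = rightward / backward tendency

def yaw(T):
    cw = (T[0] + T[2]) - (T[1] + T[3])        # Eqs. (7)/(8)
    return "clockwise" if cw > 0 else "anti-clockwise" if cw < 0 else "none"

L = [2.6, 2.4, 2.4, 2.6]    # newtons per propeller (made-up values)
print(vertical(L, W=9.8))   # 'up': total lift 10.0 N exceeds 9.8 N
print(lateral(L))           # (0.4, 0.0): net rightward tendency, Eq. (3)
print(yaw([0.1, 0.1, 0.1, 0.1]))  # 'none': the torques cancel out
```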
3 List of Components and Their Connections To build a drone, one would need certain basic components. Figure 3 shows a block diagram of the interconnection of different components for drone development. The list of these components used in drone development is given as follows:
3.1 Frame A frame is a structure that holds every component of the drone in its respective place. It provides the drone with mechanical strength and protects delicate electronic components; in other words, it serves as the housing for the drone's different components. Modern frames are generally made of carbon fiber. Here, we have used a Hexsoon edu-450 frame.

Fig. 3 Block diagram showing connections of components for drone development
3.2 Motors This is the component which provides flight and motion capabilities to the quadcopter. A quadcopter generally uses brushless DC motors: features like a high power-to-weight ratio, low maintenance, better control over torque and speed, and high attainable speed make them an excellent choice for quadcopters. Here, we have used four T-Motor 2216 920KV brushless DC motors. The alphanumeric codes written on a motor have specific meanings. Four-digit number. This gives the dimensions of the stator: the first two digits show the stator's width and the last two digits its height. KV. This gives the speed of the motor (in rpm) for every 1 Volt applied to its terminals.
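As an aside, the motor markings described above can be decoded programmatically. The sketch below assumes the common "stator code followed by KV rating" format printed on hobby motors; the parsing format is an assumption for illustration.

```python
# Hedged sketch: decoding a motor marking such as "2216 920KV".
# The string format is an assumption based on the convention described above.
def decode_motor(code):
    size, kv = code.split()
    stator_width_mm = int(size[:2])    # first two digits: stator width
    stator_height_mm = int(size[2:])   # last two digits: stator height
    kv_rating = int(kv.upper().replace("KV", ""))
    return stator_width_mm, stator_height_mm, kv_rating

width, height, kv = decode_motor("2216 920KV")
print(width, height, kv)   # 22 16 920
print(kv * 11.1)           # ~10212 rpm no-load speed on a 3S 11.1 V battery
```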
3.3 Electronic Speed Controller (ESC) ESC is an acronym for Electronic Speed Controller. It is an electronic circuit used to control the speed of the motors. Here, we have used four Hobbywing XRotor 20 A brushless ESCs to control the speed of all four motors of the quadcopter.
3.4 Battery It is a charge-storing device which stores chemical energy and converts it into electrical energy. Due to their high charge density, Li-Po batteries are used in quadcopters. Here, we have used a Lemon 3S 11.1 V 5000 mAh battery.
3.5 Power Distribution Board (PDB) As its name suggests, the PDB is a PCB used to deliver power to the different components in a controlled manner. It includes current sensors and may or may not have built-in voltage regulators; if it does not, a BEC is used. Here, we have used a Hexsoon 40 A Power Distribution Board.
3.6 Flight Controller (FC) It is like the brain of the quadcopter. It controls the different components of the quadcopter based either on the inputs it receives from the receiver or on the path planned during automated flight. Here, we have used the Pixhawk 'Cube Black' Flight Controller.
3.7 Transmitter and Receiver Set This is a pair consisting of a remote control (transmitter) and a receiver, which delivers the commands given by the person controlling the quadcopter to the flight controller, so that the quadcopter flies according to the will of the pilot. Here, we have used a FlySky FS-i6X transmitter paired with an FS-iA10B receiver module.
3.8 Ground Control Station (GCS) It is software used either to set up the drone, to set a flying path, or to monitor parameters (such as altitude, pressure, and speed) during the flight of the quadcopter. Here, we have used Mission Planner.
3.9 Battery Eliminator Circuit (BEC) It is an electronic voltage regulator used to power the sub-circuits, which in this case means everything other than the battery. It is used in quadcopters whose PDB does not come with built-in voltage regulators.
4 Setup of a Quadcopter The setup of the quadcopter is a very crucial part of building a drone. The setup of the drone has been explained in the following steps (Fig. 4).
Fig. 4 Drone development steps: a Step 1, b Step 2, c Step 3, d Step 4, e Step 5, f Step 6, g Step 7, h Step 8, i Step 9, and j Step 10
4.1 Step 1 In this step, we first assemble all the components and mount them on the drone frame. Then we make the connections as per the block diagram in Fig. 3. Once the connections are done, use a multimeter to check for any short circuits in the electronic components.
4.2 Step 2 In this step, we use a ground station (in this case Mission Planner) to connect with the Flight Controller. Connect the Flight Controller with Mission Planner using a suitable USB cable. Open Mission Planner and select the port and baud rate (115200) in the drop-down menu at the top right corner of the software’s interface. After selecting these, click on connect.
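For readers who prefer a scripted sanity check instead of the Mission Planner GUI, the hedged sketch below opens the same serial link at 115200 baud with the pymavlink library and waits for the flight controller's heartbeat. This is not part of the procedure above, and the port name is an assumption that depends on your operating system.

```python
# Optional scripted alternative to the GUI connection step. The serial port
# name is an assumption: typically "COM3" on Windows, "/dev/ttyACM0" on Linux.
from pymavlink import mavutil

master = mavutil.mavlink_connection("/dev/ttyACM0", baud=115200)
master.wait_heartbeat()   # blocks until the Pixhawk announces itself
print("Heartbeat from system %u, component %u"
      % (master.target_system, master.target_component))
```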
4.3 Step 3 Once connected to the computer, we need to upload the firmware to the Flight Controller. For this, select the Setup tab at the top left corner, then select Install Firmware and choose the option which represents the quadcopter. Once selected, the green bar at the bottom will fill up and a confirmation pop-up window will appear showing the successful upload of the firmware. Press OK and move to the next step.
4.4 Step 4 In this step, select the Mandatory Hardware option and then select the Frame Type of the drone. Here, select the frame class and frame type best suited to your drone's frame.
4.5 Step 5 Once Frame Type is set, now comes the calibration of various sensors such as inbuilt sensors of the Flight Controller and those connected to it. First, we calibrate the Accelerometer of the Flight Controller. For this, select the option Accel Calibration in the same Mandatory Hardware tab. In here, click the first option ‘Calibrate Accel’.
Then change the orientation of the frame as per the given instructions. Once done, a pop-up window of confirmation will show up. Press OK and move to the next step.
4.6 Step 6 In this step, we calibrate the compass (GPS system) connected to the Flight Controller. For this, select the compass tab below the accel calibration tab. In this, click ‘START’ button and rotate the drone frame in every direction until all the 3 bars fill up. Once done, a pop-up window of confirmation will show up. Press OK and move to the next step.
4.7 Step 7 In this step, we calibrate the radio system (i.e., Transmitter and Receiver). Select the tab Radio Calibration. In there, click the button which reads ‘Calibrate Radio’. In this, move the sticks to their extreme positions. Once done, a pop-up window of confirmation will show up. Press OK and move to the next step.
4.8 Step 8 In this step, we calibrate the ESCs. Select the ESC Calibration tab. Here, select the type of ESC protocol best suited to your ESCs, or else leave it set to Normal. Once this is done, click on the button 'Calibrate ESCs'. The ESCs will start beeping, which shows that they are in calibration mode. Now follow the steps shown beside the button. Once done, a pop-up window of confirmation will show up. Press OK and move to the next step.
4.9 Step 9 In this, we select the Flight Modes from the Flight Mode tab. Select ‘Stabilize’ in all the drop-down menus.
4.10 Step 10 To complete the setup, we finally set the 'Fail Safe' for the drone from the 'Fail Safe' tab. Here, we can set three types of 'Fail Safe'. Battery. In this, we select the voltage and reserved mAh values (i.e., the values below which the drone will not ARM). Radio. In this, we select the action the Flight Controller performs if the connection to the transmitter is cut. Select the 'Enable always RTL' option from the drop-down menu. Ground Control Station (GCS). In this, we can enable or disable the fail-safe triggered if the connection to the Ground Station is lost.
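The same failsafes can also be written from a script. The hedged sketch below uses pymavlink to set the standard ArduPilot Copter failsafe parameters; the numeric values shown are illustrative assumptions and should be chosen for your own battery and mission.

```python
# Hedged sketch: setting the failsafes described above via pymavlink rather
# than the Mission Planner UI. Parameter names are the standard ArduPilot
# Copter ones; the values are illustrative assumptions.
from pymavlink import mavutil

master = mavutil.mavlink_connection("/dev/ttyACM0", baud=115200)
master.wait_heartbeat()

def set_param(name, value):
    master.mav.param_set_send(
        master.target_system, master.target_component,
        name.encode(), value, mavutil.mavlink.MAV_PARAM_TYPE_REAL32)

set_param("FS_THR_ENABLE", 1)      # radio failsafe: enabled, always RTL
set_param("FS_GCS_ENABLE", 0)      # ground-station failsafe disabled
set_param("BATT_LOW_VOLT", 10.5)   # battery-voltage failsafe threshold
set_param("BATT_LOW_MAH", 500)     # reserved-mAh failsafe threshold
```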
4.11 Step 11 In this step, we take the drone for a test flight. For this, put the propellers on the motors, connect the battery to the power module, and after 2–3 seconds, when the buzzer stops beeping, press and hold the safety switch. Finally, hold the throttle stick of the transmitter in the bottom-right position for 1 second. A final beep can be heard; now the drone will fly according to the commands given by the pilot from the transmitter.
5 Results Once the setup of the drone is done, we carry out the current analysis. A clamp meter was used to record the current drawn by each individual motor and the total current drawn by the UAV from the battery. Figure 5a shows the current consumed by an individual motor at different throttle stick positions, while Fig. 5b shows the total current drawn by the UAV components at different throttle stick positions. In Fig. 5b, we can see that even at zero throttle some current is drawn by the UAV; this is the current that all the other components (excluding the motors) draw from the battery. The total current consumed by the UAV is therefore four times the current consumed by an individual motor plus a constant current consumed by the flight controller and all the peripherals attached to it.
Fig. 5 a Current consumed by individual motor; b Total current consumed by UAV
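The relationship described above can be written down directly. The sketch below estimates the total UAV current as four times the per-motor current plus a fixed overhead; the 0.4 A overhead is an assumed zero-throttle figure for illustration, not a measured specification.

```python
# Illustrative model of the measured behaviour: total current ~= 4 x motor
# current + a constant overhead drawn by the FC and peripherals. The 0.4 A
# overhead value is an assumption for illustration.
def total_current(motor_current_a, overhead_a=0.4):
    return 4 * motor_current_a + overhead_a

for i_motor in (0.0, 0.5, 1.0, 1.5):
    print(f"{i_motor:.1f} A per motor -> {total_current(i_motor):.1f} A total")
```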
6 Conclusion In this study, the working principle of a quadcopter drone was presented, and the operating procedure of a Pixhawk-Cube flight controller was discussed. Further, the step-by-step design procedure of a Pixhawk-flight-controller-based drone was presented. The important steps of drone development, such as establishing radio communication, calibration, and motor testing, were discussed in detail. To confirm the effectiveness of the design steps, a Pixhawk-based quadcopter drone was developed and experimental measurement results were provided to validate the given design steps.
References 1. Hassanalian M, Abdelkefi A (2017) Classifications, applications, and design challenges of drones: a review. Progress Aerosp Sci 91:99–131 2. Gunturu R et al (2020) Development of drone based delivery system using Pixhawk flight controller. In: Proceedings of the 2nd international conference on IoT, social, mobile, analytics & cloud in computational vision & bio-engineering (ISMAC-CVB 2020). Townsend A et al (2020) A comprehensive review of energy sources for unmanned aerial vehicles, their shortfalls and opportunities for improvements. Heliyon 6(11):e05285 3. Wang S et al (2021) RflySim: A rapid multicopter development platform for education and research based on Pixhawk and MATLAB. 2021 International conference on unmanned aircraft systems (ICUAS). IEEE 4. Sreelakshmy K et al (2022) Metaheuristic optimization for three dimensional path planning of UAV. Soft computing: theories and applications: proceedings of SoCTA 2021. Singapore: Springer Nature Singapore, pp 791–802 5. Gupta H et al (2022) Synergetic fusion of Reinforcement Learning, Grey Wolf, and Archimedes optimization algorithms for efficient health emergency response via unmanned aerial vehicle. Expert Syst, e13224 6. Sreelakshmy K, Gupta H, Verma OP, Kumar K, Ateya AA et al (2023) 3D path optimisation of unmanned aerial vehicles using Q-learning-controlled GWO-AOA. Comput Syst Sci Eng 45(3):2483–2503 7. Delvaux L, Di Naro L (2023) Autopilot and companion computer for unmanned aerial vehicle: survey 8. https://www.youtube.com 9. https://ardupilot.org/planner
10. Priandana K, Hazim M, Wulandari, Kusumoputro B (2020) Development of autonomous UAV quadcopters using Pixhawk controller and its flight data acquisition. 2020 International conference on computer science and its application in agriculture (ICOSICA), Bogor, Indonesia, pp 1–6 11. Lim H, Park J, Lee D, Kim HJ (2012) Build Your Own Quadrotor: open-source projects on unmanned aerial vehicles. IEEE Robot Autom Mag 19(3):33–45 12. Indiatimes, https://economictimes.indiatimes.com/industry/transportation/airlines-/-aviation/what-indias-drone-industry-is-expecting-from-nirmala-sitharamans-budget-this-time/articleshow/97440183.cms, 2023/01/30
Inattentive Driver Identification Smart System (IDISS) Sushma Vispute, K. Rajeswari, Reena Kharat, Deepali Javriya, Aditi Naiknaware, Nikita Gaikwad, and Janhavi Pimplikar
Abstract Every year, distracted driving is one of the leading causes of fatalities on the road. Distracted driving practices may include texting, eating, drinking, or other such interruptive actions that shift the driver's attention away from the road while driving. Several measures have been undertaken to identify and prevent such mishaps, and many authors have put forth deep learning techniques to recognize these driving patterns. Taking note of this, this study analyzes the systems and suggestions proposed by other authors to mitigate distracted driving, and aims to propose a hybrid system that is trained to recognize the distracted actions of the driver in the car with the aid of deep learning models and that alerts the driver with the help of IoT devices. The proposed system is capable of handling multiple tasks simultaneously with an optimized deep learning model and IoT devices. The ultimate goal of this paper is to propose a system that deploys a transfer learning approach using pre-trained deep learning models to produce a highly accurate classification result. The accuracy of determining whether the driver D. Javriya · A. Naiknaware (B) · N. Gaikwad · J. Pimplikar Department of Computer Engineering, Pimpri Chinchwad College of Engineering, SPPU, Pune, India e-mail: [email protected] D. Javriya e-mail: [email protected] N. Gaikwad e-mail: [email protected] J. Pimplikar e-mail: [email protected] S. Vispute · K. Rajeswari · R. Kharat Professor, Dept. of Computer Engineering, Pimpri Chinchwad College of Engineering, SPPU, Pune, India e-mail: [email protected] K. Rajeswari e-mail: [email protected] R. Kharat e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 O. P. Verma et al. (eds.), Machine Intelligence for Research and Innovations, Lecture Notes in Networks and Systems 832, https://doi.org/10.1007/978-981-99-8129-8_27
is distracted or not will be improved by applying the transfer learning concept and fine-tuning to pre-trained models like VGG16, VGG19, Xception, ResNet-50, and InceptionV3. A sensor and microprocessor work simultaneously to detect the actions of the driver, classify the data, and deliver an appropriate response to the alarm system. The alert system consists of a display which accepts data from the classification algorithm and notifies the driver whenever acts of distracted driving are identified. Among the deep learning models trained and tested, ResNet-50 gives the highest validation accuracy of 99.8% in classifying whether the driver is distracted or not. The microprocessor takes this result and sends it to the alert system. Keywords Machine learning · Deep learning · Convolutional neural networks (CNNs) · Image classification · Computer vision · Internet of Things (IoT) · Smart systems
1 Introduction Driving is an activity that requires constant attention, and it may cause loss of life and property if done inattentively. Distracted driving is a leading cause of traffic accidents; every year it results in more deaths worldwide than most other human errors. It is typically attributed to secondary activities such as speaking on the phone, glancing behind, smoking, texting, operating the radio, and talking to co-passengers. Based on early estimates by the National Highway Traffic Safety Administration (NHTSA) [1], 9,560 persons were killed in road accidents in the first quarter of 2022, up from the 8,935 fatalities estimated for the same period in 2021. Figure 1 shows the yearly death count from 2010 to 2020 due to distracted driving.
Fig. 1 Distracted driving death count (2010–2020) [2]
Driver action recognition has a multitude of applications, including driver and passenger safety, driving assistance, and even autonomous vehicles, where it can help decide when it is appropriate for a human to take control of the vehicle. Identifying drivers' actions has steadily gained interest over the past decade, and much research is being done on driver distraction. For this purpose, a review is put forth to discuss and identify suitable techniques to accurately detect distracted driving habits. According to [3], distracted driving is defined as driving while engaged in other tasks that pull the driver's attention away from the road, i.e., "any behavior that distracts the driver's attention from the task of driving"; it can be surrounding-based, behavioral, or physiological. Vehicle surrounding-based distraction occurs when a motorist is distracted by events occurring around the automobile, such as collision sites, road work, persons, locations, and objects of interest along the route. Physiological distraction means distraction due to the driver's biological signals, like increased heart rate, heavy breathing, pupil dilation, and slowed digestion. Behavioral distraction is due to the physical behavior of the driver, and is further sub-categorized into visual, manual, and cognitive distraction, explained as follows.
• Visual distraction means the driver's gaze wanders out of the lane. The actions which come under this category include staring outside the car window or viewing a GPS device while driving.
• Manual distraction means that the motorist lifts his hands off the steering wheel. The actions which come under this category are drinking, eating, changing car settings, reaching out for objects, or doing one's hair and makeup while driving.
• Cognitive distraction means the driver's mind moves away from driving. The actions which come under this category include texting on the phone, talking on the phone, or talking to passengers.
The main aim is to propose a real-time recognition system using Deep Learning and IoT to monitor and mitigate inattentive driving patterns. The purpose of this paper is to analyze and address different abnormal behavioral patterns of people driving on the road, to devise a real-time system using Deep Learning and IoT to detect and help prevent potentially hazardous driving habits, and to promote road safety by devising an appropriate response methodology for detecting inattentive behavior among drivers (Fig. 2). The rest of the paper is structured as follows: Sect. 2 reviews the existing literature and the methodologies used; it covers prior and recent work, describes a generic process flow for detecting distracted drivers, and presents a comparative literature survey of the reviewed works with respect to their efficiencies, advantages, and disadvantages. Section 3 presents the proposed methodology and a brief description of the data used. Implementation of the system is discussed in Sect. 4. Finally, results and conclusions are presented in Sects. 5 and 6.
Fig. 2 Types of distraction among drivers [3]
2 Literature Review With a clear understanding of the importance of avoiding distracted driving habits, a literature review is conducted to become acquainted with current driving distraction detection techniques for driver monitoring within the automobile cabin, examine them, and extract the data flow for distraction detection. During the procedure, the specified forms of distractions, such as visual, manual, and cognitive, are recognized using a framework for distracted driving identification. Developers of driver detection systems can improve on this fundamental information in the future by specifically stating ways for detecting different forms of driver distraction and incorporating
them into this detection framework. This subsection focuses on the literature review of past work done in this domain. As proposed in [4], a computer vision technology that processes picture data taken from inside automobiles was used to automatically determine whether people are driving distractedly. Their uniqueness is in giving context for the forecasts. This was accomplished by first identifying and localizing a range of elements in autos that lead to distracted driving from images (e.g., hands, cellphones, radio, and so on). The relative locations of these objects were then evaluated inside an image using machine learning techniques to create predictions regarding distracted driving. It was predicted that real-time feedback to participants would aid in the correction of distracted driving when combined with the right contextual understanding of the precise components that lead to distracted driving. The author in [5] proposed a system to address these issues, aid in accident prevention, and help save lives. Using sensors like an accelerometer, this system can identify accidents and promptly notify those who need to know through text messages. This device can also tell if the driver or passenger is drunk and then take the proper precautions. Additionally, the system has a face recognition capability that ensures that only authorized individuals can operate the car, which helps combat the issue of underage driving and theft. They have developed an IoT-based Vehicle-to-Vehicle (V2V) solution. It will aid in avoiding approaching vehicles and help to avoid accidents brought on by abrupt braking. The authors in [6] design an accurate and dependable framework for detecting distracted drivers. This research makes use of the State Farm distracted driver detection dataset, which includes eight classes: phoning, texting, daily driving, radio operation, inactivity, chatting with a passenger, glancing behind, and drinking. The transfer values of the pre-trained model EfficientNet are employed as the backbone of EfficientDet. Furthermore, they constructed five variations of the EfficientDet model for detection purposes, EfficientDet (D0-D4), and compared the best EfficientDet version with Faster R-CNN and YOLOv3. The study in [7] develops a crash prevention and safety aid device that uses a Raspberry Pi and a Pi camera to continually take frames from the driver's recorded video. The facial landmarks are continually scanned and recorded using the Pi camera. These obtained landmark frames are fed into the MAR and EAR algorithms for assessing the mouth aspect ratio and detecting sleepiness, respectively. The proposed technique defines a threshold and predicts driver drowsiness: if the EAR ratio falls below the specified limit, the eye is kept closed for an extended length of time, or the MAR ratio climbs over the established limit, the device then employs a sound system to alert the driver. The authors' [8] goal is to address the problems that lead to deadly collisions while also incorporating safeguards against fatigue, collisions, and impediments. The accident intimation text may not reach the intended number as a result. By using technology concepts like big data and GPS to evaluate the resulting data collection and recognize the patterns associated with the crashes, the proposed and later implemented techniques can be enhanced and changed. The suggested system makes use of an embedded system built on top of the Internet of Things and the Global
System for Mobile Communication (GSM). The authors create a control system for monitoring various metrics or accident causes using various sensors, including alcohol sensors, blink sensors, vibration sensors, and IR sensors. The experimental findings demonstrate that the suggested model outperforms other devices currently on the market. The following is a comparison of a few literature publications and a comparative analysis of their methodologies, findings achieved, benefits and drawbacks, and recommended actions that were undertaken. This will greatly help in understanding the details of previous works more systematically and concisely (Table 1). After studying and gaining insights from the previous papers, the following points were inferred:
• For the detection of the inattentiveness of a driver, no single parameter can be used. A combination of different parameters should be used for maximum accuracy.
• Driver monitoring systems and in-vehicle adaptive distraction detection systems can be the two strategies used for real-time driver distraction detection.
• As the definition of distraction is not universally formal, various distraction measures such as behavioral, manual, and physiological should be considered.
• There are abundant machine learning and deep learning algorithms for efficient detection of driver distraction.
• Usually, a hybrid system considers various external factors like traffic, roadway, age of driver, climate, etc., for maximum accuracy.
• Different domains like IoT can be used to develop a smart system for detecting driver distraction and alerting the driver.
• There are not many review papers that give detailed mechanisms and working algorithms used for the detection of a distracted driver.
The study of prior research papers raised some questions, such as the ways to combine two different domains to create an efficient system for driver distraction, the existence of new technologies or methodologies which can be used to detect the inattentiveness of a driver with maximum accuracy, new work done in the field of deep learning regarding this topic, and other such questions. The answers to these questions help in gaining meaningful insights into this topic. Most importantly, they help in understanding how applying different machine learning and deep learning algorithms to an IoT domain would help in making an accurate and efficient smart driver inattentiveness system.
3 Proposed Methodology This section puts forth a methodology for detecting distracted driving tendencies using state-of-the-art techniques. Firstly, a description of the used dataset is provided in brief along with an alternative dataset similar to it. This would help in understanding the data used as input and give insight into how it could influence the results of the proposed methodology. Furthermore, a brief working of the model is
Table 1 Comparative survey of existing literature

Sr. no. | Paper ref. | Technique/model used | Year published | Dataset used | Accuracy | Advantages | Disadvantages
1 | [6] | EfficientDet model, Faster R-CNN, YOLOv3 | 2021 | State Farm Distracted Driver Dataset (SFDDD) | 99.16% | Faster than previous object detection models in terms of running speed and success rate | An enhanced version of the dataset may be offered to improve the system by incorporating other sensor modalities, such as a microphone
2 | [8] | Self-designed algorithm | 2021 | Self-collected | 93% | Uses different sensors for different purposes, independent of each other; if one sensor is damaged it won't affect the others | The proposed system depends on hardware like sensors and a microcontroller, so the system may collapse if any of these are damaged
3 | [9] | ResNet + HRNN + Inception module | 2020 | State Farm Distracted Driver Detection | 96.23% | Uses a small number of parameters and combines already existing models | Larger models, such as Xception, VGG, and ResNet-50, are challenging to optimize on this dataset; the number of layers increases, but the accuracy decreases
4 | [10] | Simple CNN model + VGG16 + InceptionV3 + LSTM model | 2020 | State Farm Distracted Driver Detection Dataset + custom-made dataset | 85% | Fusing sensor data with eyesight data and fine-tuning considerably improves the accuracy of the distracted driver identification job | The public dataset used for transfer learning contains photographs of several drivers, but the collected dataset has images of only one genuine driver
5 | [11] | E2DR model [E2DR(A1, A2) where A1, A2 ∈ {ResNet, VGG16, MobileNet, Inception}] | 2022 | State Farm Distracted Driver Dataset | 92% (ResNet + VGG16) | Proposes E2DR, a new model that uses stacking ensemble methods to improve accuracy, enhance generalization, and reduce overfitting | The computational complexity while developing E2DR from scratch is difficult and time-consuming
6 | [12] | HCF: pre-trained model combining ResNet-50, InceptionV3, and Xception | 2020 | State Farm Distracted Driver Detection | 96.74% | The proposed model can help prevent overfitting and enhance the deep learning capability; it outperforms other models w.r.t. accuracy, robustness, and processing time | The model is greatly influenced by light, so accuracy may falter at night; HCF output accuracy may degrade based on the position of the camera
7 | [13] | Handcrafted algorithms, DCNN, HOG | 2022 | AUC Distracted Driver Dataset | 100 features: 95.5%; 250 features: 95.8%; 500 features: 95.9% | The combination of handmade features with deep features, according to performance data, can give considerably improved accuracy in identifying distracted driving | Limitation on achieving better accuracy rates
Fig. 3 State Farm distracted driver dataset images [14]
elaborated upon, alongside mentioning how the model is different from the proposals put forth by other researchers.
3.1 Description of Dataset Detecting a distracted driver can be done using deep learning models, wherein each model generates its own accuracy. But firstly, to perform classification, we need to define the dataset on which the models would be trained and tested. The dataset used to detect driver distraction can be public as well as custom-made. The most used public datasets are the State Farm Distracted Driver Detection dataset [14] and the Distracted Driver Dataset by AUC [15]; some custom-made datasets cover the classes that may be missing in public datasets. The State Farm Distracted Driver Detection dataset [14] is the most used. It is publicly available on Kaggle. The dataset contains two subfolders: the test folder contains around 79.97K images, and the train folder contains around 22.4K images. Each folder contains images categorized into 10 classes (Figs. 3, 4 and Table 2).
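As an illustration of how this dataset is typically consumed, the hedged sketch below loads the Kaggle training images with Keras, assuming the standard layout of one folder per class (c0–c9) under imgs/train; the exact directory path is an assumption.

```python
# Hedged sketch: loading the State Farm training images with Keras.
# The directory path assumes the standard Kaggle layout imgs/train/c0..c9.
import tensorflow as tf

DATA_DIR = "state-farm-distracted-driver-detection/imgs/train"  # assumed path

train_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="training", seed=42,
    image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="validation", seed=42,
    image_size=(224, 224), batch_size=32)

print(train_ds.class_names)  # ['c0', 'c1', ..., 'c9']
```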
3.2 Architecture Diagram The above diagram shows the proposed model for detecting distracted driving tendencies among drivers. The system consists of a Pi camera that would film real-time video footage and capture images of the driver intermittently. The image capturing is followed by identifying the region of interest—the driver’s posture. This can be achieved through localization, a method used to identify the main region of interest in the input image. With the help of deep learning models and machine learning algorithms, the system would be trained to identify the action of the driver in the car. The algorithms deployed such as supervised learning algorithms or convolutional neural networks would be able to categorize whether the driver is distracted or not distracted. However, this classification is bound to be more specific. This would
332
S. Vispute et al.
Fig. 4 State farm distracted driver detection dataset classes [16]
mean recognizing the distracted driver’s actions such as talking, eating, texting, and looking away from the front rather than just labeling the driver as distracted. Here, the accuracy of the model is of utmost importance since it would be a deciding factor in how efficient the proposed model would be (Fig. 5). The results of the classification of the driver as distracted or non-distracted would simultaneously be analyzed by a sensor which would accordingly send a response to the alert system. The alert system will consist of a sound or display system which will take input from the classification algorithm and this alert system will then alert the driver of any distracted driving tendencies via sound or display. The alert system will alert the driver according to the actions detected like eating, drinking, etc. This whole process will be processed by a microprocessor which will be finalized after the finalization of the classification algorithm.
Table 2 Dataset comparison

Sr. no. | Paper ID | Dataset used | Public or custom-made | Capturing sensor | No. of classes | Training images | Testing images | Total images
1 | [17] | AUC Distracted Driver (AUCD2) | Public | ASUS ZenPhone | 10 | 12,977 | 4,331 | 17,308
2 | [17] | State Farm Distracted Driver Detection (SFD3) | Public | Dashboard camera | 10 | 22,424 | 79,728 | 102,152
3 | [18] | Local dataset with 500 drivers | Custom-made | Generic RGB cameras on the windscreen | 10 | 17,600 | 4,400 | 22,000
4 | [19] | Self-made | Custom-made | Mobile phone camera | 8 | 20,520 | 5,130 | 25,650
5 | [20] | UAH-DriveSet | Public | Smartphone app DriveSafe | 3 | 171,321 | 42,830 | 214,151

Fig. 5 Proposed model of distracted driver identification
4 Implementation This section describes the tools and technologies used, the different deep learning models implemented, the various IoT components, and the implementation of our proposed model. The model combines two different technologies, i.e., deep learning algorithms and IoT. The software module describes the structure and working of the deep learning models used in the system, whereas the hardware module elaborates on the working of the IoT part.
4.1 Tools and Technologies Used The software module was built using the Keras and TensorFlow Python libraries and the Kaggle IDE. The dataset used for this system is the State Farm Distracted Driver Detection Dataset, and the VGG16, VGG19, Xception, ResNet-50, and InceptionV3 models were applied to it for classification. To integrate the hardware components with these models, Proteus software was used. As for the hardware module, various IoT components were used, such as a camera, alcohol sensor, acceleration sensor, SD card, alert system (display screen), jumpers, breadboard, and RAM. The camera used was the Raspberry Pi Camera Rev 1.3, and the alcohol and acceleration sensors used were the MQ3 and ADXL345 models, respectively.
4.2 Deep Learning Models 4.2.1
VGG16
VGG16 is a 16-layer convolutional neural network. It is an object detection and classification architecture that achieves a 92.7% top-5 accuracy rate when classifying images into 1000 distinct categories. It is a well-liked method for classifying images and is simple to use with transfer learning. The 16 in VGG16 stands for 16 weighted layers. Thirteen convolutional layers, five max pooling layers, and three fully connected layers make up VGG16's 21 layers, but only sixteen of them are weight layers, also known as learnable-parameter layers. The input tensor dimension for VGG16 is 224×224 with three RGB channels [21] (Fig. 6).
4.2.2
VGG19
VGG19 is a highly developed CNN with pre-trained layers and a keen understanding of how shape, color, and structure create an image. With millions of different images and challenging classification problems, VGG19 has undergone extensive training.
Fig. 6 Architecture of VGG16 [21]
Fig. 7 Architecture of VGG19 [22]
VGG19 is an acronym for 19 weighted layers. VGG19 consists of 24 layers in total, comprising 16 convolutional layers, 5 max pooling layers, and 3 fully connected layers; however, only 19 of these layers are weight layers, also referred to as learnable-parameter layers. Three RGB channels make up the 224×224 input tensor size for VGG19 [22] (Fig. 7).
4.2.3
Xception
Xception stands for "Extreme Inception". The feature extraction base of the network in the Xception architecture is composed of 36 convolutional layers. The data first passes through the entry flow, then through the middle flow, which is repeated eight times, and finally through the exit flow. Keep in mind that batch normalization (not shown in the diagram) is applied after each Convolution and Separable Convolution layer. There is no depth expansion, and a depth multiplier of 1 is used for all Separable Convolution layers [23] (Fig. 8).
4.2.4
ResNet-50
ResNet-50 is a convolutional neural network, or CNN, with 50 layers; ResNet stands for Residual Network. The bottleneck building block is used
Fig. 8 Architecture of Xception [23]
Fig. 9 Architecture of a ResNet-50
in the 50-layer ResNet. A bottleneck residual block, also referred to as a "bottleneck", uses 1×1 convolutions to cut down the number of parameters and matrix multiplications, which makes training each layer significantly faster. Instead of a stack of two layers, it employs three layers [24] (Fig. 9).
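A minimal Keras sketch of such a bottleneck block is shown below; the layer sizes follow the usual 1×1 reduce, 3×3, 1×1 expand pattern with a 4× channel expansion, but the exact configuration is illustrative rather than a line-for-line copy of ResNet-50.

```python
# Illustrative bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 expand,
# added back to a (possibly projected) shortcut. Sizes are for illustration.
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, filters, stride=1):
    shortcut = x
    y = layers.Conv2D(filters, 1, strides=stride)(x)   # 1x1: reduce channels
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)   # 3x3: spatial conv
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(4 * filters, 1)(y)               # 1x1: expand channels
    y = layers.BatchNormalization()(y)
    if stride != 1 or shortcut.shape[-1] != 4 * filters:
        # project the shortcut so the shapes match before adding
        shortcut = layers.Conv2D(4 * filters, 1, strides=stride)(shortcut)
        shortcut = layers.BatchNormalization()(shortcut)
    return layers.ReLU()(layers.Add()([y, shortcut]))

inputs = tf.keras.Input((56, 56, 64))
outputs = bottleneck_block(inputs, 64)
print(tf.keras.Model(inputs, outputs).output_shape)  # (None, 56, 56, 256)
```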
4.2.5
InceptionV3
Inception V3 is a CNN-based deep learning model used to classify images. Inception V3 is an improved version of Inception V1, the foundational model first released as GoogLeNet in 2014; it was created by a Google team, as the name suggests. The Inception V3 model, introduced in 2015, has 42 layers overall and a lower error rate than its forerunners. Let's examine the various
Fig. 10 Architecture of Inception V3
improvements that the Inception V3 model has received [25]. The following are the main changes made to the Inception V3 model (Fig. 10):
• Factorization into smaller convolutions
• Spatial factorization into asymmetric convolutions
• Usefulness of auxiliary classifiers
• Effective grid size reduction
4.3 Software Module The software module implements the deep learning architecture of the system. It comprises pre-trained deep learning models selected from the Keras applications library as per the referenced literature. The models are used to perform transfer learning, reusing the knowledge of the pre-trained models and tuning the weights for the application at hand, in this case detecting distracted driving behaviors. The models are initially trained on the State Farm Distracted Driver Dataset. The combined transfer learning approach involves unfreezing the penultimate layer of each pre-trained model and then fine-tuning the models. The models are then trained for five epochs, which improves the performance of the pre-trained models in accurately classifying the driving behaviors. The output generated provides probability values indicating the probability of an image belonging to a particular class (Fig. 11).
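A minimal sketch of this transfer-learning procedure is given below, using ResNet-50 as the example backbone. Which layer is unfrozen, the optimizer, and the learning rate are illustrative choices; the paper's exact settings may differ.

```python
# Hedged sketch of the transfer-learning workflow described above, with
# ResNet-50 as the backbone. Hyperparameters are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False            # freeze the pre-trained weights
base.layers[-2].trainable = True  # unfreeze the penultimate layer

x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(10, activation="softmax")(x)  # 10 SFDDD classes
model = tf.keras.Model(base.input, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # fine-tune 5 epochs
```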
Fig. 11 Classification workflow
4.4 Hardware Module The hardware module consists of a microprocessor, sensors, and a display. This system is responsible for differentiating some of the actions of the driver and providing the alert message to the driver. The system consists of the following hardware components:
• Raspberry Pi
• MQ3 (alcohol sensor)
• ADXL345 (acceleration sensor)
• Display
The action detected by the deep learning module is provided to the microprocessor, which differentiates between the different actions. If the detected action is drinking, the MQ3 sensor is activated, and it checks whether the driver is drinking alcohol or not. This system keeps running while the vehicle is in motion; once the vehicle comes to rest, the system stops. This is controlled by the acceleration sensor. The action is conveyed to the driver via the display.
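One way the microprocessor logic above could look on the Raspberry Pi is sketched below; the GPIO pin number, the action labels, and the message format are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of the alert logic described above. Pin wiring, action
# labels, and messages are assumptions; the MQ3 is read via its digital pin.
import RPi.GPIO as GPIO

MQ3_PIN = 17  # digital output of the MQ3 alcohol sensor (assumed wiring)
GPIO.setmode(GPIO.BCM)
GPIO.setup(MQ3_PIN, GPIO.IN)

def handle_action(action, vehicle_moving):
    """Map a detected action to an alert message for the display."""
    if not vehicle_moving:          # acceleration sensor reports rest
        return None                 # system is switched off
    if action == "drinking" and GPIO.input(MQ3_PIN):
        return "ALERT: possible alcohol consumption detected"
    if action != "safe driving":
        return "ALERT: distracted behaviour detected (%s)" % action
    return None

print(handle_action("texting", vehicle_moving=True))
```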
5 Results After the model predicts the action of the driver, the detected-action message is encrypted and provided to the Raspberry Pi. The message is decrypted by the Raspberry Pi and matched against the different actions. If the driver is drinking, the MQ3 (alcohol sensor) is activated, and it checks whether the driver is drinking alcohol or water. The alert message is displayed to the driver via a display. This system runs as long as
Table 3 Results before fine-tuning

Model Name  | Training Accuracy | Training Loss | Validation Accuracy | Validation Loss
VGG16       | 0.9952            | 0.0126        | 0.9953              | 0.0146
VGG19       | 0.9967            | 0.0090        | 0.9937              | 0.0142
Xception    | 0.9939            | 0.0423        | 0.9897              | 0.0468
ResNet-50   | 0.9952            | 0.0470        | 0.9964              | 0.0371
InceptionV3 | 0.9873            | 0.0737        | 0.9850              | 0.0673
the vehicle is in motion; as the vehicle comes to a rest state, the system is switched off. This is controlled by the acceleration sensor. The accuracy of the deep learning models was studied, and the most efficient model was chosen; accuracy was recorded both before and after fine-tuning the models. In this case, ResNet-50 gives the best accuracy for classifying the images into the correct categories.
5.1 Before Fine-Tuning The above-mentioned deep learning models were trained for 10 epochs on the State Farm Distracted Driver Dataset, and the accuracy values have been recorded in the table shown (Table 3).
5.2 After Fine-Tuning For the fine-tuning procedure, the penultimate layer of each of the earlier listed deep learning models was unfrozen, and the models were fine-tuned for another 5 epochs to obtain the following results (Figs. 12, 13 and Table 4).
Fig. 12 Accuracy of ResNet-50 before and after fine-tuning
Fig. 13 Loss of ResNet-50 before and after fine-tuning
Table 4 Results after fine-tuning

Model Name  | Training Accuracy | Training Loss | Validation Accuracy | Validation Loss
VGG16       | 0.9991            | 0.0126        | 0.9973              | 0.0146
VGG19       | 0.9996            | 0.0090        | 0.9975              | 0.0142
Xception    | 0.9939            | 0.0423        | 0.9897              | 0.0468
ResNet-50   | 0.9997            | 0.0066        | 0.9984              | 0.0105
InceptionV3 | 0.9982            | 0.0161        | 0.9935              | 0.0297
6 Conclusion One of the leading causes of fatalities every year is distracted driving, which includes texting, eating, drinking, or other such actions while driving. Over 80 percent of road accidents and 3,000 casualties were reported in 2020 due to inattentive driving practices, making this a serious safety concern for which many measures have been taken. This work proposed an integrated model comprising deep learning and the Internet of Things that can detect and classify distracted driving behaviors among drivers. Pre-trained models, namely VGG16, VGG19, Xception, ResNet-50, and InceptionV3, were trained on the State Farm Distracted Driver Dataset (SFDDD). The penultimate layer was unfrozen for each model, and transfer learning and fine-tuning were performed to adjust the pre-trained weights and increase model accuracy. ResNet-50 was found to show the best results, with a training accuracy and validation accuracy of 0.99 each. The model was implemented on the Raspberry Pi, taking images as input and generating the classification as output; this classification was subsequently supplied as input to the sensors, triggering an alert on detecting distracted driving behavior. However, the challenge of increasing the classification accuracy and storing the classified images in memory for faster detection remains.
References 1. "NHTSA Early Estimates Show Record Increase in Fatalities | NHTSA." https://www.nhtsa.gov/press-releases/early-estimates-first-quarter-2022. Accessed Nov. 03, 2022 2. "Distracted Driving Statistics 2022 | Bankrate." https://www.bankrate.com/insurance/car/distracted-driving-statistics/. Accessed Nov. 03, 2022 3. Review on driver distraction classification to avoid types of driver distraction visual manual 3(5):2630–2639 (2022) 4. Dey AK, Goel B, Chellappan S (2021) Context-driven detection of distracted driving using images from in-car cameras. Internet of Things 14:100380. https://doi.org/10.1016/j.iot.2021.100380 5. Kn V, Nishant B, Manisha S, Vr N, NSR (2021) IOT based monitoring of driver parameters for the prevention of accidents in a smart city 25(5):4871–4878 6. An efficient deep learning framework for distracted driver detection_ Enhanced Reader.pdf
7. Eswari SUR (2021) Accident prevention and safety assistance using IOT and machine learning. J Reliab Intell Environ, 123456789. https://doi.org/10.1007/s40860-021-00136-3 8. Sabri Y (2021) Internet of things (IoT) based smart vehicle security and safety system 12(4) 9. Alotaibi M, Alotaibi B (2019) Distracted driver classification using deep learning. Signal, Image Video Proc 14(3):617–624. https://doi.org/10.1007/S11760-019-01589-Z 10. Omerustaoglu F, Sakar CO, Kar G (2020) Distracted driver detection by combining in-vehicle and image data using deep learning. Appl Soft Comput 96:106657. https://doi.org/10.1016/J.ASOC.2020.106657 11. Aljasim M, Kashef R (2022) E2DR: a deep learning ensemble-based driver distraction 12. HCF_ A Hybrid CNN framework for behavior detection of distracted drivers_enhanced reader.pdf 13. HSDDD_a hybrid scheme for the detection of distracted driving through fusion of deep learning and handcrafted features_ Enhanced Reader.pdf 14. "State Farm Distracted Driver Detection | Kaggle." https://www.kaggle.com/competitions/state-farm-distracted-driver-detection/data. Accessed Nov. 4, 2022 15. "AUC Distracted Driver Dataset - Yehya Abouelnaga." https://abouelnaga.io/projects/auc-distracted-driver-dataset/. Accessed Nov. 4, 2022 16. "Types of Distracted Driving in New York City And Why it Occurs?" https://rmfwlaw.com/blog/car-accidents/types-of-distracted-driving-in-new-york-city-and-why-it-occurs/. Accessed Nov. 03, 2022 17. Qin B, Qian J, Xin Y, Liu B, Dong Y (2021) Distracted driver detection based on a CNN with, 1–12 18. Abunadi I (2021) A novel ensemble learning approach of deep learning techniques to monitor distracted driver behaviour in real time, pp 8–13 19. Drivers D (2021) A hybrid deep learning model for recognizing actions of distracted drivers 20. Driver behavior classification system analysis using machine learning methods_enhanced reader.pdf 21. Everything you need to know about VGG16 | by Great Learning | Medium. https://medium.com/@mygreatlearning/everything-you-need-to-know-about-vgg16-7315defb5918. Accessed May 14, 2023 22. Image Detection Using the VGG-19 Convolutional Neural Network | by Melisa Bardhi | MLearning.ai | Medium. https://medium.com/mlearning-ai/image-detection-using-convolutional-neural-networks-89c9e21ff 23. Image Classification With Xception Model | by Nutan | Medium. https://medium.com/@nutanbhogendrasharma/image-classification-with-xception-model-e8094a9de4d2. Accessed May 14, 2023 24. ResNet-50: The Basics and a Quick Tutorial. https://datagen.tech/guides/computer-vision/resnet-50/. Accessed May 14, 2023 25. Inception V3 Model Architecture. https://iq.opengenus.org/inception-v3-model-architecture/. Accessed May 14, 2023
Convolutional Neural Network in Deep Learning for Object Tracking: A Review

Utkarsh Dubey and Raju Barskar

Department of Computer Science and Engineering, University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal 462033, Madhya Pradesh, India
Abstract In recent times there has been enormous growth in data, which raises the need for systems that handle it efficiently. The emergence of deep learning and machine learning algorithms has improved the performance of automated image classification tasks. The Convolutional Neural Network (CNN) architecture is designed to process raw images with little manual preprocessing, which makes it well suited to a wide range of complex deep learning tasks. Every neural network architecture comprises smaller units called neurons that each perform a specific operation on the input data, leading to the extraction of useful information called features. The learning capability of a CNN comes from training algorithms such as backpropagation; beyond the choice of optimizer, several hyperparameters of the system can also be fine-tuned without compromising the accuracy of the model. This work explores the scope of CNNs and aims to give researchers a broad comprehension of the architecture.

Keywords Deep learning · Training algorithm · Convolutional neural network (CNN) · Image classification · Machine learning
1 Introduction

The use of artificial intelligence (AI) and related algorithms has grown significantly in recent years. AI enables machines to replicate human behaviour and exhibit characteristics of the human mind. The biggest challenge for AI is to solve problems that a human solves intuitively, such as recognizing words in speech and objects in images. Within AI, machine learning (ML) is a viable approach that helps a computer acquire knowledge and make decisions from raw data of complicated, real-world environments. In particular, deep neural network (DNN) algorithms are rapidly becoming the dominant paradigm for AI applications such as computer vision (CV) [1].

Deep learning (DL) is a class of ML algorithms that processes large amounts of data in multiple layers to model high-level data abstractions and extract features. DL refers to artificial neural networks (ANNs) built from intricate multi-layer perceptron models; Convolutional Neural Networks, Recursive Neural Networks, and Recurrent Neural Networks comprise the fundamental architectures of these complicated ANNs, which have huge numbers of parameters and layers.

In 1943, McCulloch and Pitts developed the first model of ANNs, inspired by biological neurons. In the late 1950s, Frank Rosenblatt proposed the single-layered perceptron algorithm for classifying input data [2]. However, Minsky and Papert reported limitations of the perceptron model, which dampened optimism about human-level machine intelligence. Backpropagation training algorithms, the backbone of all supervised ANNs, were introduced in the 1970s and 1980s. Later, in the 1990s, LeCun et al. [3] proposed Convolutional Neural Networks (CNNs) for handwritten character recognition. In CV applications, CNNs have revolutionized the accuracy of detection, recognition, and classification tasks compared with the other available DNNs [4].

Computer vision trains AI systems to automate tasks such as perception and cognition that are performed by a biological vision system. Object detection and recognition form a class of CV applications that identify objects in images or videos using deep learning and machine learning algorithms. CNNs use a variation of the multi-layer perceptron model that requires a minimal amount of preprocessing. As research and development of CNN models progresses towards performance optimization, network sizes proliferate, making the techniques both computation-intensive and memory-intensive.
1.1 Machine Learning

Machine learning (ML) is a statistical technique that allows computer systems to learn progressively, i.e., to improve their performance gradually with data, without being programmed in a predetermined manner [5]. ML algorithms require vast amounts of structured or unstructured data to learn and predict. The term machine learning was coined by Arthur Samuel in 1959. As a branch of artificial intelligence, ML investigates and examines the development of algorithms that can learn from data and make predictions about it; it grew out of pattern recognition and computational learning theory within AI. ML is applied wherever creating and programming explicit high-performance algorithms by hand is difficult or even infeasible; its applications include tasks such as email filtering, network filtering, and object classification and detection in the subfield of computer vision. Currently, with the increased amount of data and computing power available, an ML approach called the artificial neural network
(ANN) [6] is growing in popularity. Based on the nature of learning, neural network models are broadly classified as follows (a minimal contrast between the first two paradigms is sketched in code after this list).

Supervised Learning Supervised learning models gain knowledge by training on available historical data and apply it to new data to make predictions. These algorithms are trained with data for which the expected outcome is known in advance. Through repeated training, they learn the patterns in the data and develop the ability to make predictions on inputs they have not seen before. Classification, regression, and gradient-boosting tasks can be performed by models trained in a supervised manner.

Unsupervised Learning Unsupervised learning algorithms try to make inferences from data that carry no historical label information; in other words, they train on unlabelled data. Clustering can be performed by a model trained in an unsupervised manner.

Semi-supervised Learning Semi-supervised learning algorithms are also used for tasks such as classification and regression, but they train on a mixture of labelled and unlabelled data. In practice, labelling data is highly time-consuming and expensive. When only a small amount of labelled data is available, the model is trained on a dataset containing both labelled and unlabelled samples: it learns from the labelled data and develops further by exploring the unknown patterns in the unlabelled data.

Reinforcement Learning Reinforcement learning is best utilized in robotics and navigation systems. The algorithm learns by trial and error through repeated experimentation in a restricted environment. Positive feedback is given when the model performs an expected action, and negative feedback otherwise. Such training enables the model to make more precise decisions in real time.
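As a concrete illustration (not from the original text), the sketch below contrasts the supervised and unsupervised paradigms using scikit-learn on synthetic data; the dataset, model choices, and parameter values are placeholders.

```python
# Minimal sketch: supervised vs. unsupervised learning (illustrative only).
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic 2-D data drawn from three clusters.
X, y = make_blobs(n_samples=200, centers=3, random_state=0)

# Supervised: the labels y are known during training.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised predictions:", clf.predict(X[:5]))

# Unsupervised: the same data without labels; structure is inferred.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments:", km.labels_[:5])
```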
2 Literature Review

This section reviews the literature on object detection using deep learning. Many papers in the literature address object detection techniques based on Convolutional Neural Networks. Object detection is used in many walks of life, including security, the military, medical imaging, biometric and iris recognition, natural language processing, video analysis, and weather forecasting. The precise image-recognition capability of deep learning models makes them a preferred choice for object detection techniques. The publications discussed below highlight representative research on object detection using deep learning.
To detect stationary objects in an image, Waqas Hassan et al. [7] suggested a pixel-classification approach that employs a segmentation methodology; the detected objects are then monitored using a novel method called adaptive edge orientation. This technique detects objects with an accuracy of 95%, an improvement over the state-of-the-art methods. Radke et al. [8] surveyed image change detection algorithms and discussed principles for comparing their performance. Zhao et al. [9] discussed various deep-learning-based object detection frameworks, focusing on refinements that can be applied to object detection architectures to improve their performance further, and gave some promising directions for designing better object detectors. Gu et al. [10] studied various CNN models and analysed the advancement of CNNs along several dimensions such as activation functions, layer design, optimization algorithms, and loss functions; they also reviewed applications of CNNs in speech, natural language, and computer vision. Chandan et al. [11] surveyed several object detection algorithms and proposed one that performs object detection efficiently without compromising accuracy.
3 Object Detection and Tracking

Object tracking is the process of identifying the path of an object through time. The trajectory is represented either by the object's locations or by the whole region that the object spans in each observation. Object detection is a subtask of tracking: the object to be tracked must first be identified and represented using spatial or temporal features. In a video, the object of interest may be occluded by other objects or obstacles in the scene; the main aim of an object tracker is to track the accurate position of objects throughout the video even under occlusion, background clutter, deformation, and similar conditions.

Object detection and tracking are closely related problems. The object to be tracked must first be detected as a whole, so object detection is the initial stage of object tracking. In a video, the object to be tracked is initially detected manually or using features extracted from the initial frames or key frames. Object detection should then be performed periodically throughout the video sequence to check whether any new objects have arrived. Object detection itself is the process of locating the object of interest in an image using position parameters. Detected objects can be represented in videos in two ways:

– Boundary of the object
– Bounding box

In the first, positional parameters or feature vectors mark the whole boundary of the object. In the second, a bounding box is fitted so that it encloses the object. For most applications, such as video surveillance [12], the bounding-box method is sufficient for object representation; some sensitive applications, such as medical imaging, require the object boundary.

The object to be detected may be anomalous or specific, so object detection methods are classified as:

1. Detection of anomaly
2. Detection of a specific object

Detection of Anomaly In anomaly detection, the object to be detected is anomalous, so these methods require knowledge of the background to separate the anomaly from it. Anomaly detection methods are as follows.

Background subtraction methods: These define a foreground object mask. Using basic knowledge of the background, the background of the frame is initially defined and then adaptively updated as it changes (a minimal sketch of this approach follows at the end of this section).

Adaptive correlation filtering: In correlation filtering, a mask or kernel is used to search the image for an object with a similar pattern. In adaptive correlation filtering, the correlation coefficient between the template and the image is taken as the similarity weight.

Detection of Specific Object This is the method of detecting specific types of objects such as humans, cars, or other vehicles. Here, object generalization is a crucial task that requires analysing the properties shared among objects of the same class. The descriptors or features that describe the object uniquely play a major role in detection; they may be spatial, temporal, or spectral parameters that represent the object uniquely. Features may be handcrafted descriptors such as SIFT or SURF, or learned using deep models such as CNNs and R-CNNs.
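To make the background-subtraction approach concrete, the following is a minimal sketch using OpenCV's MOG2 subtractor; the video path, area threshold, and blur size are illustrative assumptions, not values from any cited work.

```python
import cv2

# Minimal background-subtraction detection sketch (illustrative values).
cap = cv2.VideoCapture("traffic.mp4")  # placeholder video path
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)   # foreground mask, adaptively updated
    mask = cv2.medianBlur(mask, 5)   # suppress speckle noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:  # ignore tiny blobs
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```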
3.1 Challenges in Object Tracking

Researchers and practitioners face several challenges in object tracking. The main ones are listed below.

Occlusion: When objects are partially or completely occluded by other objects, the tracker needs to predict the object's position and appearance accurately even when it is not fully visible.

Scale Variation: Objects can change in size due to their distance from the camera or their intrinsic properties. Effective trackers need to handle such variations robustly.

Non-Rigid Deformation: Objects like animals or humans can undergo non-rigid deformations, making their shapes change significantly. Tracking such objects accurately requires models that can handle deformation.

Fast Motion: Objects can move rapidly across frames, leading to motion blur or even disappearance from the field of view. Tracking algorithms need to predict object positions accurately even during fast motion.

Illumination Changes: Lighting conditions can change between frames, causing the appearance of the object to vary. Robust tracking algorithms should be able to handle such changes.

These major challenges have led to the development of various tracking algorithms, including those based on correlation filters, deep learning, and particle filtering. Researchers are actively working on addressing these challenges to create more robust and accurate tracking methods.
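As one hedged illustration of a correlation-filter tracker, the sketch below uses OpenCV's CSRT implementation (available with the opencv-contrib-python package); the video path and initial bounding box are placeholder values.

```python
import cv2

# Minimal single-object tracking sketch with the CSRT tracker.
cap = cv2.VideoCapture("pedestrian.mp4")   # placeholder video path
ok, frame = cap.read()

tracker = cv2.TrackerCSRT_create()
tracker.init(frame, (150, 80, 60, 120))    # placeholder (x, y, w, h) of target

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)     # predict target box in this frame
    if found:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(30) & 0xFF == 27:       # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```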
4 Convolutional Neural Networks

The Convolutional Neural Network is an evolved form of the Neocognitron, first proposed by Kunihiko Fukushima [13]. Over the years, the neocognitron developed in various directions and under different names, including the Cresceptron [14], the Perceptron [15], and the Multi-Layer Perceptron (MLP) [16], before the Convolutional Neural Network. The drawback of the MLP that led to the development of the CNN is that every pixel is handled by a separate perceptron, which makes managing the weights and the subsequent computations complex. Another problem with the MLP is that it cannot handle spatial invariance in images. These drawbacks can be overcome only if the network is able to learn the correlations between pixels, and the CNN is designed with exactly this ability.

The CNN architecture forms a hierarchical feed-forward network with the ability to extract the prominent features from images using special operators called convolutions. These convolution operators give the CNN its capability to process image data directly without much preprocessing. The convolved images are then fed into a sequence of layers that perform dimensionality reduction and classification. The layers of a CNN are represented in Fig. 1 and are discussed in detail in the following subsections.

Fig. 1 Architecture of a convolutional neural network
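To ground the pipeline of Fig. 1, here is a minimal sketch of a CNN in PyTorch, assuming a 32 × 32 RGB input and 10 output classes; the layer sizes are illustrative and do not reproduce any specific architecture from the reviewed papers.

```python
import torch
import torch.nn as nn

# Convolution -> pooling -> fully connected, as in Fig. 1 (illustrative sizes).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # feature extraction
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x16 -> 8x8
    nn.Flatten(),                                # 32 * 8 * 8 = 2048 features
    nn.Linear(32 * 8 * 8, 10),                   # classification head
)

x = torch.randn(1, 3, 32, 32)  # one dummy RGB image
print(model(x).shape)          # torch.Size([1, 10]) -- one score per class
```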
4.1 Convolution Layer

The first layer of the CNN receives the input as raw pixel values and detects visual features such as lines, edges, and different colour shades. It preserves the correlations between pixels by learning them iteratively through small squares of values forming patterns called kernels or filters [17]. The convolution layer differs from the layers of a conventional neural network in that, instead of linking every pixel (or neuron) to the next layer with its own weight and bias, the entire picture is divided into tiny regions (an n × n matrix) over which shared weights and biases are applied. Convolved with each tiny region of the input picture, these weights and biases, known as kernels or filters, produce feature maps. The filters are the basic characteristics that the convolution layer searches for in the input picture.

Depth Depth refers to how many filters are utilized in the convolutional layer to convolve with the input picture. Each filter searches for a certain pattern in different areas of the input picture; if n filters are used, n distinct patterns or characteristics are looked for across the picture. The depth of the output volume equals the number of filters used.

Stride Stride refers to the step size at which the convolutional filter moves across the input data (e.g., an image) during the convolution operation. The stride parameter controls how many pixels the filter moves horizontally and vertically at each step. Larger strides generally lead to smaller output volumes (feature maps), as the filter covers a larger area of the input at each step, effectively reducing the spatial dimensions.

Padding Padding refers to the technique of adding extra border pixels (usually filled with zeros) to the input data before applying the convolution operation. Its primary purpose is to control the spatial dimensions of the output feature maps produced by the convolutional layers. An uncontrolled reduction in spatial dimensions can be problematic, as it leads to the loss of information from the input data. Padding addresses this issue in two main ways:

Preserving spatial dimensions: By adding extra border pixels to the input data, the spatial dimensions of the output feature map can be preserved. Padding ensures that the convolutional filter can extend beyond the original borders of the input, allowing the filter to visit all positions and produce an output.

Retaining information at the edges: Padding also helps retain information at the edges of the input data. Without padding, the pixels at the borders may not be fully covered by the convolutional filter, leading to the loss of information.
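The effect of stride and padding on output size follows the standard relation $\lfloor (W - F + 2P)/S \rfloor + 1$ for input width $W$, filter size $F$, padding $P$, and stride $S$. The short sketch below (illustrative sizes, PyTorch assumed) demonstrates both cases.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one 32 x 32 RGB image (illustrative)

# "Same" convolution: F=3, P=1, S=1 -> (32 - 3 + 2)/1 + 1 = 32
same = nn.Conv2d(in_channels=3, out_channels=8,
                 kernel_size=3, stride=1, padding=1)

# Strided, unpadded convolution: F=3, P=0, S=2 -> floor((32 - 3)/2) + 1 = 15
strided = nn.Conv2d(in_channels=3, out_channels=8,
                    kernel_size=3, stride=2, padding=0)

print(same(x).shape)     # torch.Size([1, 8, 32, 32]) -- spatial size preserved
print(strided(x).shape)  # torch.Size([1, 8, 15, 15]) -- spatial size reduced
```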
4.2 Pooling Layer

Pooling helps to decrease the computational complexity of the network and to increase the system's resistance to changes in the input data. The pooling layer operates independently on each feature map; its main function is to summarize the information present in small local regions. It does this by sliding a fixed-size window (typically 2 × 2 or 3 × 3) over the input feature map and applying a pooling operation within that window. The most commonly used pooling operations are described below (a short sketch of all three follows).

Max Pooling In max pooling, the output value for each window is the maximum value found within that window. This effectively retains the most prominent feature present in the region and discards less relevant information. Max pooling helps the network learn spatial invariance and reduces the spatial dimensions by keeping only the most significant values. It works by sliding an n × n window across and down the input with a stride s, taking the largest value in the n × n area at each position to reduce the input size. The two hyperparameters of this layer are the filter size n and the stride s.

Average Pooling Another pooling method is average pooling, in which the average of all elements in the area covered by the filter is computed and given as output. Its drawback is that it also takes low-magnitude elements into account, which dilutes the stronger activations produced by nonlinear activation units.

Stochastic Pooling Stochastic pooling [18] addresses the drawbacks of both max pooling and average pooling. Within each small n × n region, the activations are normalized, the normalized activation at each location l is treated as a probability p, and the pooled output of the region is sampled from the resulting multinomial distribution.
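The three pooling variants can be sketched as follows (PyTorch assumed; the stochastic-pooling helper is a simplified training-mode illustration of [18], not the authors' code, and assumes non-negative activations and even spatial dimensions).

```python
import torch
import torch.nn as nn

def stochastic_pool_2x2(x):
    """Stochastic pooling [18] over non-overlapping 2x2 regions (training mode).

    Each region's activations a_l are normalized to probabilities
    p_l = a_l / sum_k a_k, and one activation per region is then
    sampled from the resulting multinomial distribution.
    """
    n, c, h, w = x.shape
    # Gather each 2x2 region's four activations into the last dimension.
    regions = (x.reshape(n, c, h // 2, 2, w // 2, 2)
                 .permute(0, 1, 2, 4, 3, 5)
                 .reshape(n, c, h // 2, w // 2, 4))
    probs = regions / regions.sum(dim=-1, keepdim=True).clamp_min(1e-12)
    idx = torch.multinomial(probs.reshape(-1, 4), 1)
    idx = idx.reshape(n, c, h // 2, w // 2, 1)
    return regions.gather(-1, idx).squeeze(-1)

x = torch.rand(1, 1, 4, 4)        # non-negative activations, as after ReLU
print(nn.MaxPool2d(2)(x))         # keeps the max of each 2x2 region
print(nn.AvgPool2d(2)(x))         # averages each 2x2 region
print(stochastic_pool_2x2(x))     # samples one activation per region
```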
Table 1 Parameters and hyperparameters of CNN

| Layer | Parameters | Hyperparameters |
|---|---|---|
| Convolution | Kernels | Kernel size and number, activation function, stride, and padding values |
| Pooling | None | Stride and padding values, pooling mechanism, and filter size |
| Fully connected | Weights | Number of weights and activation function |
| Others | – | Learning rate, epochs, weight initialization, minibatch and dataset-splitting values, and the choice of loss function, regularization, and optimizer |
4.3 Fully Connected Layer

This layer, also known as a dense layer, is a fundamental building block of artificial neural networks, particularly deep learning models, and is one of the simplest and most common layer types in neural network architectures. The pooled feature maps are flattened into a one-dimensional vector before entering this layer. The connections between neurons are represented by weights, and each neuron also has a bias term. During the forward pass of the network, the input is multiplied by the weights, and the bias is added to the result for each neuron. Table 1 above summarizes the parameters and hyperparameters of the CNN layers.
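In standard notation (a generic formulation, not specific to this paper), the forward pass of a fully connected layer with weight matrix $W$, bias vector $\mathbf{b}$, and activation function $f$ is

$$\mathbf{y} = f(W\mathbf{x} + \mathbf{b}),$$

where $\mathbf{x}$ is the flattened input vector produced by the preceding pooling layer.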
5 Conclusion

The major benefit of deep learning is its ability to independently find significant characteristics in high-dimensional data. This work has clarified the fundamental principles of the CNN, a deep learning technique used to address a variety of challenging problems, and has discussed the general CNN model. The CNN has developed into a popular method for classification based on contextual information; its extraordinary capacity for learning contextual features has resolved the difficulties associated with pixel-by-pixel classification, and it greatly reduces the number of parameters required. CNNs are commonly used for classifying high-resolution data in tasks such as visual object tracking, image classification, traffic sign recognition, and audio scene analysis. Researchers who wish to explore this topic will benefit greatly from the comprehensive insight of this study, which can serve as a resource for students, researchers, and anybody interested in the area.
6 Future Scope

Although the current generation of CNNs for object detection studied in this research work is robust under many conditions, it can be made more robust by extending the work in the following directions.

Processing time: Fine-tuning the hyperparameters to their optimum values takes a long time, and this time grows with the size of the dataset; when the dataset is large, the calculation takes correspondingly longer. Reducing this tuning time remains an open problem.

Computational requirement: A large amount of computing is required to perform the fine-tuning process; a high-end computer with at least 16 or 32 GB of RAM and a dedicated graphics processor is generally used for deep learning, and the larger the dataset, the greater the required computing facility. Processing high-resolution images is also challenging and demands more computational effort, because prediction accuracy depends on image resolution; designing a system that requires fewer computing resources is therefore desirable.
References

1. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
2. Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65(6):386–391
3. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
4. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR), pp 1–14
5. Goodfellow I, Bengio Y, Courville A (2016) Machine learning basics. Deep Learning 1:98–164
6. Shanmuganathan S, Samarasinghe S (2016) Artificial neural network modelling: an introduction. Springer, vol 65
7. Hassan W, Birch P, Mitra B, Bangalore N, Young R, Chatwin C (2013) Illumination invariant stationary object detection. IET Comput Vision 7(1):1–8
8. Radke RJ, Andra S, Al-Kofahi O, Roysam B (2005) Image change detection algorithms: a systematic survey. IEEE Trans Image Process 14(3):294–307
9. Zhao ZQ, Zheng P, Xu S-T, Wu X (2019) Object detection with deep learning: a review. arXiv preprint arXiv:1807.05511
10. Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J, Chen T (2018) Recent advances in convolutional neural networks. Pattern Recogn 77:354–377
11. Chandan G, Jain A, Jain H, Mohana (2018) Real time object detection and tracking using deep learning and OpenCV. In: 2018 International conference on inventive research in computing applications (ICIRCA), Coimbatore, pp 1305–1308
12. Berg A, Ahlberg J, Felsberg M (2016) Channel coded distribution field tracking for thermal infrared imagery. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 1248–1256
13. Fukushima K (1988) Neocognitron: a hierarchical neural network capable of visual pattern recognition. Neural Netw 1(2):119–130
14. Weng JJ et al (1997) Learning recognition and segmentation using the cresceptron. Int J Comput Vis 25(2):109–143
15. Freund Y et al (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3):277–296
16. Hastie T et al (2009) The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media
17. Bäumer B, Lumer G, Neubrander F (1999) Convolution kernels and generalized functions. Chapman and Hall CRC research notes in mathematics, pp 68–78
18. Zeiler MD, Fergus R (2013) Stochastic pooling for regularization of deep convolutional neural networks. In: 1st international conference on learning representations (ICLR 2013), Scottsdale, Arizona, USA, May 2–4, 2013, Conference Track Proceedings