English Pages 1163 [1128] Year 2023
Lecture Notes in Networks and Systems 757
G. Ranganathan George A. Papakostas Álvaro Rocha Editors
Inventive Communication and Computational Technologies Proceedings of ICICCT 2023
Lecture Notes in Networks and Systems Volume 757
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Editors G. Ranganathan Department of Electronics and Communication Engineering Gnanamani College of Technology Namakkal, Tamil Nadu, India
George A. Papakostas Department of Computer Science (HUMAIN-Lab) International Hellenic University Thessaloniki, Greece
Álvaro Rocha Information Systems and Operations Management (ISEG) University of Lisbon Lisbon, Portugal
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-981-99-5165-9 ISBN 978-981-99-5166-6 (eBook) https://doi.org/10.1007/978-981-99-5166-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023, corrected publication 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.
This book is dedicated to all the young research aspirants in the computing and communication field and to all the participants, technical committee members and guest editors of ICICCT 2023.
Preface
The Seventh International Conference on Inventive Communication and Computational Technologies (ICICCT 2023) was held at Gnanamani College of Technology, Namakkal, India, on 22–23 May 2023. ICICCT 2023 aims to cover recent advances and trends in communication and computational technologies and to facilitate knowledge sharing and networking on emerging trends and new challenges. It collects the latest research results and applications in Data Communication and Computer Networking, Software Engineering, Wireless Communication, VLSI Design and Automation, Networking, Internet of Things, Cloud and Big Data. This volume includes a selection of 75 papers from the 336 papers submitted to the conference by universities and industries all over the world. All of the accepted papers were subjected to strict peer review by 2–4 expert referees. The papers were selected for this volume because of their quality and their relevance to the conference. ICICCT 2023 would like to express its sincere appreciation to all authors for their contributions to this book. We would like to extend our thanks to all the referees for their constructive comments on all papers and to our keynote speaker, Dr. R. Dhaya, Department of Computer Engineering, King Khalid University, Kingdom of Saudi Arabia; in particular, we would like to thank the organizing committee for their hard work. Finally, we would like to thank Springer for producing this volume.

Dr. G. Ranganathan
Dean, Department of Electronics and Communication Engineering
Gnanamani College of Technology
Namakkal, Tamil Nadu, India

Dr. George A. Papakostas
Professor, Department of Computer Science (HUMAIN-Lab)
International Hellenic University
Thessaloniki, Greece

Dr. Álvaro Rocha
University of Lisbon
Lisbon, Portugal
Contents
Mitigating Vanishing Gradient in SGD Optimization in Neural Networks
Aswathy Ravikumar and Harini Sriraman

A Comparative Analysis of Heart Disease Diagnosis with Machine Learning Models
Le Thi Thanh and Dang N. H. Thanh

The Impact of Information System and Technology of Courier Service During Pandemic
Andre Timiko Siahaan, Bagas Rizkyka Pinajung, Kennidy, Ford Lumban Gaol, and Tokuro Matsuo

Event Detection in Social Media Analysis: A Survey
G. Akiladevi, M. Arun, and J. Pradeepkandhasamy

Artificial Intelligence Mechanism to Predict the Effect of Bone Mineral Density in Endocrine Diseases—A Review
Vivek Duraivelu, S. Deepa, R. Suguna, M. S. Arunkumar, P. Sathishkumar, and S. Aswinraj

Evaluation of Deep Learning CNN Models with 24 Metrics Using Soybean Crop and Broad-Leaf Weed Classification
J. Justina Michael and M. Thenmozhi

Bluetooth Controlled Integrated Robotic Arm with Temperature and Moisture Sensor Modules
K. C. Sriharipriya, R. Shivani, K. Sai Ragadeep, and N. Sangeetha

Image Dehazing Using Generic Model Agnostic Convolutional Neural Network
Gurditya Khurana, Rohan Garodia, and P. Saranya
A Novel Approach to Build Privacy and Trust in Vehicle Sales Using DID
Aju Mathew Thomas, K. V. Lakshmy, R. Ramaguru, and P. P. Amritha

Bipolar Disease Data Prediction Using Adaptive Structure Convolutional Neuron Classifier Using Deep Learning
M. Ramkumar, P. Shanmugaraja, B. Dhiyanesh, G. Kiruthiga, V. Anusuya, and B. J. Bejoy

Oceanographic and Hydrological Study of the Moroccan Atlantic Coast: Focus on the Upwelling
Kamal Hammou Ali, Aziz Bourass, Khalid Fariri, Jalal Ettaki, Thami Hraira, Khalid Doumi, Sima Boulebatt, Manal Maaroufi, Ahmed Talbaoui, Ali Srairi, Abdelmajid Dridi, Khadija Elkharrim, and Driss Belghyti

The Acceptance of Artificial Intelligence in the Commercial Use of Crypto-Currency and Blockchain Systems
Mkik Marouane, Mkik Salwa, and Ali Hebaz

Deep Hybrid Model with Trained Weights for Multimodal Sarcasm Detection
Dnyaneshwar Bavkar, Ramgopal Kashyap, and Vaishali Khairnar

InsuraChain: A Blockchain-Based Parametric Health Insurance Platform
Soham Panini, Ashish Anand, Rithika Pai, V. Parimala, and Shruti Jadon

Automating Audio Attack Vectors
Chaitanya Singh, Mallika Sirdeshpande, Mohammed Haris, and Animesh Giri

Child Detection by Utilizing Touchscreen Behavioral Biometrics on Mobile
Sunil Mane, Kartik Mandhan, Mudit Bapna, and Atharv Terwadkar

Hand Gesture-Controlled Wheelchair
Minal Patil, Abhishek Madankar, Roshan Umate, Sumiti Gunjalwar, Nandini Kukde, and Vaibhav Jain

Decentralized Evidence Storage System Using Blockchain and IPFS
Neeraj Salunke, Swapnil Sonawane, and Dilip Motwani

Investigation of DCF Length and Input Power Selection for Optical Transmission Systems
Manjit Singh, Butta Singh, Himali Sarangal, Vinit Grewal, and Satveer Kour
Resource Optimization with Digital Twins Using Intelligent Techniques for Smart Healthcare Management
Sreekanth Rallapalli, M. R. Dileep, and A. V. Navaneeth

MS-CDG: An Efficient Cluster-Based Data Gathering Using Mobile Sink in Wireless Sensor Networks
Nami Susan Kurian and B. Rajesh Shyamala Devi

A Machine Learning-Based Approach for Classifying Socially Isolated Individuals in a Pandemic Context
Md Ulfat Tahsin, Sarah Jasim, and Intisar Tahmid Naheen

Simulation on Natural Disaster Fire Accident Evacuation Using Augmented Virtual Reality
G. A. Senthil, V. Mathumitha, R. Prabha, Su. Suganthi, and Manjunathan Alagarsamy

Verifi-Chain: A Credentials Verifier Using Blockchain and IPFS
Tasfia Rahman, Sumaiya Islam Mouno, Arunangshu Mojumder Raatul, Abul Kalam Al Azad, and Nafees Mansoor

IOT-Based Cost-Effective Solar Energy Monitoring System
Abhishek Madankar, Minal Patil, and Shital Telrandhe

Investigation of Assessment Methodologies in Information Security Risk Management
C. Rajathi and P. Rukmani

TSCH Scheduling Mechanism Based on the Network’s Throughput with Dynamic Power Allocation and Slot-Frame Delay
Md. Niaz Morshedul Haque, Nakib Ahmed Sami, and Abdus Shahid Al Masruf

Movie Recommendation System Using Hybrid Approach
Nidhi Bharatiya, Shatakshi Bhardwaj, Kartik Sharma, Pranjal Kumar, and Jeny Jijo

Highly Correlated Linear Discriminant Analysis for Dimensionality Reduction and Classification in Healthcare Datasets
S. Rajeashwari and K. Arunesh

Identification of Brain Tumor Images Using a Novel Machine Learning Model
Y. Mahesha

A Study on Real-Time Vehicle Speed Measurement Techniques
Prasant Kumar Sahu and Debalina Ghosh
Deep Learning with Attention Mechanism for Cryptocurrency Price Forecasting
V. Yazhini, M. Nimal Madhu, B. Premjith, and E. A. Gopalakrishnan

Deep Learning-Based Continuous Glucose Monitoring with Diabetic Prediction Using Deep Spectral Recurrent Neural Network
G. Kiruthiga, L. Shakkeera, A. Asha, B. Dhiyanesh, P. Saraswathi, and M. Murali

Performance Analysis of DeeplabV3+ Using State-of-the-Art Encoder Architectures for Waterbody Segmentation in Remote Sensing Images
S. Adarsh, V. Sowmya, Ramesh Sivanpillai, and V. V. Sajith Variyar

An Exploratory Comparison of LSTM and BiLSTM in Stock Price Prediction
Nguyen Q. Viet, Nguyen N. Quang, Nguyen King, Dinh T. Huu, Nguyen D. Toan, and Dang N. H. Thanh

Forecasting Intraday Stock Price Using Attention Mechanism and Variational Mode Decomposition
R. Arul Goutham, B. Premjith, M. Nimal Madhu, and E. A. Gopalakrishnan

Decoding the Twitter Sentiment Using Artificial Intelligence Tools: A Study on Tokyo Olympics 2020
Priya Sachdeva and Archan Mitra

Fast and Accurate YOLO Framework for Live Object Detection
R. R. Ajith Babu, H. M. Dhushyanth, R. Hemanth, M. Naveen Kumar, B. A. Sushma, and B. Loganayagi

Chatbot-Based E-System for Animal Husbandry with E-Farming
Aishwary Sanjay Gattani, Shubham Sunil Kasar, Om Chakane, and Pratiksha Patil

Mood Classification of Bangla Songs Based on Lyrics
Maliha Mahajebin, Mohammad Rifat Ahmmad Rashid, and Nafees Mansoor

A Machine Learning and Deep Learning Approach for Accurate Crop-Type Mapping Using Sentinel-1 Satellite Data
Sanjay Madaan and Sukhjeet Kaur

Size Matters: Exploring the Impact of Model Architecture and Dataset Size on Semantic Segmentation of Abdominal Wall Muscles in CT Scans
Ankit P. Bisleri, Anvesh S. Kumar, Akella Aditya Bhargav, Surapaneni Varun, and Nivedita Kasturi
Proactive Decision Making for Handover Management on Heterogeneous Networks
A. Priyanka and C. Chandrasekar

A Comprehensive Review on Fault Data Injection in Smart Grid
D. Prakyath, S. Mallikarjunaswamy, N. Sharmila, V. Rekha, and S. Pooja

JuLeDI: Jute Leaf Disease Identification Using Convolutional Neural Network
Mohammad Salah Uddin and Md Yeasin Munsi

Utilizing Deep Learning Methodology to Classify Diabetic Retinopathy
Vivek Kumar Prasad, Ved Nimavat, Kaushha Trivedi, and Madhuri Bhavsar

Mapping the Literature of Artificial Intelligence in Medical Education: A Scientometric Analysis
Fairuz Iqbal Maulana, Muhammad Yasir Zain, Dian Lestari, Agung Purnomo, and Puput Dani Prasetyo Adi

Development of a Blockchain-Based On-Demand Lightweight Commodity Delivery System
Bayezid Al Hossain Onee, Kaniz Fatema Antora, Omar Sharif Rajme, and Nafees Mansoor

Flower Disease Detection Using CNN
Vemparala Vigna Sri, Gullapalli Angel, and Yalamanchili Manasa Chowdary

Detection of Natural Disasters Using Machine Learning and Computer Vision by Replacing the Need of Sensors
Jacob Bosco, Lavanya Yavagal, Lohith T. Srinivas, Manoj Kumar Katabatthina, and Nivedita Kasturi

Prediction of Blood Pressure and Diabetes with AI Techniques—A Review
G. R. Ashisha and X. Anitha Mary

Comparative Analysis of Various Machine Learning Algorithms to Detect Cyberbullying on Twitter Dataset
Milind Shah, Avani Vasant, and Kinjal A. Patel

Machine Learning for Perinatal Complication Prediction: A Systematic Review
Dian Lestari, Fairuz Iqbal Maulana, Satria Fadil Persada, and Puput Dani Prasetyo Adi
Multi-modal Biometrics’ Template Preservation and Individual Identification
B. Nithya and P. Sripriya

Emotion Recognition Through Physiological Signals and Brain Sensing
Disha Shah and Rashmi Rane

Uncovering Deception: A Study on Machine Learning Techniques for Fake News Detection
Mohd Faiz Zafar, Nikhil Rawat, Rishant Mishra, Purnendu Shekhar Pandey, and Naresh Kshetri

A Process Model for Intelligent Analysis and Normalization of Academic and Educational Data
Mariya Zhekova

Virtual Machine Consolidation Techniques to Reduce Energy Consumption in Cloud Data Centers: A Survey
Pankaj Jain and Sanjay Kumar Sharma

Machine Learning-Based Human Body Mass Index Prediction Using Facial Features
Eli Yaswanth Kalyan, Raparthi Akshay, and P. Selvi Rajendran

Performance Evaluation of P&O and PSO-Based MPPT for Wind Energy Conversion Systems
Kannan Kaliappan, Ravichandran Sekar, G. Ramesh, and S. Saravanakarthi

A Novel Algorithm for Genomic STR Mining and Phylogeny Reconstruction
Uddalak Mitra, Soumya Majumder, and Sayantan Bhowmick

Comparative Analysis and Evaluation of Pothole Detection Algorithms
Medha Wyawahare, Nayan Chaure, Dhairyashil Bhosale, and Ayush Phadtare

Selective Kernel Networks for Lung Abnormality Diagnosis Using Chest X-rays
Divith Phogat, Dilip Parasu, Arun Prakash, and V. Sowmya

A Natural Language Processing Technique to Identify Exaggerated News Titles
Tshephisho Joseph Sefara and Mapitsi Roseline Rangata
Pareto Optimization Technique for Protein Motif Detection in Genomic Data Set
Anooja Ali, H. V. Ramachandra, A. Meenakshi Sundaram, A. Ajil, and Nithin Ramakrishnan

Communicable Disease Prediction Using Machine Learning and Deep Learning Algorithms
Nalin M. Rajendran, M. Karthikeyan, B. Karthik Raja, K. Pragadishwaran, E. A. Gopalakrishnan, and V. Sowmya

Novel Visual Effects in Computer Vision of Animation Film Based on Artificial Intelligence
V. SivaKrishna Reddy, M. Kathiravan, and V. Lokeswara Reddy

Performance Evaluation of SDN Controllers
Deepjyot Kaur Ryait and Manmohan Sharma

An Ensemble for Satellite Image to Map Layout Translation
Medha Wyawahare, Soham Dasare, Akash Bhadange, and Resham Bhattad

How is Robotic Process Automation Revolutionising the Way Healthcare Sector Works?
Jaspreet Kaur

A Survey of Visible and Infrared Image Fusion Methodologies
Sejal Chaudhari, Grishma Deshmukh, Sai Gokhale, Raahi Kadu, Sunita Jahirabadkar, and R. Aditya

Efficient Rent Price Prediction Model for the Development of a House Marketplace Website by Analyzing Various Regression-Based Machine Learning Algorithms
Ojas Saraswat and N. Arunachalam

Human–Computer Interactions with Artificial Intelligence and Future Trends of HCI—A Study
S. Muthumari, G. Sangeetha, S. Venkata Lakshmi, and J. Suganthi

Future Worth: Predicting Resale Values with Machine Learning Techniques
M. Karuppasamy, M. Prabha, and M. Jansi Rani

Smart Digital E-portal for Indian Farmers
Deepak Mane, Sunil Sangve, Atharv Halmadge, Atul Takale, Nikhil Choubhare, and Adesh Shinde

Correction to: A Comparative Analysis of Heart Disease Diagnosis with Machine Learning Models
Le Thi Thanh and Dang N. H. Thanh

Author Index
Editors and Contributors
About the Editors

Dr. G. Ranganathan works as a professor and head at Gnanamani College of Technology, Namakkal, India. He received his Ph.D. from the Faculty of Information and Communication Engineering, Anna University, Chennai, in 2013. His research thesis was in the area of biomedical signal processing. He has more than 29 years of experience across industry, teaching and research. He has guided several project works for UG and PG students in the area of biomedical signal processing. He has published more than 35 research papers in international and national journals and conferences. He has also co-authored many books on electrical and electronics subjects. He has served as a referee for many reputed international journals published by Elsevier, Springer, Taylor and Francis, etc. He is a member of various professional bodies such as ISTE and IAENG.

Dr. George A. Papakostas received a diploma in Electrical and Computer Engineering in 1999 and M.Sc. and Ph.D. degrees in Electrical and Computer Engineering in 2002 and 2007, respectively, from the Democritus University of Thrace (DUTH), Greece. Dr. Papakostas is a Tenured Full Professor in the Department of Computer Science at International Hellenic University, Greece. He has 10 years of experience in large-scale systems design as a senior software engineer and technical manager at the companies INTRACOM TELECOM S.A. and INTALOT S.A. Currently, he is the Director of the “MLV Research Group” (Machine Learning and Vision Research Group). He has (co)authored more than 200 publications in indexed journals, international conferences and book chapters, 1 book (in Greek), 2 edited books and 5 journal special issues. His publications have more than 3000 citations, with an h-index of 32 (Google Scholar). His research interests include machine learning, computer/machine vision, pattern recognition, and computational intelligence. Dr. Papakostas has served as a reviewer for numerous journals and conferences, and he is a member of the IAENG, MIR Labs, EUCogIII and the Technical Chamber of Greece (TEE).
Dr. Álvaro Rocha holds a Habilitation in Information Science, a Ph.D. in Information Systems and Technologies, an M.Sc. in Information Management and a BCS in Computer Science. He is a professor of Information Systems and Software Engineering at the University of Coimbra, a researcher at the Centre for Informatics and Systems of the University of Coimbra (CISUC) and a collaborating researcher at the Laboratory of Artificial Intelligence and Computer Science (LIACC) and at the Center for Research in Health Technologies and Information Systems (CINTESIS). His main research interests are information systems planning and management, maturity models, information systems quality, online service quality, intelligent information systems, software engineering, e-government, e-health and information technology in education.
Contributors

S. Adarsh, Center for Computational Engineering and Networking (CEN), Coimbatore, India
Puput Dani Prasetyo Adi, Telecommunication Research Center, National Research and Innovation Agency, Jakarta, Indonesia
R. Aditya, U. R. Rao Satellite Centre (URSC), Bengaluru, India
A. Ajil, School of CSE, REVA University, Bengaluru, India
R. R. Ajith Babu, Department of CSE, S.E.A College of Engineering and Technology, Bangalore, Karnataka, India
G. Akiladevi, Department of Computer Applications, Kalasalingam Academy of Research and Education, Krishnankoil, India
Raparthi Akshay, Hindustan Institute of Technology and Science, Chennai, India
Abul Kalam Al Azad, North South University, Dhaka, Bangladesh
Abdus Shahid Al Masruf, Department of EEE, Leading University, Sylhet, Bangladesh
Manjunathan Alagarsamy, Department of Electronics and Communication Engineering, K. Ramakrishnan College of Technology, Trichy, India
Anooja Ali, School of CSE, REVA University, Bengaluru, India
Kamal Hammou Ali, Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
P. P. Amritha, TIFAC-CORE in Cyber Security, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Ashish Anand, Computer Science Engineering, PES University, Banashankari Stage III, Bangalore, Karnataka, India
Gullapalli Angel, Department of CSE, VR Siddhartha Engineering College, Vijayawada, India
X. Anitha Mary, Department of Robotics Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India
Kaniz Fatema Antora, University of Liberal Arts Bangladesh (ULAB), Dhaka, Bangladesh
V. Anusuya, CSE, Ramco Institute of Technology, Rajapalayam, India
R. Arul Goutham, Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India
M. Arun, Faculty of Computer Applications, Kalasalingam Academy of Research and Education, Krishnankoil, India
N. Arunachalam, Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Chennai, India
K. Arunesh, Department of Computer Science, Sri S. Ramasamy Naidu Memorial College (Affiliated to Madurai Kamaraj University), Sattur, Tamil Nādu, India
M. S. Arunkumar, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
A. Asha, ECE, Rajalakshmi Engineering College, Chennai, Tamil Nadu, India
G. R. Ashisha, Department of Electronics and Instrumentation Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India
S. Aswinraj, Department of Computer Technology, Kongu Engineering College, Perundurai, Tamil Nadu, India
Mudit Bapna, Department of Computer Engineering, College of Engineering, Pune, India
Dnyaneshwar Bavkar, Department of Computer Science and Engineering, Amity University, Raipur, Chhattisgarh, India
B. J. Bejoy, CSE, CHRIST (Deemed to be University), Bangalore, India
Driss Belghyti, Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
Akash Bhadange, Department of Electronics and Telecommunication Engineering, Vishwakarma Institute of Technology, Pune, India
Nidhi Bharatiya, PES University, Bangalore, India
Shatakshi Bhardwaj, PES University, Bangalore, India
Akella Aditya Bhargav, Department of Computer Science and Engineering, PES University, Bangalore, India
Resham Bhattad, Department of Electronics and Telecommunication Engineering, Vishwakarma Institute of Technology, Pune, India
Madhuri Bhavsar, Nirma University, Ahmedabad, Gujarat, India
Dhairyashil Bhosale, Department of Electronics and Telecommunication Engineering, VIT, Pune, India
Sayantan Bhowmick, Siliguri Institute of Technology, Darjeeling, India
Ankit P. Bisleri, Department of Computer Science and Engineering, PES University, Bangalore, India
Jacob Bosco, PES University, Bangalore, India
Sima Boulebatt, Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
Aziz Bourass, Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
Om Chakane, Department of Computer Engineering, Vishwakarma Institute of Technology (Affiliated to SPPU), Pune, India
C. Chandrasekar, Department of Computer Science, Periyar University, Salem, India
Sejal Chaudhari, MKSSS’s Cummins College of Engineering for Women, Pune, India
Nayan Chaure, Department of Electronics and Telecommunication Engineering, VIT, Pune, India
Nikhil Choubhare, JSPM’s Rajarshi Shahu College of Engineering, Pune, Maharashtra, India
Yalamanchili Manasa Chowdary, Department of CSE, VR Siddhartha Engineering College, Vijayawada, India
Soham Dasare, Department of Electronics and Telecommunication Engineering, Vishwakarma Institute of Technology, Pune, India
S. Deepa, Department of Computer Technology, Kongu Engineering College, Perundurai, Tamil Nadu, India
Grishma Deshmukh, MKSSS’s Cummins College of Engineering for Women, Pune, India
B. Dhiyanesh, CSE, Dr. N.G.P. Institute of Technology, Coimbatore, Tamil Nadu, India
H. M. Dhushyanth, Department of CSE, S.E.A College of Engineering and Technology, Bangalore, Karnataka, India
M. R. Dileep Department of Master of Computer Applications, Nitte Meenakshi Institute of Technology, Bengaluru, Karnataka, India Khalid Doumi Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco Abdelmajid Dridi I. N. R. H, Bd Sidi Abderrahmane Ain Diab Casablanca, Casablanca, Morocco Vivek Duraivelu Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Telangana, India Khadija Elkharrim Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco Jalal Ettaki Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco Khalid Fariri Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco Ford Lumban Gaol Computer Science Department, BINUS Graduate Program— Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia Rohan Garodia Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India Aishwary Sanjay Gattani Department of Computer Engineering, Vishwakarma Institute of Technology (Affiliated to SPPU), Pune, India Debalina Ghosh School of Electrical Sciences, IIT Bhubaneswar, Arugul, Jatni, India Animesh Giri PES University, Bengaluru, India Sai Gokhale MKSSS’s Cummins College of Engineering for Women, Pune, India E. A. Gopalakrishnan Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bangalore, India; Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India Vinit Grewal Department of Engineering and Technology, Guru Nanak Dev University, Jalandhar, India Sumiti Gunjalwar Y. C. College of Engineering, Nagpur, India Atharv Halmadge JSPM’s Rajarshi Shahu College of Engineering, Pune, Maharashtra, India Ali Hebaz University Chaouaïb Doukkali, National School of Commerce and Management, El Jadida, Morocco
R. Hemanth Department of CSE, S.E.A College of Engineering and Technology, Bangalore, Karnataka, India Thami Hraira Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco Dinh T. Huu College of Technology and Design, University of Economics Ho Chi Minh City (UEH), Ho Chi Minh City, Vietnam Shruti Jadon PES University, Banashankari Stage III, Bangalore, Karnataka, India Sunita Jahirabadkar MKSSS’s Cummins College of Engineering for Women, Pune, India Pankaj Jain Banasthali Vidyapith, Newai, Rajasthan, India Vaibhav Jain Y. C. College of Engineering, Nagpur, India M. Jansi Rani Mepco Schlenk Engineering College, Sivakasi, Tamil Nadu, India Sarah Jasim Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh Jeny Jijo PES University, Bangalore, India J. Justina Michael Department of Computer Science and Engineering, SRMIST, Chennai, India Raahi Kadu MKSSS’s Cummins College of Engineering for Women, Pune, India Kannan Kaliappan Department of EEE, Sreenidhi Institute of Science and Technology (SNIST) Yamnampet, Ghatkesar Hyderabad, Telangana, India Eli Yaswanth Kalyan Hindustan Institute of Technology and Science, Chennai, India B. Karthik Raja Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India M. Karthikeyan Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India M. Karuppasamy Kalasalingam Academy of Research and Education, Krishnankoil, India Shubham Sunil Kasar Department of Computer Engineering, Vishwakarma Institute of Technology (Affiliated to SPPU), Pune, India Ramgopal Kashyap Department of Computer Science and Engineering, Amity University, Raipur, Chhattisgarh, India Nivedita Kasturi Department of Computer Science and Engineering, PES University, Bangalore, India Manoj Kumar Katabatthina PES University, Bangalore, India
M. Kathiravan Department of Computer Science Engineering, Hindustan Institute of Technology and Science, Padur, Kelambaakkam, Chengalpattu, India Jaspreet Kaur University School of Business, Chandigarh University, Mohali, India Sukhjeet Kaur Department of Computer Science, Punjabi University, Patiala, India Kennidy School of Information System, Bina Nusantara University, Alam Sutra, Tangerang, Indonesia Vaishali Khairnar Department of Information Technology, Terna Engineering College, Nerul, Navi, Mumbai, India Gurditya Khurana Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India Nguyen King College of Technology and Design, University of Economics Ho Chi Minh City (UEH), Ho Chi Minh City, Vietnam G. Kiruthiga CSE, IES College of Engineering, Thrissur, Kerala, India Satveer Kour Department of CET, Guru Nanak Dev University, Amritsar, India Naresh Kshetri Department of Math, CS & IT, Lindenwood University, St. Charles, MO, USA Nandini Kukde Y. C. College of Engineering, Nagpur, India Anvesh S. Kumar Department of Computer Science and Engineering, PES University, Bangalore, India Pranjal Kumar PES University, Bangalore, India Nami Susan Kurian Hindustan Institute of Technology and Science, Chennai, India K. V. Lakshmy TIFAC-CORE in Cyber Security, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Dian Lestari Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia B. Loganayagi Department of CSE, S.E.A College of Engineering and Technology, Bangalore, Karnataka, India V. Lokeswara Reddy Department of Computer Science Engineering, K.S.R.M. College of Engineering (Autonomous), Krishnapuramu, Kadapa, Y.S.R (District), Andhra Pradesh, India Manal Maaroufi Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco Sanjay Madaan Computer Science and Engineering (UCOE), Punjabi University, Patiala, India
Abhishek Madankar Department of ET Engineering, Y. C. College of Engineering, Nagpur, India Maliha Mahajebin University of Liberal Arts Bangladesh, Dhaka, Bangladesh Y. Mahesha ATME College of Engineering, Mysore, Karnataka, India Soumya Majumder Siliguri Institute of Technology, Darjeeling, India S. Mallikarjunaswamy Electronics and Communication Engineering, JSS Academy of Technical Education, Bengaluru, Karnataka, India Kartik Mandhan Department of Computer Engineering, College of Engineering, Pune, India Deepak Mane Vishwakarma Institute of Technology, Pune, Maharashtra, India Sunil Mane Department of Computer Engineering, College of Engineering, Pune, India Nafees Mansoor Computer Science and Engineering, University of Liberal Arts Bangladesh (ULAB), Dhaka, Bangladesh Mkik Marouane University Mohamed V, FSJES Souissi, Souissi, Rabat, Morocco V. Mathumitha Department of Information Technology, Agni College of Technology, Chennai, India Tokuro Matsuo Advanced Institute of Industrial Technology, Tokyo, Japan Fairuz Iqbal Maulana Computer Science Department, Bina Nusantara University, Jakarta, Indonesia A. Meenakshi Sundaram School of CSE, REVA University, Bengaluru, India Rishant Mishra Department of Computer Science Engineering, KIET Group of Institutions, Ghaziabad, India Archan Mitra School of Media Studies, Presidency University, Bangalore, India Uddalak Mitra Siliguri Institute of Technology, Darjeeling, India Mohammed Haris PES University, Bengaluru, India Dilip Motwani Vidyalankar Institute of Technology, Mumbai, India Sumaiya Islam Mouno Computer Science and Engineering, University of Liberal Arts Bangladesh (ULAB), Dhaka, Bangladesh Md Yeasin Munsi Computer Science and Engineering Department, East West University, Dhaka, Bangladesh M. Murali Sona College of Technology, Salem, Tamil Nadu, India S. Muthumari S. S. Duraisamy Nadar Mariammal College, Kovilpatti, Tamil Nadu, India
Intisar Tahmid Naheen Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh A. V. Navaneeth Department of Master of Computer Applications, Nitte Meenakshi Institute of Technology, Bengaluru, Karnataka, India M. Naveen Kumar Department of CSE, S.E.A College of Engineering and Technology, Bangalore, Karnataka, India Md. Niaz Morshedul Haque Department of EEE, Bangladesh Army International University of Science and Technology (BAIUST), Cumilla, Bangladesh M. Nimal Madhu Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India Ved Nimavat Nirma University, Ahmedabad, Gujarat, India B. Nithya Department of Computer Science, New Prince Shri Bhavani Arts and Science College, Medavakkam, Chennai, Tamil Nadu, India Bayezid Al Hossain Onee University of Liberal Arts Bangladesh (ULAB), Dhaka, Bangladesh Rithika Pai Computer Science Engineering, PES University, Banashankari Stage III, Bangalore, Karnataka, India Soham Panini Computer Science Engineering, PES University, Banashankari Stage III, Bangalore, Karnataka, India Dilip Parasu Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India; Amrita School of Artificial Intelligence, Amrita Vishwa Vidyapeetham, Coimbatore, India V. Parimala Computer Science Engineering, PES University, Banashankari Stage III, Bangalore, Karnataka, India Kinjal A. Patel Faculty of Computer Applications and Information Technology, Gujarat Law Society University, Ahmedabad, Gujarat, India Minal Patil Department of ET Engineering, Y. C. 
College of Engineering, Nagpur, India Pratiksha Patil Department of Computer Engineering, Vishwakarma Institute of Technology (Affiliated to SPPU), Pune, India Satria Fadil Persada Entrepreneurship Department, BINUS Business School Undergraduate Program, Bina Nusantara University, Jakarta, Indonesia Ayush Phadtare Department of Electronics and Telecommunication Engineering, VIT, Pune, India
Divith Phogat Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India; Amrita School of Artificial Intelligence, Amrita Vishwa Vidyapeetham, Coimbatore, India Bagas Rizkyka Pinajung School of Information System, Bina Nusantara University, Alam Sutra, Tangerang, Indonesia S. Pooja Department of Electronics and Communication Engineering, KS Institute of Technology, Bengaluru, India M. Prabha Velammal College of Engineering and Technology, Madurai, India R. Prabha Department of Electronics and Communication Engineering, Sri Sai Ram Institute of Technology, Chennai, India J. Pradeepkandhasamy Faculty of Computer Applications, Kalasalingam Academy of Research and Education, Krishnankoil, India
K. Pragadishwaran Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India Arun Prakash Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India; Amrita School of Artificial Intelligence, Amrita Vishwa Vidyapeetham, Coimbatore, India D. Prakyath Department of Electrical and Electronics Engineering, SJB Institute of Technology, Bengaluru, Karnataka, India Vivek Kumar Prasad Nirma University, Ahmedabad, Gujarat, India B. Premjith Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India A. Priyanka Department of Computer Science, Periyar University, Salem, India Agung Purnomo Entrepreneurship Department, BINUS Business School Undergraduate Program, Bina Nusantara University Jakarta, Jakarta, Indonesia Nguyen N. Quang College of Technology and Design, University of Economics Ho Chi Minh City (UEH), Ho Chi Minh City, Vietnam Arunangshu Mojumder Raatul Computer Science and Engineering, University of Liberal Arts Bangladesh (ULAB), Dhaka, Bangladesh Tasfia Rahman Computer Science and Engineering, University of Liberal Arts Bangladesh (ULAB), Dhaka, Bangladesh C. Rajathi School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
S. Rajeashwari Department of Computer Science, Sri S. Ramasamy Naidu Memorial College, (Affiliated to Madurai Kamaraj University), Sattur, Tamil Nadu, India Nalin M. Rajendran Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India B. Rajesh Shyamala Devi Hindustan Institute of Technology and Science, Chennai, India Omar Sharif Rajme University of Liberal Arts Bangladesh (ULAB), Dhaka, Bangladesh Sreekanth Rallapalli Department of Master of Computer Applications, Nitte Meenakshi Institute of Technology, Bengaluru, Karnataka, India H. V. Ramachandra School of CSE, REVA University, Bengaluru, India R. Ramaguru TIFAC-CORE in Cyber Security, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Nithin Ramakrishnan School of CSE, REVA University, Bengaluru, India G. Ramesh Department of EEE, Sreenidhi Institute of Science and Technology (SNIST) Yamnampet, Ghatkesar Hyderabad, Telangana, India M. Ramkumar CSBS, Knowledge Institute of Technology, Salem, India Rashmi Rane Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India Mapitsi Roseline Rangata Council for Scientific and Industrial Research, Pretoria, South Africa Mohammad Rifat Ahmmad Rashid East West University, Dhaka, Bangladesh Aswathy Ravikumar School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India Nikhil Rawat Department of Computer Science Engineering, KIET Group of Institutions, Ghaziabad, India V. Rekha Department of Computer Science and Engineering, CHRIST (Deemed to Be University), Bengaluru, India P. Rukmani School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India Deepjyot Kaur Ryait School of Computer Applications, Lovely Professional University, Phagwara, India Priya Sachdeva Amity School of Communication, Amity University, Noida, India
Prasant Kumar Sahu School of Electrical Sciences, IIT Bhubaneswar, Arugul, Jatni, India K. Sai Ragadeep School of Electronics Engineering, VIT University, Vellore, India V. V. Sajith Variyar Center for Computational Engineering and Networking (CEN), Coimbatore, India Neeraj Salunke Vidyalankar Institute of Technology, Mumbai, India Mkik Salwa University Sultan Moulay Slimane, Beni Mellal, Morocco Nakib Ahmed Sami Department of EEE, Leading University, Sylhet, Bangladesh G. Sangeetha Arulmigu Subramania Swamy Arts and Science College, Vilathikulam, Tamil Nadu, India N. Sangeetha School of Electronics Engineering, VIT University, Vellore, India Sunil Sangve JSPM’s Rajarshi Shahu College of Engineering, Pune, Maharashtra, India Himali Sarangal Department of Engineering and Technology, Guru Nanak Dev University, Jalandhar, India P. Saranya Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India Ojas Saraswat Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Chennai, India P. Saraswathi IT, Velammal College of Engineering and Technology, Madurai, Tamil Nadu, India S. Saravanakarthi Department of EEE, Sreenidhi Institute of Science and Technology (SNIST) Yamnampet, Ghatkesar Hyderabad, Telangana, India P. Sathishkumar Department of CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India Tshephisho Joseph Sefara Council for Scientific and Industrial Research, Pretoria, South Africa Ravichandran Sekar Department of EEE, Sreenidhi Institute of Science and Technology (SNIST) Yamnampet, Ghatkesar Hyderabad, Telangana, India P. Selvi Rajendran Hindustan Institute of Technology and Science, Chennai, India G. A. Senthil Department of Information Technology, Agni College of Technology, Chennai, India Disha Shah Department of Computer Engineering and Technology, Dr. Vishwanath Karad MIT World Peace University, Pune, India
Milind Shah Department of Computer Science and Engineering, Krishna School of Emerging Technology and Applied Research (KSET), Drs. Kiran and Pallavi Patel Global University (KPGU), Vadodara, Gujarat, India L. Shakkeera CSE, Presidency University, Bengaluru, Karnataka, India P. Shanmugaraja IT, Sona College of Technology, Salem, India Kartik Sharma PES University, Bangalore, India Manmohan Sharma School of Computer Applications, Lovely Professional University, Phagwara, India Sanjay Kumar Sharma Banasthali Vidyapith, Newai, Rajasthan, India N. Sharmila Electrical and Electronics Engineering, JSS Science and Technology University, Mysore, Karnataka, India Purnendu Shekhar Pandey Department of Computer Science Engineering, KIET Group of Institutions, Ghaziabad, India Adesh Shinde JSPM’s Rajarshi Shahu College of Engineering, Pune, Maharashtra, India R. Shivani School of Electronics Engineering, VIT University, Vellore, India Andre Timiko Siahaan School of Information System, Bina Nusantara University, Alam Sutra, Tangerang, Indonesia Butta Singh Department of Engineering and Technology, Guru Nanak Dev University, Jalandhar, India Chaitanya Singh PES University, Bengaluru, India Manjit Singh Department of Engineering and Technology, Guru Nanak Dev University, Jalandhar, India Mallika Sirdeshpande PES University, Bengaluru, India V. SivaKrishna Reddy Hindustan Institute of Technology and Science, Padur, Kelambaakkam, Chengalpattu, India Ramesh Sivanpillai Wyoming GIS Center, University of Wyoming, Laramie, WY, USA Swapnil Sonawane Vidyalankar Institute of Technology, Mumbai, India V. Sowmya Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India; Amrita School of Artificial Intelligence, Amrita Vishwa Vidyapeetham, Coimbatore, India Ali Srairi I. N. R. H, Bd Sidi Abderrahmane Ain Diab Casablanca, Casablanca, Morocco
Vemparala Vigna Sri Department of CSE, VR Siddhartha Engineering College, Vijayawada, India K. C. Sriharipriya School of Electronics Engineering, VIT University, Vellore, India Lohith T. Srinivas PES University, Bangalore, India P. Sripriya Department of Computer Applications, VISTAS, Chennai, Tamil Nadu, India Harini Sriraman School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India J. Suganthi School of Sciences, Christ (Deemed to be) University, Bengaluru, India Su. Suganthi Department of Artificial Intelligence and Data Science, Sri Sai Ram Institute of Technology, Chennai, India R. Suguna Department of CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India B. A. Sushma Department of CSE, S.E.A College of Engineering and Technology, Bangalore, Karnataka, India Md Ulfat Tahsin Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh Atul Takale JSPM’s Rajarshi Shahu College of Engineering, Pune, Maharashtra, India Ahmed Talbaoui Department of Biology, Faculty of Sciences Rabat, Rabat, Morocco Shital Telrandhe Datta Meghe Institute of Higher Education and Research, Wardha, India Atharv Terwadkar Department of Computer Engineering, College of Engineering, Pune, India Dang N. H. Thanh Department of Information Technology, College of Technology and Design, University of Economics Ho Chi Minh City (UEH), Ho Chi Minh City, Vietnam Le Thi Thanh Department of Mathematics, Faculty of Applied Sciences, Ho Chi Minh City University of Technology and Education, Ho Chi Minh City, Vietnam M. Thenmozhi Department of Networking and Communications, SRMIST, Chennai, India Aju Mathew Thomas TIFAC-CORE in Cyber Security, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Nguyen D. Toan College of Technology and Design, University of Economics Ho Chi Minh City (UEH), Ho Chi Minh City, Vietnam Kaushha Trivedi Nirma University, Ahmedabad, Gujarat, India Mohammad Salah Uddin Computer Science and Engineering Department, East West University, Dhaka, Bangladesh Roshan Umate Datta Meghe Institute of Higher Education and Research, Swangi, Wardha, India Surapaneni Varun Department of Computer Science and Engineering, PES University, Bangalore, India Avani Vasant Department of Computer Science and Engineering, Krishna School of Emerging Technology and Applied Research (KSET), Drs. Kiran and Pallavi Patel Global University (KPGU), Vadodara, Gujarat, India S. Venkata Lakshmi K.L.N College of Engineering, Madurai, India Nguyen Q. Viet College of Technology and Design, University of Economics Ho Chi Minh City (UEH), Ho Chi Minh City, Vietnam Medha Wyawahare Department of Electronics and Telecommunication Engineering, Vishwakarma Institute of Technology, Pune, India Lavanya Yavagal PES University, Bangalore, India V. Yazhini Center for Computational Engineering and Networking, Amrita Vishwa Vidyapeetham, Coimbatore, India Mohd Faiz Zafar Department of Computer Science Engineering, KIET Group of Institutions, Ghaziabad, India Muhammad Yasır Zaın Informatics Engineering Study Program, University of Madura, Pamekasan, Indonesia Mariya Zhekova University of Food Technology, Plovdiv, Bulgaria
Mitigating Vanishing Gradient in SGD Optimization in Neural Networks Aswathy Ravikumar and Harini Sriraman
Abstract Deep structured learning has emerged as a new research area because of its many applications in signal and data processing. High-level characterizations of deep learning often highlight two main elements: its structural frameworks, which comprise many layers or stages designed for the nonlinear processing of information, and its optimization strategies for supervised or unsupervised training of feature extraction at successively higher levels of abstraction. The vanishing gradient problem (VGP) is a crucial issue when training multilayer neural networks with backpropagation and stochastic gradient descent (SGD). The sigmoid function and ReLU are typical neural network (NN) activation functions. Although several researchers have proposed solutions to this issue, an effective, workable answer has yet to be found. With sigmoid and tanh activations, Batch Normalization alone cannot resolve vanishing gradients. ReLU is better at mitigating the VGP because it does not saturate for positive inputs, where its gradient is constant and larger. The most effective solution for the VGP is combining ReLU with Batch Normalization. This paper presents a unifying review of solutions for the VGP.
Keywords Vanishing gradient · Activation functions · Neural network · ReLU · Optimization · Sigmoid function · Gradient descent
A. Ravikumar (corresponding author) · H. Sriraman
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, India
e-mail: [email protected]
H. Sriraman e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_1

1 Introduction
Deep neural networks have advanced considerably in several fields, including machine vision, image generation, and digital imaging. First-order approaches, including gradient descent and stochastic gradient descent, are often used to train deep neural networks. However, since the associated error function is non-convex, convergence to an optimal solution is not guaranteed. Depending on the initialization, first-order techniques may experience unstable training and vanishing or exploding gradients. Deep multilayer neural networks represent hypotheses of very high polynomial degree, which are sought for solving highly complicated problems [1, 2]. DNNs with several layers achieve exceptional performance in problem domains that involve unstructured data or sequence modeling. Typically, engineers must hand-craft features for use in machine learning algorithms to tackle a particular problem. Unfortunately, most available data are unstructured or unprocessed and lack such features, for example smartphone and Internet images with ambiguous filenames. Training deep multilayer networks is complex and fraught with complications, such as the need for enormous quantities of data, saddle points, local optima, and vanishing and exploding gradients [2]. Stochastic gradient descent is generally used to train complex neural network models with backpropagation: the model parameters are initialized randomly and then updated in proportion to their individual contributions to the model's loss. The contribution of each parameter is measured by the derivative of the loss with respect to that parameter, computed with the chain rule. Nevertheless, the gradient sometimes explodes to a very large value or vanishes to zero, which causes the vanishing and exploding gradient problems (VGP/EGP). When a parameter is initialized with a small value, its partial derivative approaches zero as network depth increases, resulting in the VGP. Vanishing gradients slow learning and prevent the parameters of deeper layers from changing. In the most extreme case, vanishing gradients can halt further training of the deep multilayer neural network. When a parameter is initialized with a large value, its partial derivative grows as the model depth increases, leading to the exploding gradient problem, which can cause numerical overflow. However, the exploding problem is easier to handle than the vanishing problem because excessively large gradients can be scaled down to smaller values. The vanishing problem in training deep neural networks with gradient descent is persistent and has yet to be resolved [3, 4].
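As a rough illustration of why the exploding case is the easier one to handle, the sketch below rescales an oversized gradient so that its norm stays under a threshold, a technique usually called gradient clipping by norm. The function name `clip_by_norm` and the threshold value are illustrative assumptions, not taken from the paper.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale grad so that its Euclidean norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)          # already small enough: leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]

# Hypothetical exploding gradient with norm 500, clipped to norm 5.
exploding = [300.0, -400.0]
clipped = clip_by_norm(exploding, max_norm=5.0)
print(clipped)
```

Clipping keeps the direction of the update and only shrinks its length. No comparably simple rescaling repairs a vanished gradient, because its direction is lost in numerical noise.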
2 Gradient Descent Optimization
In neural networks, the high cost of running backpropagation over the entire training dataset reduces efficiency, a problem widely acknowledged by practitioners. Stochastic Gradient Descent (SGD) [5] is the most widely used optimization method in neural networks; it works from a stochastic estimate of the loss gradient to locate minima of the loss. When the volume of data is large, SGD performs well because it does not use every training example in every iteration, which reduces the amount of computation and increases processing speed. In addition, based on the error metric, SGD can dynamically adjust the estimate of the gradient vector for every parameter. Consequently, the likelihood that the model converges to a merely locally optimal solution may be reduced. Because of these benefits, SGD can lower computation costs and converge rapidly [6]. The number of steps needed to reach a minimum depends on the learning rate. The descent follows the downward slope of the surface defined by the objective function until a local minimum is found.
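The SGD update described above can be sketched on a toy problem. This is a minimal example under stated assumptions: a one-dimensional linear model, a squared-error loss, and one training example per update. The function name `sgd_linear`, the learning rate, and the synthetic data are all hypothetical.

```python
import random

def sgd_linear(data, lr=0.1, epochs=50, seed=0):
    """Fit y ~ w * x + b with single-example SGD on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)                  # visit examples in random order
        for x, y in data:
            err = (w * x + b) - y          # prediction error on one example
            # Gradient of 0.5 * err**2 with respect to w and b.
            w -= lr * err * x
            b -= lr * err
    return w, b

# Noise-free synthetic data generated from y = 2x + 1.
points = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear(points)
print(round(w, 3), round(b, 3))
```

Each update touches a single example, so the cost per step is independent of the dataset size, which is the efficiency argument made above.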
2.1 Scope Identified
• Understand how gradients explode or vanish, from the perspective of both the mechanics of gradient computation and the functional relationship between hidden units at different time steps.
• Review solutions for mitigating the VGP: activation functions, adaptive optimization methods, proper weight initialization, gradient clipping, and Batch Normalization.
3 Vanishing Gradient Problem
Because of vanishing and exploding gradients, artificial neural networks frequently encounter training difficulties [7]. The issue becomes especially severe in deep learning. Deep learning refers to a complex artificial neural network structure in which backpropagation is used to train the layers hierarchically [8]. During the backpropagation phase, the error is calculated in order to update the weights and train the network properly. The error is propagated from the output layer back to the input layer, but this can fail when local minima occur in the non-convex objective function [9]. This problem causes the top hidden layers to saturate, and it grows more severe with depth. As more layers with certain activation functions are added to a neural network, the gradient of the error function shrinks considerably, making the network harder to train. Many activation functions, such as the sigmoid, squash a large input range into a small output range between 0 and 1. Thus, a large change in the input leads to only a negligible change in the sigmoid's output, and the derivative becomes small. The VGP is not a significant issue for shallow networks with just a few layers that use these activations. However, as more layers are added, the gradient may become too small for effective training. Backpropagation is used to compute neural network gradients: it finds the network's derivatives by moving layer by layer from the final layer to the first. By the chain rule, the derivatives of each layer are multiplied along the network to obtain the derivatives of the earlier layers. When n hidden layers use an activation such as the sigmoid, n small derivatives are multiplied together. As a result, the gradient diminishes exponentially as it propagates back to the early layers.
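The exponential decay can be made concrete with a small numerical sketch. It assumes a simplified scalar chain in which every layer has weight 1 and pre-activation 0, the point where the sigmoid derivative reaches its maximum of 0.25, so real networks typically fare no better. The helper names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)               # peaks at 0.25 when z = 0

def backprop_factor(n_layers, z=0.0, weight=1.0):
    """Product of n_layers local derivatives weight * sigmoid'(z)."""
    factor = 1.0
    for _ in range(n_layers):
        factor *= weight * sigmoid_grad(z)
    return factor

# The factor reaching the first layer shrinks as 0.25 ** n in this setting.
for n in (1, 5, 10, 20):
    print(n, backprop_factor(n))
```

Even in this best case for the sigmoid, twenty layers scale the gradient by less than 1e-12.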
The biases and weights of the first layers will not be updated effectively in each training step if the gradient is small. The choice of activation function is an underlying cause of vanishing or exploding gradients. Activation functions can be divided into saturated and unsaturated ones. In the early days of neural networks, saturated activation functions were used to activate neurons and find the decision boundary. Among the saturated activation functions, the sigmoid was the most widely used [10]. Among the unsaturated activation functions, the Rectified Linear Unit (ReLU) [11] was introduced to overcome the saturation problem in the network. ReLU mitigates the VGP because it does not saturate for positive inputs, where its gradient is constant. Selecting an appropriate optimizer can be essential for avoiding the vanishing gradient problem in deep neural networks. Here are some general guidelines for choosing an optimizer to address the vanishing gradient problem:
• Adaptive optimization algorithms: Adaptive optimization algorithms, such as AdaGrad, Adam, RMSProp, and Nadam, can help to mitigate the vanishing gradient problem by adjusting the learning rate adaptively during training. These algorithms keep track of the previous gradients and use this information to scale the learning rate accordingly, which can be especially helpful when dealing with highly nonlinear, multi-scale models.
• Momentum-based optimization algorithms: Momentum-based optimization algorithms, such as SGD with momentum, can help to mitigate the vanishing gradient problem by accumulating gradients over time. This can prevent the optimizer from getting stuck in a local minimum and allow the network to explore a larger area of the parameter space.
• Learning rate scheduling: A learning rate schedule can also help to address the vanishing gradient problem.
By gradually reducing the learning rate during training, the optimizer can more easily navigate the parameter space and avoid getting stuck in a local minimum. Learning rate schedules can be based on heuristics, such as reducing the learning rate by a factor of 10 every few epochs, or can be adaptive, such as a cosine annealing schedule.
• Regularization: Regularization techniques, such as L1 or L2 regularization, can help to prevent overfitting and improve the generalization performance of the network. This can also help to address the vanishing gradient problem by constraining the magnitude of the weights and reducing the likelihood of exploding or vanishing gradients.
It is important to note that the choice of optimizer depends on the specific problem and model architecture. Therefore, it is recommended to experiment with different optimizers and hyperparameters to find the best choice for a particular problem.
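Two of the guidelines above, momentum and learning rate scheduling, can be sketched in a few lines. The function names, default values, and the toy loss f(w) = w**2 are illustrative assumptions and do not come from the paper or from any specific library.

```python
import math

def step_decay(lr0, epoch, drop=0.1, every=10):
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return lr0 * (drop ** (epoch // every))

def cosine_annealing(lr0, epoch, total_epochs, lr_min=0.0):
    """Decay from lr0 to lr_min over total_epochs along a half cosine."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr0 - lr_min) * (1.0 + cos)

def momentum_step(w, velocity, grad, lr, beta=0.9):
    """One SGD-with-momentum update for a scalar parameter."""
    velocity = beta * velocity - lr * grad   # accumulate past gradients
    return w + velocity, velocity

# Toy usage: minimise f(w) = w**2 with momentum and a cosine schedule.
w, v = 5.0, 0.0
for epoch in range(50):
    lr = cosine_annealing(0.1, epoch, total_epochs=50)
    grad = 2.0 * w                           # derivative of w**2
    w, v = momentum_step(w, v, grad, lr)
print(w)
```

The velocity term keeps the parameter moving while individual gradients are small, which is the accumulation effect described in the momentum bullet.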
Mitigating Vanishing Gradient in SGD Optimization in Neural Networks
4 Activation Functions
Generally, there are two kinds of activation functions: saturated and unsaturated.
4.1 Saturated Activation Function
The sigmoid is the primary example of a saturating activation function. It has been used frequently because it loosely mimics the firing behavior of biological neurons. The sigmoid combines a nearly linear response around zero with a nonlinear response elsewhere, giving an S-shaped curve. As a result, the activation of each neuron is bounded between 0 and 1, which is convenient for modeling neurons that are turned on or off. Nevertheless, for inputs whose outputs lie close to 0 or 1, the gradient is almost zero. Backpropagating these near-zero gradients through the network results in the loss of information. This briefly illustrates the vanishing gradient issue with a saturated activation function: because the error signal used for optimization vanishes, training effectively stalls.
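To make the saturation argument concrete, the short sketch below evaluates the sigmoid derivative, s(z)(1 - s(z)), and the product of such derivatives over several layers. Ignoring the weights in that product is a simplifying assumption made only for illustration, and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def backprop_factor(z, depth):
    """Product of `depth` sigmoid derivatives (weights ignored, an assumption)."""
    return sigmoid_grad(z) ** depth


if __name__ == "__main__":
    for z in (0.0, 2.0, 5.0, 10.0):
        print(z, sigmoid_grad(z))
    print("10 layers at z = 0:", backprop_factor(0.0, 10))
```

Even in the best case (z = 0) each layer scales the gradient by at most 0.25, so the factor shrinks geometrically with depth.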
4.2 Unsaturated Activation Function
The Rectified Linear Unit (ReLU) was introduced to circumvent the VGP. ReLU is now the most prevalent activation function, since most contemporary network architectures are large and deep. ReLU outputs 0 for negative inputs and the input value itself for positive inputs, so its output is either 0 or a positive real number. Because the output of ReLU is unbounded for positive inputs, exploding gradient difficulties may arise. If the input is negative, the output is zero and backpropagation transmits no gradient. This can cause the unit to die: it stops contributing to learning and may never become active again. Leaky ReLU (LReLU) introduces a small nonzero slope for the negative part of ReLU to remedy the dying ReLU issue, preventing zero gradients from being backpropagated into the network. Nevertheless, it has been reported that LReLU yields inconsistent results depending on the chosen slope value, an issue that is addressed by Parametric ReLU (PReLU). PReLU treats the slope as a trainable parameter, whereas LReLU fixes it to a constant. The Exponential Linear Unit (ELU) takes negative values for z < 0, which brings the unit's average output closer to 0 and helps to alleviate the vanishing gradient issue. Its gradient is nonzero for z < 0, which circumvents the issue of dying neurons. For α = 1, the function is smooth everywhere, including at z = 0; this accelerates gradient descent because the updates do not bounce to the left and right of z = 0. Scaled versions of this function (SELU: Scaled ELU)
A. Ravikumar and H. Sriraman
Fig. 1 Gradient of activation functions
are often used in deep learning. The gradients of the main activation functions are shown in Fig. 1, which illustrates how saturating activations lead to the VGP.
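The gradients of the unsaturated activations discussed in this section can be written down directly. The following sketch is illustrative only: the leaky slope of 0.01 and α = 1 for ELU are common default choices assumed here, not values stated in the chapter.

```python
import math


def relu_grad(z):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise (dying ReLU)."""
    return 1.0 if z > 0 else 0.0


def leaky_relu_grad(z, slope=0.01):
    """Gradient of Leaky ReLU; `slope` is a fixed constant (assumed 0.01)."""
    return 1.0 if z > 0 else slope


def elu(z, alpha=1.0):
    """ELU: identity for z > 0, alpha * (exp(z) - 1) otherwise."""
    return z if z > 0 else alpha * (math.exp(z) - 1.0)


def elu_grad(z, alpha=1.0):
    """Gradient of ELU; nonzero for negative inputs."""
    return 1.0 if z > 0 else alpha * math.exp(z)


if __name__ == "__main__":
    for z in (-3.0, -0.5, 0.5, 3.0):
        print(z, relu_grad(z), leaky_relu_grad(z), elu_grad(z))
```

In PReLU the `slope` argument would be a trainable parameter rather than a constant.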
5 Adaptive Optimization Methods
Selecting the optimal setup for deep learning may take time and effort. Adaptive optimization strategies have been proposed as a solution to the issue of vanishing gradients. The adaptive gradient algorithm (AdaGrad) [12] replaces intensive manual tuning of the learning rate with a per-parameter adaptive learning rate. AdaGrad uses the geometry of the data observed in previous iterations to perform more informative gradient-based learning; that is, it adapts the learning rate for each parameter based on historical gradient information. However, AdaGrad can suffer from some issues, including:
• Learning rate decay: AdaGrad decreases the learning rate for each parameter as training progresses, which can lead to the learning rate becoming too small, causing slow convergence or getting stuck in a suboptimal solution.
• Accumulation of squared gradients: As AdaGrad accumulates the squared gradients, the denominator of the update equation grows larger, leading to a diminishing learning rate and slow convergence.
Here are some ways to overcome these issues in AdaGrad:
• Learning rate annealing: To control the learning rate decay, use an explicit learning rate annealing schedule. This means gradually reducing the learning rate over time as training progresses. There are several ways to implement learning rate annealing, including step decay, exponential decay, and cosine annealing.
• Gradient Clipping: To keep individual gradients, and hence the accumulated squared gradients, from becoming too large, use Gradient Clipping. This involves setting a maximum value for the norm of the gradients. If the norm of the gradients exceeds this maximum value, the gradients are scaled down so that the norm is equal to the maximum value.
• Using alternative algorithms: AdaGrad is just one of many optimization algorithms for training machine learning models. Other algorithms like Adam, RMSprop, and Nadam are designed to overcome some of the limitations of AdaGrad and may perform better for certain types of models and datasets.
• Tuning hyperparameters: Hyperparameters, such as the learning rate and regularization strength, can significantly affect the performance of AdaGrad. Therefore, it is essential to experiment with different hyperparameters to find suitable values for the specific problem. This can be done through cross-validation or other optimization techniques.
Because of the nature of the operation, AdaGrad produces larger updates for infrequent features and smaller updates for frequent ones. As the number of training iterations rises, AdaGrad's diminishing learning rate eventually brings the optimization to a halt. AdaDelta [13] was proposed to overcome the problems in AdaGrad using a per-dimension learning rate. In AdaDelta, the accumulated gradient information decays over time, so that only recent gradients influence the update even after a large number of epochs. Adaptive Moment Estimation (Adam) [14] introduced a momentum adjustment for each parameter. Like AdaDelta, it keeps an exponentially decaying average of past squared gradients; in addition, it keeps an exponentially decaying average of past gradients, similar to momentum. Due to its quick convergence and broad applicability, Adam is among the most widely used adaptive optimization algorithms. Nevertheless, stochastic diagonal approximate greatest descent (SDAGD) [15, 16] is an alternative to current adaptive optimization strategies in which the concept of relative step length in each training iteration is used to adjust the step length at the initial stage. Adam optimization is a combination of momentum and RMSProp, where momentum [6] is used to accelerate SGD and to reduce oscillations of the gradient [17].
Nesterov-accelerated Adaptive Moment Estimation (Nadam) combines the Adam optimizer with the Nesterov-accelerated gradient [18]. AdamW [19] decouples weight decay from the gradient-based update in Adam, and QHAdam [20] replaces the momentum estimate with a weighted average of the plain SGD step and the momentum step. AggMo [21] aggregates several momentum terms with different damping coefficients.
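The difference between AdaGrad's ever-growing accumulator and Adam's exponentially decaying averages can be seen in a one-parameter sketch. The code below follows the standard textbook update rules; the hyperparameter defaults, the toy objective f(w) = w², and the function names are assumptions for illustration, not the configuration used by the authors.

```python
import math


def adagrad(grad_fn, w, lr=0.5, steps=100, eps=1e-8):
    """Run AdaGrad on one parameter; return final w and the effective step sizes."""
    accum = 0.0
    effective_lrs = []
    for _ in range(steps):
        g = grad_fn(w)
        accum += g * g                        # squared gradients only accumulate
        step = lr / (math.sqrt(accum) + eps)  # so the effective rate only shrinks
        effective_lrs.append(step)
        w -= step * g
    return w, effective_lrs


def adam(grad_fn, w, lr=0.1, steps=100, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam on one parameter using bias-corrected moment estimates."""
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g      # decaying average of gradients
        v = beta2 * v + (1.0 - beta2) * g * g  # decaying average of squared gradients
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


if __name__ == "__main__":
    grad = lambda w: 2.0 * w  # gradient of the toy objective f(w) = w**2
    w_ada, lrs = adagrad(grad, 5.0)
    print("AdaGrad:", w_ada, "first/last effective lr:", lrs[0], lrs[-1])
    print("Adam:", adam(grad, 5.0))
```

The list of effective step sizes returned by `adagrad` is non-increasing, which is the diminishing learning rate described above.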
6 Other Solutions for VGP
6.1 Batch Normalization
Applying correct weight initialization in conjunction with any form of the ReLU activation function may lessen the likelihood of vanishing/exploding gradient issues early in training. It does not, however, ensure that the issue will not resurface later during training. Batch Normalization was thus introduced in [22] to address the issue of vanishing gradients. It involves adding an operation immediately before or after the activation function of every hidden unit inside the network. This operation zero-centers and normalizes its inputs, then scales and shifts the output using two different parameter vectors
per layer for scaling and shifting. The procedure allows the network to learn the optimal scale and mean of the inputs for each layer. To zero-center and normalize the inputs, the algorithm must estimate the mean and standard deviation of each input over the current mini-batch.
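The normalize-then-scale-and-shift operation described above can be sketched for a single feature over one mini-batch. The names `gamma` (scale), `beta` (shift), and the small constant `eps` follow common usage and are assumptions here; a full implementation would also track running statistics for inference, which this sketch omits.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it.

    gamma and beta stand in for the learned scale and shift parameters.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    normalized = [(v - mean) / math.sqrt(var + eps) for v in values]
    return [gamma * x + beta for x in normalized]


if __name__ == "__main__":
    batch = [1.0, 2.0, 3.0, 4.0]
    print(batch_norm(batch))
    print(batch_norm(batch, gamma=2.0, beta=3.0))
```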
6.2 Gradient Clipping
Gradient Clipping entails clamping the gradient values (element by element) to a predetermined minimum or maximum if they fall outside the intended range. In other words, the error derivative is modified or trimmed to a threshold during backward propagation through the network, and the weights are updated using the clipped gradients. Because the error derivative is rescaled, the changes to the weights are likewise rescaled, which reduces the probability of an overflow or underflow. For example, with a threshold of 1.0, every component of the gradient vector is clipped to a value between −1.0 and 1.0; that is, the derivative of the loss function with respect to each trainable parameter is trimmed to lie between −1.0 and 1.0. The threshold is a tunable hyperparameter. Note that clipping by value can change the direction of the gradient vector; for example, if the initial gradient vector was [0.9, 100.0], pointing primarily along the second axis, then after clipping by value with a threshold of 1.0 it becomes [0.9, 1.0], which points roughly along the diagonal between the two axes. To guarantee that the direction stays intact after clipping, we must clip according to the norm instead of the value.
Rescaling is a technique that can be used to overcome the vanishing gradient problem in deep neural networks. Rescaling involves scaling the inputs or outputs of a layer to a specific range. Here are two standard rescaling techniques that can be used:
• Batch Normalization: Batch Normalization is a rescaling technique that is applied to the inputs of a layer. It normalizes the inputs to have zero mean and unit variance across the batch of samples. This helps stabilize the distribution of the inputs, which helps keep the magnitude of the gradients under control and mitigates the vanishing gradient problem.
• Weight normalization: Weight normalization is a rescaling technique applied to a layer's weights. It normalizes the weights to have a fixed norm, which can help avoid the vanishing gradient problem. By keeping the weights at a fixed norm, weight normalization can help ensure that the gradients do not vanish as the network depth increases.
It is important to note that rescaling techniques can introduce additional hyperparameters to the model, affecting its performance. Therefore, it is essential to experiment with different rescaling techniques and hyperparameters to find a suitable choice for a particular problem. It is also essential to consider other techniques, such as using appropriate weight initialization and selecting an appropriate optimizer, to address the vanishing gradient problem [23].
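The [0.9, 100.0] example from the Gradient Clipping discussion can be reproduced with a few lines of Python. The function names and the threshold of 1.0 are illustrative assumptions; deep learning frameworks provide their own clipping utilities, which are not shown here.

```python
import math


def clip_by_value(grad, limit=1.0):
    """Clamp each component to [-limit, limit]; the direction may change."""
    return [max(-limit, min(limit, g)) for g in grad]


def clip_by_norm(grad, max_norm=1.0):
    """Rescale the whole vector if its L2 norm exceeds max_norm; direction is kept."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


if __name__ == "__main__":
    grad = [0.9, 100.0]
    print("by value:", clip_by_value(grad))
    print("by norm: ", clip_by_norm(grad))
```

After clipping by norm the second component still dominates the first by the same ratio as before, whereas clipping by value makes the two components nearly equal.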
6.3 Weight Initialization
When the weights are initialized to zero or to very large values, good results are unlikely and the training process is likely to take longer. For very large models that require several days of training, vanishing/exploding gradients make this problem worse. The activation-aware weight initialization proposed by He et al. [24] for ReLU addresses this issue. ReLU and leaky ReLU themselves help with the issue of gradient vanishing. Another form of weight initialization is Xavier initialization [2], which is similar to "He initialization" but is intended for the tanh() activation function. These approaches provide good starting points for training and reduce the likelihood of gradients exploding or vanishing. They choose the scale of the weights so that signals are neither amplified nor attenuated by a factor much different from 1 as they pass through each layer. Therefore, gradients do not rapidly vanish or explode. Some standard weight initialization techniques that can help to alleviate the vanishing gradient problem are:
• Xavier initialization: This method scales the weight initialization based on the number of inputs and outputs of the layer. Xavier initialization is effective for sigmoid and hyperbolic tangent activation functions.
• He initialization: This method is similar to Xavier initialization, but it scales the weight initialization based only on the number of inputs to the layer. He initialization is effective for Rectified Linear Unit (ReLU) activation functions.
• Variance scaling: This method scales the weight initialization to ensure that the variance of the activations is preserved across the network. Variance scaling can help to avoid both vanishing and exploding gradient problems.
• Uniform initialization: This method initializes the network weights uniformly across a range of values. Uniform initialization can prevent the weights from being too small and causing the vanishing gradient problem.
• Random initialization: This method initializes the weights of the network randomly. Random initialization can help break the network's symmetry and avoid the vanishing gradient problem.
It is important to note that the effectiveness of these techniques may depend on the specific architecture and task at hand. Therefore, it is recommended to experiment with different weight initialization techniques to find a suitable choice for a particular problem.
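As a rough sketch of the first two schemes in the list above, the code below computes the standard deviations commonly associated with Xavier (Glorot) and He initialization for normally distributed weights, namely sqrt(2 / (fan_in + fan_out)) and sqrt(2 / fan_in). The helper names and the use of a Gaussian rather than a uniform distribution are assumptions for illustration.

```python
import math
import random


def xavier_std(fan_in, fan_out):
    """Standard deviation for Xavier (Glorot) normal initialization."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def he_std(fan_in):
    """Standard deviation for He normal initialization (depends on fan-in only)."""
    return math.sqrt(2.0 / fan_in)


def init_weights(fan_in, fan_out, std, rng):
    """Draw a fan_in x fan_out weight matrix from N(0, std**2)."""
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    w_tanh = init_weights(256, 128, xavier_std(256, 128), rng)
    w_relu = init_weights(256, 128, he_std(256), rng)
    print(xavier_std(256, 128), he_std(256), len(w_tanh), len(w_relu[0]))
```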
7 Conclusion
When sigmoid or tanh activations are used, Batch Normalization alone cannot address the issue of vanishing gradients. ReLU is better at mitigating the VGP since it does not saturate for positive inputs, and its gradients there are constant and larger. Therefore, a particularly effective solution for the VGP is combining ReLU with Batch Normalization. The initial
Batch Normalization study stated that "Internal Covariate Shift" is responsible for Batch Normalization's effectiveness in improving the performance of deep neural networks. Under this idea, the distribution of inputs to hidden units in a deep neural network fluctuates irregularly when the model's parameters are modified during backpropagation. Because the output of one layer serves as input for the subsequent layer and the weights of each layer are continually changed through backpropagation, the input data distribution of each layer is likewise continually changing.
While there are many solutions to the vanishing gradient problem in deep neural networks, some have certain limitations:
• Rescaling techniques, such as Batch Normalization and Weight Normalization, can effectively mitigate the vanishing gradient problem. However, they can also introduce additional hyperparameters and computation overhead, which can affect the performance and training time of the network.
• Activation functions, such as ReLU, can help to avoid the vanishing gradient problem by ensuring that the gradients do not vanish. However, they can also cause other problems, such as the "dying ReLU" problem, where some units can become inactive and never fire again during training.
• SGD with momentum can help to overcome the vanishing gradient problem, but it can also suffer from the same problems as vanilla SGD, such as slow convergence or getting stuck in local minima.
• A learning rate schedule can help to overcome the vanishing gradient problem by gradually reducing the learning rate during training. However, it can also introduce additional hyperparameters and require careful tuning to ensure good performance.
• Weight initialization techniques, such as Xavier or He initialization, can help to avoid the vanishing gradient problem. However, they can also be sensitive to the architecture and network size, which can affect their effectiveness.
It is important to note that the effectiveness of these solutions also depends on the specific problem and the architecture of the network. Therefore, it is recommended to experiment with different solutions and hyperparameters to find a suitable choice for a particular problem. New research is constantly being conducted to develop more effective solutions to the vanishing gradient problem. Future advances in this area may provide even better solutions with fewer limitations.
References
1. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
2. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 249–256
3. Sengupta S, Basak S, Saikia P, Paul S, Tsalavoutis V, Atiah F, Ravi V, Peters A (2020) A review of deep learning with special emphasis on architectures, applications and recent trends. Knowl-Based Syst 105596
4. Qawaqneh Z, Mallouh AA, Barkana BD (2017) Deep neural network framework and transformed MFCCs for speaker's age and gender classification. Knowl-Based Syst 115:5–14
5. Hardt M, Recht B, Singer Y (2015) Train faster, generalize better: stability of stochastic gradient descent. arXiv
6. Dauphin YN, Pascanu R, Gulcehre C, Cho K, Ganguli S, Bengio Y (2014) Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. arXiv, pp 1–14
7. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
8. Deng L (2012) Three classes of deep learning architectures and their applications: a tutorial survey. Trans Sig Inf Process
9. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics. PMLR, 13–15 May 2010, pp 249–256
10. Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall PTR, Upper Saddle River, NJ, USA
11. Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 315–323
12. Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12:2121–2159
13. Zeiler MD (2012) ADADELTA: an adaptive learning rate method. CoRR, vol abs/1212.5701
14. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. CoRR, vol abs/1412.6980
15. Tan HH, Lim KH, Hendra HG (2017) Stochastic diagonal approximate greatest descent in neural networks. In: Proceedings of the IEEE international joint conference in neural networks
16. Zinkevich M, Weimer M, Li L, Smola AJ (2010) Parallelized stochastic gradient descent. In: Advances in neural information processing systems. Neural Information Processing Systems Foundation, Inc., La Jolla, CA, USA, pp 2595–2603
17. Dozat T (2016) Incorporating Nesterov momentum into Adam. ICLR Workshop 1:2013–2016
18. Qian N (1999) On the momentum term in gradient descent learning algorithms. Neural Netw Official J Int Neural Netw Soc 12(1):145–151
19. Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: Proceedings of ICLR 2019
20. Ma J, Yarats D (2019) Quasi-hyperbolic momentum and Adam for deep learning. In: Proceedings of ICLR 2019
21. Lucas J, Sun S, Zemel R, Grosse R (2019) Aggregated momentum: stability through passive damping. In: Proceedings of ICLR 2019
22. Yang G, Schoenholz SS (2017) Mean field residual networks: on the edge of chaos. CoRR, vol abs/1712.08969
23. Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the 30th international conference on machine learning
24. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. CoRR, vol abs/1502.01852
A Comparative Analysis of Heart Disease Diagnosis with Machine Learning Models Le Thi Thanh and Dang N. H. Thanh
Abstract In this study, a framework to diagnose heart disease with different machine learning models is proposed. The framework includes three parts: data assessment and preprocessing, model training, and performance evaluation. The Cleveland heart disease database is used for training the models. Experimental results showed that Random Forest achieved the best performance in Accuracy (84%), F1-score (83.4%), and other relevant metrics, and that this model is suitable for use in practical applications.
Keywords Heart disease · Coronary artery disease · Machine learning · Data classification · Cardiology · Data analytics
The original version of this chapter was revised: Reference list has been updated. The correction to this chapter is available at https://doi.org/10.1007/978-981-99-5166-6_76
L. T. Thanh, Department of Mathematics, Faculty of Applied Sciences, Ho Chi Minh City University of Technology and Education, Ho Chi Minh City, Vietnam
D. N. H. Thanh (B), Department of Information Technology, College of Technology and Design, University of Economics Ho Chi Minh City, Ho Chi Minh City, Vietnam
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023, corrected publication 2023. G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_2
1 Introduction
According to statistics from the U.S. Centers for Disease Control and Prevention (CDC), heart disease is the leading cause of death in the U.S. [1]. Heart disease is a general term covering several kinds of heart conditions. The most common type of heart disease is coronary artery disease, which usually causes heart attacks. Coronary artery disease occurs when plaque builds up in the artery wall. These arteries supply blood with nutrition
and oxygen to the heart as well as to other parts of the body so that they can function. They are also known as coronary arteries. Over time, plaque is formed by the deposit of cholesterol and other substances in the artery. The plaque buildup narrows the artery and blocks the blood flow to the heart, which leads to heart disease. There are several symptoms of coronary artery disease [1]. Chest pain is the most common symptom. Chest pain includes several types: asymptomatic myocardial ischemia, atypical angina, and typical angina. There are several potential factors leading to coronary artery disease in particular, and heart disease in general, such as high blood pressure, high blood sugar, and heartbeat disorder. These factors usually appear in people with unhealthy lifestyles, such as being overweight, physical inactivity, unhealthy eating, drinking alcohol, and smoking cigarettes [2]. Heart disease in general, and coronary artery disease in particular, can cause death, and the treatment is very difficult and expensive. Therefore, early diagnosis of heart disease is very important to save the patient's life. In this paper, we propose a framework supported by machine learning (ML) approaches to predict potential cases of heart disease. The framework incorporates eight different ML models: Decision Tree (Tree), Random Forest, Adaptive Boosting (AdaBoost), Naïve Bayes, k-Nearest Neighbors (kNN), Support Vector Machine (SVM), Logistic Regression, and Neural Network. The aim of this work is to identify machine learning models that are efficient in heart disease diagnosis. The paper is structured as follows. Section 2 presents an overview of related works. Section 3 presents the materials and methods. Section 4 presents the experimental results and a comparison of the performance of the ML models. Finally, Sect. 5 concludes the paper.
2 Related Works
Because of the danger that heart disease poses to human life, predicting heart disease is an important research direction in medicine and healthcare technology. With the development of Artificial Intelligence (AI), different traditional machine learning (ML) and deep learning (DL) methods have been applied. Qian et al. [3] proposed a cardiovascular disease prediction model using different traditional machine learning models such as Logistic Regression with Lasso regularization, Random Forest, and Support Vector Machine. The authors used their own dataset, collected through a survey from 2010 to 2012. In this study, the authors confirmed that Logistic Regression with Lasso regularization achieved the best performance. The major limitation of the research is that the dataset was not published. Also, the number of machine learning models is small. Karthick et al. [4] proposed another framework for heart disease risk prediction with several ML models such as Light Gradient Boosting Machine (LightGBM), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machine, Naïve Bayes, and Logistic Regression. The authors test on the Cleveland
database. The findings showed that Random Forest is the best model. Some limitations of the research can be listed: the authors only use Accuracy for evaluating the performance, and they split the database with a ratio of 80:20 for the training and test sets. Therefore, the score is evaluated on a small test set (around 60 instances). Moreover, the number of machine learning models is also small. In particular, a major part (3/6 models) of the considered models is based on Decision Trees (multiple Decision Trees in RF, and boosted Decision Trees in the LightGBM and XGBoost models). Singh et al. [5] proposed an effective prediction system for heart disease with multi-layer perceptrons (MLPs). They also use the Cleveland database. The results showed that the system can successfully predict the danger level of heart disease. Several limitations of the research are: they only test one machine learning model and did not compare its performance with different models. The data split ratio is 60:40 for the training and test sets. This ratio is not good for a deep learning model such as an MLP because deep learning models need a lot of data to learn. According to the report, the Accuracy, F1-score, Area Under the Curve (AUC), Precision, and Recall are all 100%. This may indicate an overlap between the training and test sets when splitting the data. Almustafa et al. [6] proposed a system to diagnose heart disease with several ML models such as Decision Tree, AdaBoost, k-nearest neighbors (kNN), Naïve Bayes, and Support Vector Machine. The authors also test the models on the Cleveland database. They use the k-fold cross-validation technique with k = 10 in the training procedure. The experimental results showed that kNN is the best model. There are some limitations in this study: the number of instances is 303, so each fold contains around 30 instances. Therefore, the test set is too small to obtain a trustworthy evaluation result.
The authors used several metrics such as Accuracy, Relative absolute error (RAE), Kappa score, and Mean absolute error (MAE) to evaluate the performance. Important metrics such as F1-score and AUC are absent. Yang and Guan [7] proposed another model for heart disease prediction based on the combination of the SMOTE technique and XGBoost. Basically, this model combines the XGBoost model with SMOTE-ENN to balance the data of each class. The proposed model improved only slightly over XGBoost. The study was carried out on a non-public dataset. The main reason this approach does not carry over to the Cleveland database is that the class labels of the Cleveland database are already balanced, so balancing methods are unnecessary. Another disadvantage of the study is that it only focuses on improving the performance of the XGBoost model and does not consider other machine learning models. Shah et al. [8] proposed another system for predicting heart disease. The system uses different machine learning models such as Decision Tree, kNN, Naïve Bayes, and Random Forest. They also tested on the Cleveland database. The study confirms that kNN achieved the best accuracy. Detailed information on the data split was not given. Some limitations of the study are: the number of ML models is small, and they used another version of the Cleveland database and adjusted the target variable into four labels. This is a multi-class classification problem, which differs from the binary classification problem considered here.
Yang et al. [9] developed an intelligent system for heart disease diagnosis. The study mainly focuses on the Random Forest model. The authors collected data from hospitals in Eastern China, and the data were not shared. The findings confirmed that the Random Forest model achieved a very good performance. Vasudev et al. [10] proposed a stacked ensemble learning model with Naïve Bayes and MLP. They also evaluated the model on the Cleveland database. Even though the model achieved good performance, the study lacks a comparison with other machine learning models. In general, the above works have some limitations. Several works test on non-public datasets. Several works focus on only one specific machine learning model or on closely related types of machine learning models. For the studies with several machine learning models, the number of models is small, usually 4–6. In addition, they only use 2–3 metrics to evaluate the performance, usually Accuracy and AUC. This is unsuitable for data that are not perfectly balanced (the gap between the classes is higher than 1% but lower than 10%). In particular, most studies lack data assessment and explanation. For the Cleveland database, the data split ratio in all the above works is not really appropriate for such a small dataset.
3 Materials and Methods
3.1 Heart Disease Dataset
The heart disease dataset is publicly available at the UCI repository at the address https://archive.ics.uci.edu/ml/datasets/heart+Disease. The dataset contains 75 attributes, one target variable, and 303 data instances. The dataset includes four databases: Cleveland, Hungary, Switzerland, and the VA Long Beach. In experiments, most studies use a subset of 13 features and one target variable of the Cleveland database. Hence, in this study, we also use this subset. The considered features contain categorical and numeric variables. The categorical variables include gender, chest pain, fasting blood sugar > 120, rest ECG (Electrocardiogram), exerc ind ang (exercise induced angina), slope peak exc ST (the slope of the peak exercise ST segment), and thal (thalassemia). The ST segment is the region from ventricular depolarization to ventricular repolarization on ECG. The numeric variables include age, rest SBP (resting systolic blood pressure), cholesterol, max HR (heart rate), ST by exercise, and major vessels colored (major vessels colored by fluoroscopy). The target variable is diameter narrowing and it denotes heart disease status. Its value is 0 ( 0, is used.
• Decision Tree is a supervised learning model. Decision Tree can solve both classification and regression problems. In this model, a tree structure is used for representing features and the class label (the target variable).
• Random Forest (RF) is a model developed by ensemble learning. It can be used for solving classification and regression problems. In RF, there are several decision trees. The number of trees in RF influences the performance of the model. In this paper, we use ten decision trees.
• Naïve Bayes classifier is developed on Bayes' theorem. The well-known advantage of the Naïve Bayes classifier is its speed. This classifier is popularly used for solving text classification and sentiment analysis problems.
• AdaBoost is an ensemble learning method that utilizes the boosting technique to improve the performance of decision trees. The idea of AdaBoost is to assign weights to data points. The weights of misclassified points are increased so that subsequent learners focus on them, which improves the overall performance of the model.
• kNN is a non-parametric model for supervised learning. It can solve classification and regression tasks. There are several types of distance used to define the nearest neighbors. In this paper, we use the Euclidean distance: $d(x_i, x_j) = \sqrt{\sum_{k=1}^{m} \left(x_i^{(k)} - x_j^{(k)}\right)^2}$. To improve the performance of kNN, weights can also be assigned to each data point. In this paper, we use uniform weights.
• Logistic Regression (LR) is a well-known model of supervised learning which can process classification and regression problems. The model evaluates the probability based on: $p(x) = 1 / \left(1 + \exp\left(-\alpha_0 - \sum_{k=1}^{m} \alpha_k x^{(k)}\right)\right)$, in which $\alpha_0, \ldots, \alpha_m$ are weights. To improve the performance of LR, a regularization technique is incorporated. Here, we use Ridge regularization (based on the L2 norm).
• Neural Network, also known as the multi-layer perceptron algorithm with backpropagation, is a simple architecture of the Artificial Neural Network (ANN). In this paper, we configured 100 neurons for the hidden layer, the tanh function, $\tanh(t) = (\exp(t) - \exp(-t)) / (\exp(t) + \exp(-t))$, as the activation function, and stochastic gradient descent (SGD) as the optimization algorithm of the network.
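The Euclidean distance used by kNN and the probability formula used by Logistic Regression can be checked with a small standard-library sketch. The function names are hypothetical, and the coefficient values in the usage example are arbitrary illustrations, not fitted weights from the study.

```python
import math


def euclidean_distance(x_i, x_j):
    """Euclidean distance between two feature vectors of equal length m."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def logistic_probability(x, alpha0, alphas):
    """p(x) = 1 / (1 + exp(-alpha0 - sum_k alpha_k * x^(k)))."""
    z = alpha0 + sum(a * v for a, v in zip(alphas, x))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
    # Arbitrary illustrative coefficients, not values from the paper.
    print(logistic_probability([1.0, 2.0], alpha0=-1.0, alphas=[0.5, 0.25]))
```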
3.3 Proposed Framework
The proposed framework for heart disease diagnosis is presented in Fig. 2. The framework consists of three parts: data assessment and preprocessing, model training, and performance evaluation.
(a) Data Assessment and Preprocessing
To assess the data, we use the following widgets: Correlation, Feature Statistics, Distributions, and Violin Plot.
Correlation. To evaluate the correlation between each pair of features, the Pearson method is used. The result is shown in Fig. 3; no pair of features is strongly correlated (coefficient higher than 0.7). Therefore, we do not need to remove any features.
Feature statistics. The Feature Statistics widget shows that two features have missing values: major vessels colored (1% missing) and thal (1% missing). Therefore, before deeper analysis, we must process these missing values with the Preprocess widget.
Preprocess. For the numeric variable (major vessels colored), the missing values are filled with the mean value; for the categorical variable (thal), the most frequent value is used.
A Comparative Analysis of Heart Disease Diagnosis with Machine …
Fig. 2 Proposed heart disease diagnosis framework
Fig. 3 Pearson correlation
Fig. 4 Distribution of the target variable
Distributions. To check whether the classification problem is balanced or imbalanced, the distribution of the target variable should be examined. The distribution of the target variable is shown in Fig. 4. The gap between the two classes is minor, only 8.25%, so the studied problem can be considered a balanced classification.
Violin plot. The violin plot is a useful widget for showing the summary statistics and the density of each numeric variable. The violin plots of six numeric variables (age, cholesterol, rest SBP, ST by exercise, max HR, and major vessels colored) are presented in Fig. 5. For class 0, the density of “major vessels colored” and “ST by exercise” is concentrated in [0, 1], “max HR” in [140, 180], “cholesterol” in [160, 300], “rest SBP” in [110, 150], and “age” in [35, 65]. For class 1, “max HR” is concentrated in [110, 170], “cholesterol” in [160, 300], “rest SBP” in [115, 155], and “age” in [50, 65], while the density of “major vessels colored” and “ST by exercise” is relatively even.
(b) Model Training
For the model training stage, we use the following widgets: Data Sampler and the machine learning models. The Data Sampler splits the database into two parts: 75% for the training set and 25% for the test set. The training set is used to optimize the parameters of the machine learning models (Naïve Bayes, Random Forest, SVM, AdaBoost, kNN, Tree, Logistic Regression, and Neural Network widgets) during the training process.
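The balance check and the 75%/25% split can be sketched in scikit-learn, which the paper lists among its libraries. Everything below is an illustrative assumption rather than the authors' Orange workflow. The class counts 164 and 139 are the commonly reported counts for the processed Cleveland subset and are not stated in the text. The data come from `make_classification` as a synthetic stand-in. Only the hyperparameters named in Sect. 3.2 are set explicitly, and the SVM kernel is left at the library default.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def class_gap(n_class0, n_class1):
    """Difference between the two class shares, as a fraction of all instances."""
    return abs(n_class0 - n_class1) / (n_class0 + n_class1)


# Assumed class counts (not given in the text): 164 vs 139 out of 303.
print(f"class gap: {class_gap(164, 139):.2%}")

# Synthetic stand-in with the same shape as the 13-feature Cleveland subset.
X, y = make_classification(n_samples=303, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "kNN": KNeighborsClassifier(metric="euclidean", weights="uniform"),
    # The L2 (ridge) penalty is the scikit-learn default for LogisticRegression.
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Neural Network": MLPClassifier(
        hidden_layer_sizes=(100,), activation="tanh", solver="sgd",
        max_iter=500, random_state=0,
    ),
    "Random Forest": RandomForestClassifier(n_estimators=10, random_state=0),
    "SVM": SVC(),  # kernel is an assumption (library default)
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

The accuracies printed here describe the synthetic data only and say nothing about the scores reported in the paper.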
Fig. 5 Violin plots of numeric variables
(c) Performance Evaluation After training the models with the training set, we apply the trained models to the test set to compute the scores. Three widgets are used: Test and Score, Confusion Matrix, and ROC Analysis. The models with the best scores will be deployed in practice.
4 Experimental Results 4.1 Settings For the framework design, we use Orange Data Mining software. The source code runs on the Python environment with necessary libraries such as scikit-learn, Matplotlib, NumPy, and Pandas. The system to demonstrate and test the framework is a MacBook Pro with an M1 chip and 16 GB of RAM.
4.2 Evaluation Metrics To compute the performance of the ML models, the following scores are used [19]: Accuracy, F1-score, Precision, Recall, and AUC. We also use the ROC and Confusion Matrix for a better understanding of the results.
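For a binary problem, four of these scores follow directly from the counts in a confusion matrix. The sketch below shows the standard definitions for the positive class. The counts are hypothetical and are not read from Fig. 7. Table 1 appears to report scores averaged over both classes, so its values need not match these per-class formulas exactly.

```python
def classification_scores(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a 75-instance test set (illustration only).
scores = classification_scores(tp=30, fp=5, fn=7, tn=33)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

AUC is left out because it is computed from the ranking of predicted probabilities rather than from a single confusion matrix.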
4.3 Results and Discussion The values of the scores Accuracy, Precision, Recall, F1-score, and AUC are presented in Table 1. Because the considered problem is balanced classification, Accuracy and F1-score are the important metrics. It is clear that the best Accuracy and the best F1-scores are of the Random Forest model. Also, other scores such as Precision, Recall, and AUC of the Random Forest are the highest. The Neural Network, SVM, and Naïve Bayes models also achieved good results. The deeper analysis with the ROC as in Fig. 6 showed that the result of the Random Forest is the best for both classes 0 and 1. The Confusion Matrices of two of the best models (Random Forest and Neural Network) are presented in Fig. 7. The results verify that the correct prediction rate for both classes 0 and 1 is higher than 82%.
Table 1 Scores of the machine learning models for the test set

Method               Accuracy  Precision  Recall  F1-score  AUC
AdaBoost             0.653     0.651      0.653   0.652     0.645
KNN                  0.613     0.613      0.613   0.586     0.635
Logistic Regression  0.8       0.802      0.8     0.797     0.843
Naïve Bayes          0.787     0.787      0.787   0.785     0.861
Neural Network       0.827     0.827      0.827   0.826     0.831
Random Forest        0.84*     0.842*     0.84*   0.838*    0.882*
SVM                  0.8       0.8        0.8     0.799     0.854
Decision Tree        0.64      0.641      0.64    0.641     0.636

An asterisk (*) marks the highest value in each column, which indicates the best value
Fig. 6 ROCs of the machine learning models for the test set
Fig. 7 Confusion Matrix of the best machine learning models (Neural Network and Random Forest)
4.4 Examples of Heart Disease Diagnosis In this section, we provide five examples with random data from the test set and use the best model (the previously trained Random Forest) to predict heart disease. The results are shown in Table 2. For the given data, one case is predicted to have heart disease (case No 3) and four cases are predicted to be healthy. In the heart disease case (case No 3), the patient has high cholesterol (254), high fasting blood sugar (>120), and a high max heart rate (163), and was diagnosed with thalassemia and chest pain with asymptomatic myocardial ischemia. Another noticeable case is No 1. This patient has several potentially dangerous factors, such as very high resting systolic blood pressure (150), high cholesterol (212), and high fasting blood sugar (>120), but was not diagnosed with heart disease. Perhaps these factors have appeared only recently. However, they represent a very high risk of heart disease, so the patient needs a medical regimen for each disorder as soon as possible.
5 Conclusions In this paper, we have developed a framework for heart disease diagnosis with different machine learning models. Based on the experimental results, Random Forest is the most suitable model for predicting the danger of heart disease for patients. The results can serve as a reference for doctors when building a medical plan for each patient. The framework has a limitation: it cannot predict potential cases of heart disease. In future work, we plan further research to address this limitation.
Male
Female Non-anginal
67
63
45
3
4
5
Non-anginal
112
212
160
252
254
197
1
0
0
1
0
Female Atypical ang
135
Asymptomatic 125
140 163
116
157
Normal
138
0
0
0
0
0
0
0
0.2
1.1
1.6
Flat
upsloping
Flat
Flat
0
0
2
0
0
0
Heart disease prediction
Normal
Normal
0
0
Reversible 1 defect
Normal
Normal
Major thal vessels colored
Upsloping 0
Max Exerc ST by Slope HR ind exercise peak exc ang ST
Left vent 172 hypertrophy
Normal
ST-T abnormal
Normal
150
Male
Female Non-anginal
59
76
1
2
Rest Cholesterol Fasting Rest ECG SBP blood sugar > 120
No Age Gender Chest pain
Table 2 Five sets of input features for testing the framework with RF model
Acknowledgements This study is supported by the Ho Chi Minh City University of Technology and Education (HCMUTE), Vietnam. This research is funded by the University of Economics Ho Chi Minh City (UEH), Vietnam.
References
1. https://www.cdc.gov/heartdisease/coronary_ad.htm. Accessed 25 Dec 2022
2. https://my.clevelandclinic.org/health/diseases/21493-cardiovascular-disease. Accessed 25 Dec 2022
3. Qian X, Li Y, Zhang X, Guo H, He J, Wang X, Yan Y, Ma J, Ma R, Guo S (2022) A Cardiovascular Disease Prediction Model Based on Routine Physical Examination Indicators Using Machine Learning Methods: A Cohort Study. Front Cardiovasc Med 9:854287. https://doi.org/10.3389/fcvm.2022.854287
4. Karthick K, Aruna SK, Samikannu R, Kuppusamy R, Teekaraman Y, Thelkar AR (2022) Implementation of a Heart Disease Risk Prediction Model Using Machine Learning. Computational and Mathematical Methods in Medicine, Article ID 6517716. https://doi.org/10.1155/2022/6517716
5. Singh P, Singh S, Pandi-Jain GS (2018) Effective heart disease prediction system using data mining techniques. Int J Nanomedicine 13:121–124. https://doi.org/10.2147/IJN.S124998
6. Almustafa KM (2022) Prediction of heart disease and classifiers’ sensitivity analysis. BMC Bioinformatics 21:278. https://doi.org/10.1186/s12859-020-03626-y
7. Yang J, Guan JA (2022) Heart Disease Prediction Model Based on Feature Optimization and Smote-XGBoost Algorithm. Information 13:475. https://doi.org/10.3390/info13100475
8. Shah D, Patel S, Bharti SK (2020) Heart Disease Prediction using Machine Learning Techniques. SN Comput Sci 1:345. https://doi.org/10.1007/s42979-020-00365-y
9. Yang L, Wu H, Jin X et al (2020) Study of cardiovascular disease prediction model based on random forest in eastern China. Sci Rep 10:5245. https://doi.org/10.1038/s41598-020-62133-5
10. Vasudev RA et al (2020) Heart Disease Prediction Using Stacked Ensemble Technique. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 39(6):8249–8257. https://doi.org/10.3233/JIFS-189145
11. Ben-Hur A, Horn D, Siegelmann HT, Vapnik V (2001) Support vector clustering. Journal of Machine Learning Research 2:125–137
12. Breiman L (2001) Random Forests. Machine Learning 45(1):5–32. https://doi.org/10.1023/A:1010933404324
13. John GH, Langley P (1995) Estimating Continuous Distributions in Bayesian Classifiers. Proc. Eleventh Conf. on Uncertainty in Artificial Intelligence. Morgan Kaufmann
14. Hastie T, Rosset S, Zhu J, Zou H (2009) Multi-class AdaBoost. Statistics and Its Interface 2(3):349–360. https://doi.org/10.4310/sii.2009.v2.n3.a8
15. Hall P, Park BU, Samworth RJ (2008) Choice of neighbor order in nearest-neighbor classification. Annals of Statistics 36(5):2135–2152
16. Utgoff PE (1989) Incremental induction of decision trees. Mach Learn 4(2):161–186. https://doi.org/10.1023/A:1022699900025
17. Tolles J, Meurer WJ (2016) Logistic regression relating patient characteristics to outcomes. JAMA 316(5):533–534. https://doi.org/10.1001/jama.2016.7653
18. Collobert R, Bengio S (2004) Links between perceptrons, MLPs and SVMs. In: Proceedings of international conference on machine learning (ICML 2004). Alberta
19. De Diego IM, Redondo AR, Fernández RR et al (2022) General performance score for classification problems. Appl Intell 52:12049–12063. https://doi.org/10.1007/s10489-021-03041-7
The Impact of Information System and Technology of Courier Service During Pandemic Andre Timiko Siahaan, Bagas Rizkyka Pinajung, Kennidy, Ford Lumban Gaol, and Tokuro Matsuo
Abstract Courier services are a means of transport that can be used to deliver orders to customers or collect orders from them, and customers can use technology provided by the courier service, such as order tracking, to follow their orders. Courier services are now so common that people use them every day to deliver goods. However, in situations such as a pandemic, courier services need to put more effort into their services so that customers can use them with trust, and they must take more care when delivering or collecting customer orders. For this research, data were collected using Google Forms and directly from people, who gave their opinions on the research topic. The result of the data collection is the usage of courier services during the COVID-19 pandemic, starting from March 2020. The results vary from month to month because not everyone uses a courier service every day. Moreover, the months from March to June do not show high usage of courier services because the pandemic was not yet as severe as in later months, so people still went out to deliver goods or to do activities such as shopping or buying groceries on their own.
A. T. Siahaan · B. R. Pinajung · Kennidy
School of Information System, Bina Nusantara University, Alam Sutra, 13960 Tangerang, Indonesia
F. L. Gaol (B)
Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, 11480 Jakarta, Indonesia
T. Matsuo
Advanced Institute of Industrial Technology, Tokyo, Japan
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_3
Keywords Information system · Courier services · Pandemic
1 Introduction Courier services are organizations that offer delivery of parcels, money, documents, or special data. They typically offer quicker delivery than other shipping methods, and modern society depends on them [1]. Couriers have existed for nearly as long as society itself: rulers in ancient times used couriers to make new policies and decrees known across their regions [2]. The biggest courier service in the world is United Parcel Service (UPS), which delivers more than 12 million parcels worldwide every day. UPS originated as a courier service at the beginning of the twentieth century, when it was known as the American Messenger Company [3]. UPS survived the economic depression and the World Wars and went on to prosper in the global era. Over time, it acquired other courier services, including Motorcycle Messengers and several smaller European companies [4]. Federal Express (FedEx) and Dalsey, Hillblom and Lynn (DHL), now part of Deutsche Post AG, are other well-known global examples of courier services, both with their beginnings in the early 1970s [5]. Though not nearly as large as UPS, each company controls a large share of the market. In the modern international marketplace, courier services have become a foundation for companies, even as technologies such as fax machines and the Internet have made them less essential in some areas [6]. Although technology has replaced many conventional uses of courier services, there remains a steady need to move physical goods and documents over both short and long distances as quickly as possible [7]. The problem statements about courier services are [8]:
(a) What is a courier service?
(b) What is the effect of the COVID-19 pandemic on courier services?
(c) How do courier services deal with the COVID-19 pandemic?
The purposes of this research are [9]:
(a) To understand what a courier service is.
(b) To find out the impact of the pandemic on courier services.
(c) To find out how courier services deal with the problems faced during a pandemic.
Frequently asked questions from people about courier services are [10]:
(a) Are there any problems using a courier service during the pandemic?
(b) How often are courier services used during the pandemic?
(c) Is using a courier service more effective than picking up the goods yourself?
(d) Is the courier service very helpful in terms of delivery of goods?
(e) Is the usage of courier services decreasing during the pandemic?
(f) What type of courier service is often used, and why do you choose that courier service?
This research aims to find out how courier services are used, how effective they are, and how much they are used for shipping goods during the pandemic. It shows how much courier services were impacted during the pandemic.
2 Literature Review A courier provides direct delivery of goods. The job of courier has existed since ancient times, when someone had to run to deliver messages or goods. Accordingly, the English word courier comes from the Latin “currere”, which means to run [11]. The rise in the urban population coupled with the acceleration of urbanization has led to an increase in the demand for freight transportation within cities. These occurrences contribute to environmental and mobility issues, such as those caused by pollution in the air and congestion on the roads [12, 13]. In recent years, researchers and institutional authorities have focused their attention on challenges pertaining to Urban Inventory and Delivery (UID). The UID concept calls for a more integrated inventory and delivery system, one in which shippers, carriers, and movements are all coordinated and in which the freight of various customers and carriers are merged into the same environmentally friendly vehicles. The most fundamental purpose of UID is to lessen the negative effects that urban freight distribution has on the surrounding environment. In addition, the investigations being done by UID are intended to locate other collaborative and alternative network designs. These include the development of new environmentally friendly vehicles (for example, hybrid vehicles), the introduction of urban distribution centers and hub-satellite systems, and the optimization of vehicle routing in terms of travel times, CO2 emissions, and traveled kilometers. Delivery Service Providers, often known as DSPs, are an integral part of the UID system. Distribution of goods to clients is the primary focus of their business activity [14].
In addition, they are expected to provide high-quality, low-cost delivery services in metropolitan regions, which have a number of peculiarities such as restricted-traffic zones and traffic congestion [12]. These features impose additional route limits and sources of uncertainty, which affect the performance of UID. Current UID research has focused on these topics and aims to reduce distribution costs as well as negative effects on the environment through the implementation of pickup and delivery routes that are both more efficient and dependable [14, 15]. In addition, the widespread use of the Internet and the rise of online shopping, particularly in the business-to-consumer sector, have also contributed to the transformation of freight distribution in metropolitan areas. This is especially true of the
growth of e-commerce in the B2C market. Adaptations to the downstream supply chain affect a variety of factors, including the distribution network, the quantity of shipments, the types of shipments, and the number of loads carried on each tour. In addition, there are problems regarding the location of the delivery, the number of delivery stops, the number of delivery failures, the frequency of deliveries, the delivery time windows, the number of vehicles necessary, and the size of the vehicles [16]. The majority of deliveries made today consist of single orders that are shipped in relatively tiny packaging. According to Lim and Shiode [17], the primary problem that the UIDs face is posed by the growing demand for frequent shipments of small sizes. This demand is mostly driven by the rise of online shopping. The implementation of this new method of distribution results in an increase in both the number of delivery stops and the number of delivery locations. Because each vehicle is responsible for serving up to 110 consumers each day, they must visit around 78 different places throughout the span of a single day. In the context of this discussion, the number of processed parcels is approximately equivalent to the number of stops that occur during a tour. So, in order to increase their overall productivity, UIDs strive to make as many stops as possible with each vehicle. Increasing the productivity of UIDs would require first gaining an understanding of the factors that determine the number of stops per trip. The goal of this research is to provide a first insight into this issue, based on an empirical investigation of genuine UID data. This is a challenge for the researcher. Specifically, the purpose of this article is to determine a group of operational variables that are likely to have an effect on the productivity of LSPs and list them out. 
The number of stops that a driver makes to pick up or deliver packages can be used as a measurement tool to determine how productive a UID is. Through the examination of data obtained from an Italian company, a variety of LSP service indicators are chosen, and their correlations with the total number of stops are researched. The following is the structure of the paper: To begin, it is proposed to conduct a literature evaluation of pertinent UID investigations. After that, a description of the approach and a presentation of the empirical analysis come next. In the final step, the results are discussed, after which implications are drawn along with conclusions. The benefits of courier services during a pandemic are:
(a) Reliability: Courier services are known for their reliability, because the courier will treat the package like their own and will handle it with care until it is delivered [18].
(b) Affordability: A courier service is the most affordable way to ship packages or documents [18].
(c) Ease of use: The person who wants to send a package does not need to travel to deliver it to the destination, and the receiver does not need to go and pick it up [18].
Factors affecting the use of courier services during a pandemic are:
(a) Market position: If the courier services are strategically located, people will go to their nearest courier service [19].
(b) Quality service: Service quality is a determinant of whether people use a courier service [19].
(c) Responsibility: The courier service takes full responsibility for delivering the goods to the destination address [19].
3 Research Methodology 3.1 Types of Research In this research, the types of research used are qualitative and quantitative research. Qualitative research focuses more on the aspect of deep understanding of a problem than looking at problems for generalization research [20, 21]. Qualitative research involves important efforts such as asking questions and procedures, collecting the required data from participants, and interpreting the meaning of the data being studied. According to [22, 23], quantitative research is a research method that focuses more on social cases, clarifies phenomena by grouping the numerical data analyzed, and uses mathematical-based methods in certain statistics [24, 25].
3.2 Variable A variable is defined as a characteristic or attribute of an individual or a group of people, such as an organization, that (a) researchers can measure or observe and (b) varies among the individuals or groups being studied.
3.3 Population According to [26], the population is a generalization area consisting of objects or subjects that have certain qualities and characteristics from which researchers can draw conclusions.
3.4 Sample As stated in [27], the sample must represent the population. As per [28], a sample is a subset of the population. This means that a good sample should reflect the entire population, and the sample for this research study is conceptualized accordingly.
3.5 Types of Data There are two types of data: primary data and secondary data. Primary data are unique, original data that the researcher collects directly from the source according to the researcher's requirements [29]. They are collected for a specific purpose. Secondary data are data that have already been collected, by another investigator or for another purpose, and documented elsewhere [30].
3.6 Data Collection Technique The data collection technique used in this research is a questionnaire built with Google Forms. The link to the form is shared with class groups and friends to collect the qualitative and quantitative data [31].
3.7 Data Analysis Technique Data analysis is the procedure of systematically applying statistical or logical techniques to describe and illustrate the data. This process is needed so that the characteristics of the data become easier to understand and useful as a solution to a problem, especially those related to the research [32].
3.7.1 Descriptive Analysis
Descriptive analysis is the foundation of every data insight and the simplest and most common use of data in business today. It answers the question of what happened by summarizing past data, usually in the form of a dashboard [33].
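As a small illustration of such a summary, the sketch below counts questionnaire answers per month with the Python standard library. The response list is entirely hypothetical and does not come from the survey. The variable names are also assumptions.

```python
from collections import Counter

# Hypothetical answers to "In which month of 2020 did you use a courier service?"
# (made-up values, not survey data).
responses = [
    "March", "November", "December", "November",
    "June", "December", "December",
]

MONTH_ORDER = [
    "March", "April", "May", "June", "July", "August",
    "September", "October", "November", "December",
]

usage = Counter(responses)

# Print a simple text summary in calendar order; months with no answers show 0.
for month in MONTH_ORDER:
    print(f"{month:<10}{usage[month]:>3}")

busiest = max(MONTH_ORDER, key=lambda m: usage[m])
print("Busiest month:", busiest)
```

A table or bar chart built from these counts is the kind of monthly summary that Fig. 1 presents for the real responses.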
3.7.2 Diagnostic Analysis
Diagnostic analysis takes the insights found by descriptive analysis and looks for the causes of those outcomes. Organizations use this type of analysis because it creates more connections between the data and identifies patterns of behavior [34].
3.7.3 Predictive Analysis
Predictive analysis uses the summarized data to make logical predictions about the outcome of events. It depends on statistical modeling, which requires manpower and technology to produce forecasts. It is important to remember that a forecast is only an estimate, and that the accuracy of the predictions relies on detailed, high-quality data [35].
3.7.4 Prescriptive Analysis
Prescriptive analysis is the most advanced type of data analysis. By combining the information from all the previous analyses, it determines the course of action to take for a current problem or decision [36].
4 Research Result and Discussion The data were collected from people who used courier services during the COVID-19 pandemic in 2020. Figure 1 shows the use of courier services during that period. The data show that the use of courier services fluctuated from March to September, while in November and December there was a sharp increase due to national online shopping days and year-end discounts. According to the findings of the research, the quality of the courier service in e-commerce, when viewed from the point of view of the clients shopping online, is determined by the following dimensions: F1: Credibility—it includes the most significant aspects that influence the overall quality of the courier service from the point of view of the receiver. The elements that are taken into consideration include the timing of the delivery, whether or not the delivery was successful, compliance, and the order’s fullness. It also includes the package arriving undamaged and the order being processed in a timely and effective manner.
Fig. 1 Usage of courier service during pandemic
F2: Image—it contains aspects that contribute to the formation of the image of the courier firm, such as the aesthetic and neat appearance of the courier, a distinguishing brand, and colors that are consistently used. Also, it encompasses both aesthetically pleasing and practically useful company branches or Pick-Up and Drop-Off (PUDO) sites. F3: Service complexity—it is directly associated with the provision of courier services, which may comprise a wide variety of supplementary services, in addition to a comprehensive scope and variety of service options. F4: Relational capital—it contains components that contribute to the courier company’s and clients’ long-term partnerships. The relationship includes the pleasant experiences of clients who have used the courier service as well as the opinions of other clients. They include trust in the courier firm, a positive image and brand, experience and reputation, and personnel expertise and competence. F5: Social responsibility—it includes specifics concerning the ethical obligations of courier businesses. Aspects like social action participation and the implementation of eco-friendly technical solutions are examples of how courier companies are making a difference in the world. Automobiles, electric drones, bicycles, and environmentally friendly packaging are all examples of problems that need to be solved. F6: Responsiveness—it is most often linked to pleasant interactions with the courier service’s staff. It is also connected to how soon staff members are ready to
address consumer complaints and interruptions. Included in this are the abilities to select a delivery date and location, amend the date and location of delivery if necessary, and a streamlined process for returning previously ordered items. F7: Technical quality—it incorporates a selection of features connected to the logistical parts of the service, such as cutting-edge technical solutions designed specifically for consumers, such as a network of PUDO and parcel lockers and drones. It includes the choice of either mailing or delivering the package, as well as the availability of the service through the provision of a convenient location for PUDO and regular business hours. The results of empirical research and statistical analysis indicated the determinants of the quality of courier service in e-commerce as perceived by e-customers. These determinants, particularly relational capital and social responsibility, had not been mentioned in the literature that was analyzed. According to the findings, the most significant factor that determines the quality of the courier service is reliability, which is measured by on-time delivery, a successful delivery attempt, the delivery of the entire package, and the absence of any damage to the parcel. This pattern was also supported by findings from other studies [37, 38]. The research conducted by [39] is found in line with the findings of the studies conducted by other authors, that customers valued the responsiveness of staff members while utilizing a courier service. In addition, the technological characteristics of the courier service were particularly significant for e-customers, which is in line with the findings of previous study but goes against the findings of international research [39]. Taking into account the research that was analyzed in the past, a new scale was built that showed new features of the quality of the courier service. These new aspects are significant from the perspective of e-customers.
5 Conclusion This research has brought out a lot of information relating to courier service information management. It has also been observed that, with the trend in technology, most businesses are computerized, and with computerized parcel delivery records, courier service companies can easily track purchases and online product orders.
A. T. Siahaan et al.
References 1. Chuang SH, Liao C, Lin S (2013) Determinants of knowledge management with information technology support impact on firm performance. Inf Technol Manag 14:217–230 2. Atiqah NAR, Eta W, Hazana NA (2015) Service quality: a case study of logistics sector in Iskandar Malaysia using SERVQUAL model. Proc Soc Behav Sci 2015(172):457–462 3. Nurul I, Liu Y, Cheng JK (2016) A review of logistics related to knowledge management. J Sci Int 29:527–531 4. Liao C, Chuang S, To P (2011) How knowledge management mediates the relationship between environment and organizational structure. J Bus Res 64:728–736 5. Yuan X, David B, Grant A, McKinnon C, John F (2010) Physical distribution service quality in online retailing. Int J Phys Distrib Logist Manag 40:415–432 6. Weera C, Ratchasak S, Witthaya M (2015) The development of physical distribution center in marketing for small and micro community enterprise (SMCE) product in Bangkontee, Samut Songkram. Proc Soc Behav Sci 207:121–124 7. Rafay I, Clifford C, Defee B, Gibson J, Uzma R (2016) Realignment of the physical distribution process in omni-channel fulfilment. Int J Phys Distrib Logist Manag 46:543–561 8. Gabriel F, Michal W, Marek B (2012) The shipments flow simulations flow in courier company. In: Proceedings of the Carpathian logistics congress 2012, Jesenik, Czech Republic, 7–9 November 2012 9. George NT, Athanasopoulos V, Zeimpekis IM (2014) Integrated planning in hybrid courier operations. Int J Logist Manag 25:611–634 10. Kucera T (2020) Calculation of personnel logistics costs of warehousing. In: Proceedings of the 24th international scientific conference on transport means, online, 30 September–2 October 2020, vol 2020, pp 44–48 11. Fugate BS, Chad WA, Beth DS, Richard NG (2012) Does knowledge management facilitate logistics-based differentiation? The effect of global manufacturing reach. Int J Prod Econ 139:496–509 12. Benjelloun A, Crainic TG (2008) Trends, challenges and perspectives in city logistics. 
In: Proceeding of transportation and land use interaction, Tralsu 2008, Bucarest, Romania 13. Browne M, Allen J, Nemoto T, Patier D, Visser J (2012) Reducing social and environmental impacts of urban freight transport: a review of some major cities. Proc Soc Behav Sci 39:19–33 14. Ehmke JF, Mattfeld DC (2012) Vehicle routing for attended home delivery in city logistics. Proc Soc Behav Sci 39:622–632 15. Ehmke JF, Meisel S, Mattfeld DC (2012) Floating car based travel times for city logistics. Transp Res Part C Emerg Technol 21(1):338–352 16. Rotem-Mindali O, Weltevreden JW (2013) Transport effects of e-commerce: what can be learned after years of research? Transportation 40(5):867–885 17. Lim H, Shiode N (2011) The impact of online shopping demand on physical distribution networks: a simulation approach. Int J Phys Distrib Logist Manag 41(8):732–749 18. Achit A, Vinod KY (2015) Impact of technology in e-retailing operations: a consumer perspective. Proc Soc Behav Sci 189:252–258 19. Eleni G, Lidia G, Pierre-Jean B (2014) Creativity for service innovation: a practice based perspective. Int J Manag Serv Qual 24:23–44 20. Tierney P, Farmer SM (2002) Creative self-efficacy: potential antecedents and relationship to creative performance. Acad Manag J 45:1137–1148 21. Hershberger SL (2003) The growth of structural equation modelling: 1994–2001. Struct Equ Model 10:35–46 22. Runco MA, Garnet M, Selcuk A, Bonnie C (2010) Torrance tests of creative thinking as predictors of personal and public achievement: a fifty-year follow-up. Creat Res J 22:361–368 23. Shook CL, Ketchen DJ Jr, Hult GTM, Kacmar KM (2004) An assessment of the use of structural equation modelling in strategic management research. Strateg Manag J 25:397–404
24. Sarin S, McDermott C (2003) The effect of team leader characteristics on learning, knowledge application, and performance of cross-functional new product development teams. Decis Sci 34:707–739 25. Hair JF, Wolfinbarger CM, Money AH, Samouel P, Page MJ (2011) Essentials of business research methods. Sharpe, Armonk, NY, USA 26. Yusuf S (2009) From creativity to innovation. Technol Soc 31:1–8 27. De Brentani U (1991) Success factors in developing new business services. Eur J Mark 25:33–59 28. Melton HL, Hartline MD (2013) Employee collaboration, learning orientation, and new service development performance. J Serv Resour 16:67–81 29. Marco B, Eleonora DM, Roberto G (2012) Codification and creativity: knowledge management strategies in KIBS. J Knowl Manag 16:550–562 30. Kratzer J (2001) Communication and performance: an empirical study in innovation teams. Tesla Thesis Publishers, Amsterdam, The Netherlands 31. Poul HA, Hanne K (2013) Managing creativity in business market relationships. Ind Mark Manag 42:82–85 32. Sun YS, Jin NC (2012) Effects of team knowledge management on the creativity and financial performance of organizational teams. Organ Behav Hum Decis Process 118:4–13 33. Hair JF, Black WC, Babin BJ, Anderson RE (2010) Multivariate data analysis; prentice hall: Englewood cliffs. NJ, USA 34. Baruch Y, Holtom BC (2008) Survey response rate levels and trends in organizational research. Hum Relat 61:1139–1160 35. Cohen J, Cohen P (1983) Applied multiple regression/correlation analysis for the behavioural sciences, 2nd edn. Erlbaum, Hillsdale, NJ, USA 36. Armstrong JS, Overton TS (1977) Estimating nonresponse bias in mail surveys. J Mark Res 14:396–402 37. Ho JSY, Teik DOL, Tiffany F, Kok LF, Teh TY (2012) Logistic service quality among courier services in Malaysia. Int Proc Econ Dev Res 38(2012):113–117 38. 
Valaei N, Rezaei S, Shahijan MK (2016) CouQual: assessing overall service quality in courier service industry and the moderating impact of age, gender and ethnicity. Int J Manag Concepts Philos [e-journal] 9(2):144–169. https://doi.org/10.1504/IJMCP.2016.077770 39. Dmowski P, Śmiechowska M, Zelmańska M (2013) Jakość jako czynnik budujący przewagę konkurencyjną na rynku usług kurierskich. In: Rosa G, Smalec A (eds) Marketing przyszłości. Trendy. Strategie. Instrumenty. Uniwersytet Szczeciński, Szczecin, pp 167–180
Event Detection in Social Media Analysis: A Survey G. Akiladevi, M. Arun, and J. Pradeepkandhasamy
Abstract An event is defined by the attributes who, what, where, when, and how, and an event tweet usually contains these basic aspects. Real-time events are events that are happening now or happened a short time ago. Social media connects different types of interrelated domains; the biggest social media platforms are YouTube, Facebook, Instagram, and Twitter. Social networking platforms allow people with similar backgrounds or interests to connect online. The objective of event detection is to identify the local and global events that have happened. Local event detection analyzes events that are constrained by time and geography, that is, those that occurred in nearby areas. In contrast, global event detection identifies events that have a wider, worldwide impact, such as COVID or wars. Social media event detection is a tool for content analysis that automatically detects the topics present in text and reveals hidden patterns in the corpus. The main goal of this survey is to provide a thorough summary of recent developments in the area, helping the reader understand the primary issues covered so far and suggesting potential directions for future research.
Keywords Event · Social media · Facebook · Instagram · Twitter · Event detection
G. Akiladevi (B) Department of Computer Applications, Kalasalingam Academy of Research and Education, Krishnankoil, India e-mail: [email protected] M. Arun · J. Pradeepkandhasamy Faculty of Computer Applications, Kalasalingam Academy of Research and Education, Krishnankoil, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_4
1 Introduction
Social networking is used to spread information about emergent events. Emerging events such as natural disasters, political events, commercial news, and film industry news need to be reported in real time as people observe them. However, it is challenging to filter and quickly scan through postings, given the volume of brief, noisy messages now available on social networks. A mechanism that performs this task automatically in real time would therefore be beneficial in several respects; for example, detecting disasters and filtering and analyzing epidemic outbreaks in social networks would help government departments with disaster response and epidemic prevention. Because social posts are short, noisy, and anonymous, conventional methods for processing large, formal, and structured documents are less effective. Recent event detection techniques address these issues by taking advantage of the huge amount of data available on social networks. Event detection is the process of evaluating event streams to uncover collections of events that match event patterns in a context. The patterns and conditions of events establish event types. Subscribers to an event type should be alerted if a set of events matching that event type's pattern is found during the analysis. As part of the analysis, events are often filtered and aggregated [1]. The main contribution of this survey is a description of different methods: machine learning-based, deep learning-based, rule-based, and other approaches. Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Naive Bayes (NB), Random Forests (RF), and Locality-Sensitive Hashing (LSH) are classified as machine learning-based approaches. In contrast, Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), Deep Neural Network (DNN), and Recurrent Neural Network (RNN) are classified as deep learning-based approaches. 
Rule-based classifiers and rule-based procedures are both components of rule-based approaches. Natural Language Processing (NLP) approaches include Probabilistic Latent Semantic Analysis (PLSA), Word2vec, Term Frequency-Inverse Document Frequency (TF-IDF), and E-divisive with Medians. In the future, researchers could use graph neural networks and blockchain-based classifiers, which may provide higher classification accuracy.
Event Detection
Event detection refers to recognizing significant happenings in social media by analyzing the text. Social media platforms have emerged as a new, dependable source of news and information, and as they become more popular, event detection is gaining attention. Social media delivers the data required for detecting events from text and imagery, identifying significant posts and posters, and establishing a warning system in the case of a disaster or other event. This survey discusses numerous event detection techniques that have been suggested for identifying popular events, significant posts, and important posters. Various models for analyzing data from social media are also described. Depending on the event
Fig. 1 Vector space model
detection, it is categorized into four phases. The first phase entails gathering raw data from various social media sites. Preprocessing is applied in the second phase.
Data Preprocessing
Data preprocessing is the process of transforming raw data into an understandable format. In the real world, data comes in the form of text, images, videos, and so on. These data are erratic, fragmentary, and lack a consistent format. In NLP, various data preprocessing techniques are available.
Vector Space Model: each distinct word is represented by its own vector dimension. The vector space model is shown in Fig. 1.
Tokenization: splits a phrase into its individual words.
Lower Case: converts sentences to lowercase.
Stop Words Removal: each sentence contains many stop words that contribute little to its meaning, so they are discarded.
Stemming: reduces a word to its root form.
White Space Removal: extra white spaces are deleted, since they carry no meaning in the sentence.
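The preprocessing steps and the vector space model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list, the toy suffix-stripping stemmer `naive_stem`, and the two sample posts are hypothetical and not taken from any of the surveyed systems, which would typically rely on a full stop-word list and an established stemmer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "and", "to", "near"}


def naive_stem(word):
    """Toy suffix stripper, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    text = re.sub(r"\s+", " ", text).strip()              # white space removal
    text = text.lower()                                   # lower case
    tokens = re.findall(r"[a-z0-9]+", text)               # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop words removal
    return [naive_stem(t) for t in tokens]                # stemming


def to_vectors(docs):
    """Vector space model: one dimension per distinct word, values are counts."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


# Hypothetical sample posts.
posts = ["Flooding   reported near the River!", "The river is flooding roads"]
docs = [preprocess(p) for p in posts]
vocab, vectors = to_vectors(docs)
```

Each post becomes a count vector over the shared vocabulary, so posts that mention the same stemmed words end up close to each other in the vector space.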
Fig. 2 Event detection in social media
In the third phase, an extraction method is used to reduce the quantity of data required to represent the large data set. Finally, event detection based on local and global bursts is performed. Figure 2 illustrates social media event detection.
Event Extraction
Social posts are short and easy to read, and they include pictures, time stamps, geolocation, user profiles, and social contacts, among other things; such data are described using the 5 Vs, i.e., volume, velocity, variety, veracity, and value. The volume of text generated each day is enormous [2]. With millions of data feeds being produced every day in the form of articles, blog posts, comments, manuscripts, and other formats, the capacity to automatically organize and manage them is becoming more and more necessary. One popular application is event extraction, the process of gathering knowledge about periodic incidents found in texts by automatically identifying what happened and when it happened. Event extraction also refers to understanding an event across streams of social media messages using keywords and entities. In the case of natural disasters or other events, event extraction helps in taking faster corrective action. This study describes how events are collected from the database. Three methods are discussed: machine learning-based event extraction, deep
learning, and graphical neural networks. Event extraction based on machine learning is depicted in Fig. 3. The first step is to collect raw data, followed by preprocessing, word embedding for word segmentation, and the calculation of a distance matrix. Finally, the events that precipitated the posts are discovered. Figure 4 depicts deep learning-based event extraction, in which five stages are covered. First, raw data such as text, images, and video are gathered. Second, the data are preprocessed. Third, features are extracted from the data. Lastly, the extracted features are fed into the input layer of a network that includes a convolutional layer and a pooling layer for extracting the most relevant features, followed by a training and validation process. Figure 5 shows event extraction using a graphical neural network. The event is detected using a graphical network that includes word embedding, graphical embedding, and a graphical neural network (input layer, convolutional layer, and pooling layer).
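The distance matrix step of the machine learning pipeline can be illustrated as follows. This is a minimal sketch, assuming that posts have already been turned into numeric vectors; the cosine distance, the greedy grouping rule in `group_posts`, the threshold of 0.5, and the sample vectors are all assumptions chosen for illustration, since the text does not specify how events are derived from the matrix.

```python
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat empty vectors as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    n = len(vectors)
    return [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def group_posts(dist, threshold):
    """Greedy single pass: a post joins the first group whose seed is close enough."""
    groups = []  # each group is a list of post indices; group[0] is its seed
    for i in range(len(dist)):
        for group in groups:
            if dist[i][group[0]] <= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical post vectors: the first two share words, the third does not.
vectors = [[1, 1, 1, 0, 0], [1, 0, 1, 1, 0], [0, 0, 0, 0, 2]]
dist = distance_matrix(vectors)
groups = group_posts(dist, threshold=0.5)
```

Each resulting group is a candidate event; a real system would replace the greedy rule with a proper clustering method and learned word embeddings.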
Fig. 3 Event extraction on machine learning
Fig. 4 Event extraction on deep learning
2 Literature Review
Social media is used by billions of individuals worldwide to exchange information, interact with family and friends, learn new things, pursue interests, and enjoy themselves on a personal level. Social media can be utilized professionally to improve
Fig. 5 Event extraction on graphical neural network
one's knowledge of a subject and establish a professional network by communicating with other business professionals. At the business level, social media facilitates interaction with the public, helps analyze consumer insights, and promotes corporate development. An event takes place at a specific moment and place and has consequences. As a result of these consequences, people may take action on social media, so that the event is reflected in activity in virtual communities. Figure 6 shows the events detected during the survey.
Fig. 6 Detected events during survey
3 Localized Burst Detection
A local event is an event or incident that is restricted in time and has a limited spatial range. Spatial events are analyzed by investigating how graph clustering can be applied to detect geo-located communities, with a fast greedy modularity optimization clustering algorithm used to find similarity within a community. Avvenuti [3] proposed a decision support system for the detection and damage assessment of earthquakes in Italy. Yigitcanlar et al. [4] applied a burst detection algorithm to identify outbreak events promptly. Outbreak occurrences were identified in four steps. First, during the screening process, the term “disaster”
was used as a keyword to collect posts about natural disasters. Second, the NVivo program screened all the tweets for repetition and computed word frequencies. Third, word co-occurrences were identified. Finally, a geographical analysis was performed. The approach is used in the context of environmental issues to understand how social media analytics might help government officials estimate the damage that natural catastrophes cause in metropolitan regions. The social media analytics approach was used to analyze the sentiment and content of location-based Twitter messages from Australia (n = 131,673), in a test case study of Australian states and territories. This gives information about the effects that a certain catastrophic event might have on a city, a town, or a region. Another type of local event detection study concerns offensive language, which can be used to bully or hurt the feelings of a person or a community. Cyberbullying that targets specific people, celebrities, politicians, products, and groups of people is started by abusive language in internet comments. Hajibabaee [5] proposed a text-based offensive language detection system for social media that includes a modular cleaning phase, a tokenizer, three embedding techniques (AdaBoost, SVM, and NLP), as well as eight classifiers. Community Detection based on Fish School Effect (CDFSE), in turn, is a novel way of dividing a network into communities. By analogy, an ocean network contains a wide variety of fish species, and at the initial stage fish of the same species attract one another because of their similar characteristics, forming an initial group. Similarity is then used to build large-scale groups based on shared characteristics. The Adamic-Adar index is used to calculate the similarity between individuals. The reported features of CDFSE are good community detection quality, no parameters, and significant scalability [6]. 
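The Adamic-Adar index mentioned for the similarity calculation scores two nodes by their common neighbors, weighting each common neighbor z by 1 / log(degree of z), so that rarely connected neighbors count more. The sketch below implements this standard definition; the toy graph and the function name `adamic_adar` are illustrative assumptions, and how CDFSE combines the scores into communities is not specified in the text above.

```python
import math


def adamic_adar(graph, x, y):
    """Sum of 1 / log(degree(z)) over all common neighbors z of x and y.

    `graph` maps each node to the set of its neighbors (undirected graph).
    Neighbors with degree 1 are skipped because log(1) = 0.
    """
    common = graph[x] & graph[y]
    return sum(1.0 / math.log(len(graph[z])) for z in common if len(graph[z]) > 1)


# Hypothetical toy network.
graph = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
score_ab = adamic_adar(graph, "a", "b")  # common neighbors: c (degree 2), d (degree 3)
score_ae = adamic_adar(graph, "a", "e")  # common neighbor: d only
```

Here the pair (a, b) scores higher than (a, e) because it shares two neighbors, one of which has a low degree.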
Another type of localized event is traffic. Traffic accident detection is an important part of government strategies for reducing accidents. Suat-Rojas et al. [7] extracted tweets about traffic accidents based on keyword matching and the most frequent words retrieved using n-grams, then applied a named entity recognition technique to detect the location of each accident and passed the location through a geocoder, which returned its geographic coordinates. George et al. [8] treated online spatiotemporal communities as dynamic communities in multilayer spatial interaction networks in order to analyze their spatiotemporal patterns. A social media-based spatiotemporal event detection system that can detect events at different temporal and spatial resolutions is examined using exploratory analytic approaches and the Leiden technique. The method uses a variety of social media datasets from Twitter and Flickr for several cities, including London, Paris, Melbourne, and New York. The challenge of the unknown spatial resolution of events is addressed by using the quad-tree method to divide the geographical space into regions at multiple scales based on the density of social media data. Then, an unsupervised statistical method using the Poisson distribution and a smoothing technique is employed to identify areas with an unusually high density of social media posts. The system can also improve tour recommendations by designing itineraries that avoid identified incidents such as accidents.
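The combination of a quad-tree partition and a Poisson test described above can be sketched as follows. This is a simplified illustration, not the implementation from [8]: the function names, the splitting thresholds, the fixed expected rate of 0.5 posts per cell, and the significance level are all assumptions, and the smoothing step that would normally estimate the expected rate from history is left out.

```python
import math


def quadtree_cells(points, bounds, max_points=4, max_depth=6):
    """Split a bounding box into quadrants until each leaf holds at most max_points.

    `bounds` is (x0, y0, x1, y1); cells are half-open, so points lying exactly
    on the upper or right edge of the root box are ignored.
    Returns a list of (bounds, point_count) leaves.
    """
    x0, y0, x1, y1 = bounds
    inside = [(x, y) for x, y in points if x0 <= x < x1 and y0 <= y < y1]
    if len(inside) <= max_points or max_depth == 0:
        return [(bounds, len(inside))]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    cells = []
    for quad in ((x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)):
        cells.extend(quadtree_cells(inside, quad, max_points, max_depth - 1))
    return cells


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):      # accumulate P(X = 1) ... P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_burst(count, expected, alpha=0.01):
    """Flag a cell whose post count is unlikely under the expected Poisson rate."""
    return poisson_tail(count, expected) < alpha


# Hypothetical geotagged posts: five close together, one far away.
points = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (0.4, 0.4), (0.5, 0.5), (3.5, 3.5)]
cells = quadtree_cells(points, (0, 0, 4, 4), max_points=2, max_depth=2)
flagged = [(b, c) for b, c in cells if is_burst(c, expected=0.5)]
```

Dense areas are subdivided into smaller cells, so the same test can flag a burst at street scale in a busy district and at district scale in a quiet one.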
Jonnalagadda and Hashemi [9] proposed a multi-modal method for detecting traffic events. The traffic events are modeled by a multi-modal deep convolutional network in a semi-supervised architecture. The dataset is a four-month collection of social media and traffic sensor observations from the San Francisco Bay Area. An important open issue under consideration is how to model the temporal interaction of multi-modal data and derive more accurate representations from it using attention mechanisms. A drawback of this model is that it does not support multi-class or multi-label classification, which would give users more comprehensible information. The exploitation of traffic-relevant data from social media appears to be an emerging topic. Yuanyuan et al. [10] used LSTM models and their integration with CNN models: trained word embeddings were used as input to an LSTM-CNN to extract traffic-relevant microblogs. A traffic detection system was also used for real-time surveillance of numerous portions of the Italian road network, with text mining and classification techniques applied to detect traffic. A summary of local event detection in social media is shown in Table 1.
4 Global Burst Detection
A global event is an organized event in which people from all over the world take part. There are many distinct types of global events, and each has a different impact. Examples of global events are the Olympic Games, Expos, World Cups, and cultural festivals. The diffusion rate of information over a social media platform or network is used to characterize global burst behavior. Jiang et al. [16] proposed a social media-based pandemic monitoring system that combines social media mining with robust web-based systems integration. It builds a dynamic knowledge graph through entity recognition. The incidence rate and mortality of pandemics were estimated using a dynamic graph neural network. The full-stack online solution also uses the social media mining forecasts to provide users with a simple interface for tracking pandemics across numerous regions. Twitter analytics and public perceptions of COVID-19 offer insights into user sentiment around the pandemic. Andreadis et al. [17] described a novel framework for event detection, community detection, and topic detection, as well as visualization. The system collects, analyzes, and visualizes Twitter posts and was developed specifically to monitor the spread of the virus in severely affected Italy. It comprises a deep learning geotagging method that places geotags on posts according to the textual descriptions of their locations, a face detection system that counts how many persons are in posted pictures, and a community detection method that identifies Twitter user communities. Furthermore, it involves a deeper examination of the gathered posts in order to predict their reliability and identify trending topics and events. Finally, it presents an online platform with a visual analytics dashboard that displays the topic, community, and user results as well as
Table 1 Summary of local event detection in social media

| S. No. | Author reference/year | Media | Method | Feature | Dataset | Advantage |
|---|---|---|---|---|---|---|
| 1 | Shoyama et al. [11]/2021 | Social media | E-divisive with Medians | Flood monitoring systems | Twitter, MLIT | Detect local multiple-flood events, to alert broad area |
| 2 | Chen et al. [12]/2021 | Social media | Multi-modal generative adversarial network (RNN, LSTM) | Traffic event detection | Traffic sensor data (Caltrans performance measurement system, San Francisco Bay), geo-tagged Twitter dataset | Smart city expert system |
| 3 | Capdevila et al. [13]/2017 | Social media | Tweet-SCAN (DBSCAN, GDBSCAN) | Festival event | Twitter API (Barcelona local festival, Twitter, 19th to the 25th of September) | Find local event geo space and time (wine tasting, and food market events) |
| 4 | Jianxiang et al. [14]/2021 | Social media | Conventional method (Lynchian elements), text mining, image processing, clustering analysis, kernel density estimation, and sentiment analysis | Urban planners (Tri-City Region of Gdansk, Sopot and Gdynia in Northern Poland) | Social media: Instagram (photo and video), Twitter (short messages) | Urban management (tourist) |
| 5 | Roy and Hasan [15]/2021 | Social media | IO-HMM model, word2vec | Evacuation traffic predictions | Tweets (Florida, Hurricane Irma, September 2, 2017 and September 19, 2017) | Evacuation decision-making |
an interactive map that displays and filters analyzed posts based on the outcomes of the localization technique. The BiLSTM-based model, which was trained specifically for English and Italian, was tested on the CoNLL2003 dataset for English and the Evalita2009 dataset for Italian. Cantini et al. [18] proposed a new technique called TIMBRE (Time-aware opinion mining through BOT removal), which aims to determine the polarity of opinions on social media. Politicians and political events increasingly use social media to attract voters. In the 2016 US federal election, candidates from each party made heavy use of social media, particularly tweets. A deep CNN is trained to classify place names by state. Budiharto and Meiliana [19] studied political campaigns. Twitter is a very well-known social media site for microblogging. The Twitter data generated rapidly in response to a political campaign can be used to predict the outcome of a presidential election, as has been done in a number of nations, including the United States, Great Britain, Spain, and France. Using tweets from Indonesian presidential candidates and tweets with important hashtags gathered between March and July 2018, the authors make predictions about the outcome of the Indonesian presidential election. The authors analyzed Twitter data from March 2018, when the election discourse first began, through July 2018 (the time this experimental work was conducted). The Twitter app was used to find hashtags based on popular opinion. The Twitter API accepts arguments and returns data from Twitter accounts in order to access tweets. The following fields were preserved with the retrieved tweets: twitter-id, hash tag, tweet-created, tweet-text-retweet, and favorite-count. Researchers may retrieve hashtag data using the database. On this basis, the authors proposed a novel method for predicting election results.
A summary of global event detection in social media is shown in Table 2.
5 Conclusion
This survey gives an overview of event detection techniques used in online social networks, which enable users to hold discussions, share information, and create digital content. There are a variety of social media platforms, including social networking sites, wikis, blogs, microblogs, webcasts, gadgets, virtual communities, and photo- and video-sharing websites. Modern algorithms for event detection on online social networks make use of text mining methods applied to pre-existing datasets, which are processed without restrictions on computational complexity or execution time per document. The two categories of event detection methods covered are localized burst detection and global burst detection. Classification was carried out using supervised, unsupervised, or hybrid techniques, depending on the underlying detection technique. Detecting local and global events from social media has many benefits, including alerting people to disasters, traffic accidents, and pandemic outbreaks and supporting disaster management. As for future scope, these analyses and the detected events must be trustworthy. In particular, several researchers avoided sharing untrustworthy
Table 2 Summary of global event detection in social media

| S. No. | Author reference/year | Media | Method | Feature | Dataset | Advantage |
|---|---|---|---|---|---|---|
| 1 | Kaleel [20]/2013 | Tweet and Facebook | LSH | | | |
| 2 | Lejeune et al. [21]/2015 | News | Daniel system | Epidemic events | Wikipedia | Multilingual system |
| 3 | Sun et al. [22]/2021 | Twitter | RS scoring method, word2vec, PLSA, expectation maximization | Natural event | Sina Weibo API and Twitter API | Better efficiency and accuracy, high-quality tweets |
| 4 | Aldhaheri and Lee [23]/2017 | Twitter | TF-IDF, Naive Bayes classifier, Kalman, and particle filters | Location-based and geo-spatial event detection | Twitter data, Sep. 17, 2014 until Nov. 20, 2014 | Higher event detection model, high accuracy |
| 5 | Li et al. [24]/2017 | | TweetNLP, OpenCalais, word embedding, rule-based approach, and the temporal specific word embedding | New event, old event | Twitter, 4 weeks, from 12/10/2012 | High precision |
| 6 | Pradhan et al. [25]/2019 | Twitter | Jaccard similarity, Simpson similarity, and Silhouette Coefficient | Aspect event | Twitter API | First story detection |
events by using deep learning, natural language processing, and state-of-the-art machine learning based on supervised and unsupervised learning. In the future, integrated systems based on blockchain technology could enhance system security and transparency and help prevent the spread of false information about events on social media. Social media now provides tremendous opportunities for forecasting research. In several fields, event analysis aids decision-making and forecasting.
References
1. Nazmus Saadat Md, Kabir H, Shuaib M, Nassr RM, Husen MN, Osman H (2021) Research issues & state of the art challenges in event detection. In: 15th international conference on ubiquitous information management and communication (IMCOM)
2. Sreenivasulu M, Sridevi M (2018) A survey on event detection methods on various social media. In: Sa PK et al (eds) Recent findings in intelligent computing techniques. Advances in intelligent systems and computing, vol 709. Springer Nature Singapore Pte Ltd.
3. Avvenuti M (2014) A real time decision support system for earthquake crisis management. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, pp 1749–1758
4. Yigitcanlar T, Regona M, Kankanamge N, Mehmood R (2022) Detecting natural hazard related disaster impacts with social media analytics: the case of Australian states and territories. Sustainability 14:810
5. Hajibabaee P (2022) Offensive language detection on social media based on text classification. In: IEEE 12th annual computing and communication workshop and conference (CCWC), pp 92–98
6. Sun Y, Sun Z, Chang X, Pan Z, Luo L (2022) Community detection based on fish school effect. IEEE Access
7. Suat-Rojas N, Gutierrez-Osorio C, Pedraza C (2022) Extraction and analysis of social networks data to detect traffic accidents. Information 13(1):26. https://doi.org/10.3390/info13010026
8. George Y, Karunasekera S, Harwood A, Lim KH (2021) Real-time spatio-temporal event detection on geotagged social media. J Big Data
9. Jonnalagadda J, Hashemi M (2021) A deep learning-based traffic event detection from social media. In: IEEE 22nd international conference on information reuse and integration for data science (IRI), pp 1–8. https://doi.org/10.1109/IRI51335.2021.00007
10. Chen Y, Lv Y, Wang X, Li L, Wang F-Y (2018) Detecting traffic information from social media texts with deep learning approaches. IEEE Trans Intell Transp Syst 20(8):3049–3058
11. Shoyama K, Cui Q, Hanashima M, Sano H, Usuda Y (2021) Emergency flood detection using multiple information sources: integrated analysis of natural hazard monitoring and social media data. Sci Total Environ 767:144371. ISSN 0048-9697
12. Chen Q, Wang W, Huang K, De S, Coenen F (2021) Multi-modal generative adversarial networks for traffic event detection in smart cities. Expert Syst Appl 177:114939. ISSN 0957-4174
13. Capdevila J, Cerquides J, Nin J, Torres J (2017) Tweet-SCAN: an event discovery technique for geo-located tweets. Pattern Recogn Lett 93:58–68. ISSN 0167-8655
14. Huang J, Obracht-Prondzynska H, Kamrowska-Zaluska D, Sun Y, Li L (2021) The image of the city on social media: a comparative study using “big data” and “small data” methods in the tri-city region in Poland. Landscape Urban Plann 206:103977. ISSN 0169-2046
15. Roy KC, Hasan S (2021) Modeling the dynamics of hurricane evacuation decisions from twitter data: an input output hidden Markov modeling approach. Transp Res Part C Emerg Technol 123:102976. ISSN 0968-090X
16. Jiang J-Y, Zhou Y, Chen X, Jhou Y-R, Zhao L, Liu S, Yang P-C, Ahmar J, Wang W (2021) COVID-19 surveiller: toward a robust and effective pandemic surveillance system based on social media mining. Phil Trans R Soc A 380:20210125
17. Andreadis S, Antzoulatos G, Mavropoulos T, Giannakeris P, Tzionis G, Pantelidis N, Ioannidis K, Karakostas A, Gialampoukidis I, Vrochidis S, Kompatsiaris I (2021) A social media analytics platform visualising the spread of COVID-19 in Italy via exploitation of automatically geotagged tweets. Online Soc Networks Media 23:100134. ISSN 2468-6964
18. Cantini R, Marozzo F, Talia D, Trunfio P (2022) Analyzing political polarization on social media by deleting bot spamming. Big Data Cogn Comput 6(1):3
19. Budiharto W, Meiliana M (2018) Prediction and analysis of Indonesia presidential election from Twitter using sentiment analysis. J Big Data 5:51. https://doi.org/10.1186/s40537-018-0164-1
Event Detection in Social Media Analysis: A Survey
53
20. Kaleel SB (2013) Event detection and trending in multiple social networking sites. In: Proceedings of the 16th communications & networking symposium. Society for Computer Simulation International 21. Lejeune G, Brixtel R, Doucet A, Lucas N (2015) Multilingualevent extraction for epidemic detection. Artif Intell Med 65(2):131–143 22. Sun X, Liu L, Ayorinde A, Panneerselvam J (2021) ED-SWE: event detection based on scoring and word embedding in online social networks for the internet of people. Elsevier B.V. on behalf of KeAi Communications Co. Ltd 23. Aldhaheri A, Lee J (2017) Event detection on large social media using temporal analysis. In: IEEE 7th annual computing and communication workshop and conference (CCWC), pp 1–6. IEEE 24. Li Q, Nourbakhsh A, Shah S, Liu X (2017) Real-time novel event detection from social media. In: IEEE 33rd international conference on data engineering (ICDE), pp 1129–1139. https:// doi.org/10.1109/ICDE.2017.157 25. Pradhan AK, Mohanty H, Lal RP (2019) Event detection and aspects in twitter: a BoW Approach. In: Distributed computing and internet technology, vol 11319, ISBN: 978-3-03005365-9
Artificial Intelligence Mechanism to Predict the Effect of Bone Mineral Density in Endocrine Diseases—A Review
Vivek Duraivelu, S. Deepa, R. Suguna, M. S. Arunkumar, P. Sathishkumar, and S. Aswinraj
Abstract As a rigid organ, bone supports the human body, gives it shape, and protects many of its important parts. Healthy bone is crucial for three main reasons: it promotes mobility, protects internal organs and supports blood cell formation, and stores nutrients. Researchers have therefore identified stable bone mineral density (BMD) as an indicator of bone health. Because BMD plays a key role in maintaining bone health, many clinical studies have proposed predicting BMD and fracture risk level from gender and age. BMD is related not only to bone metabolism (BM) but also to endocrine diseases (EDs). Various hormone imbalances and biominerals are measured to assess the deficiencies behind endocrine disorders, which mainly cause diabetes mellitus type 1 and type 2 (T1DM, T2DM), hyperthyroidism, and hypothyroidism. Factors associated with BMD and endocrine disorders have been identified in various medical studies. The relationship between BMD and ED is therefore linked and categorized for further analysis. The prediction and classification algorithms used to correlate BMD and ED are compared with the help of Artificial Intelligence Techniques (AITs). The discussion of AITs is summarized with the datasets (image, biological data) related to BMD and ED. Finally, the evaluation metrics of the different models are categorized and compared to show the need for AITs in the prediction of bone health.
Keywords Bone mineral density (BMD) · Endocrine disease (ED) · Artificial intelligence (AI) · Rigid bone · Diabetes mellitus and thyroid disorders (TDs)

V. Duraivelu (B) Department of Computer Science and Engineering, B V Raju Institute of Technology, Narsapur, Telangana 502313, India e-mail: [email protected]
S. Deepa · S. Aswinraj Department of Computer Technology, Kongu Engineering College, Perundurai, Tamil Nadu 638060, India e-mail: [email protected]
R. Suguna · P. Sathishkumar Department of CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu 638401, India
M. S. Arunkumar Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu 62, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_5
1 Introduction
Bone in the human body is formed from the protein collagen combined with the mineral calcium. Bone is the main support that lets humans perform different actions and movements. Without bone, the external muscles would have neither action nor structure. Healthy bone and a continuous supply of growing tissue are therefore necessary to control and manage bone density in the human body. Bone structure and metabolism differ with age and sex, and bone density changes for males and females at different ages. Gender disparities in bone density have been studied comparatively: males have higher bone density than females and reach it at a later age. Bone loss at the femoral neck has been estimated at 0.82% per year for males and 0.96% per year for females. The same study shows that the rapid decline occurs at 74–79 years in men but at 65–69 years in women. The bone mineral density test measures the calcium and other minerals present in bone. Bone mineral density (BMD) helps to detect osteoporosis and to predict fracture risk. The BMD test can be done using methods based on dual-energy X-ray absorptiometry (DEXA). The test supports the prediction of future fractures by determining the T-score, which compares the measured value with that of healthy young adults. According to the World Health Organization, a T-score between +1 and −1 is normal or healthy. Table 1 shows the levels of bone density and their categories. BMD is also correlated with endocrine diseases, in which bone metabolism decreases as the endocrine glands change. Here, various biochemical and clinical factors are analyzed to measure BMD. These biochemical factors identify the endocrine disorder and can help predict future disease.
The different endocrine system disorders are listed in Table 2, which also shows the nutrient alteration associated with each endocrine system.

Table 1 Level of BMD
S. No. | Level               | T-score definition
1      | Normal              | +1 to −1 SD
2      | Low bone mass       | −1 to −2.5 SD
3      | Osteoporosis        | −2.5 SD or lower
4      | Severe osteoporosis | −2.5 SD or lower with one or more fragility fractures

Table 2 Nutrients of endocrine system
S. No. | Endocrine system | Nutrient alteration                                      | Disorder
1      | Thyroid          | Iodine deficiency                                        | Hyperthyroidism, hypothyroidism, and autoimmune thyroid disease
2      | Bone             | Calcium and vitamin D deficiency                         | Osteoporosis
3      | Metabolic        | Cow milk consumption and malnutrition                    | Type 1 diabetes, malnutrition-related diabetes mellitus
4      | Gonads           | Caloric excess, chromium deficiency and anorexia nervosa | Diabetes, bone loss
5      | Growth           | Macro and micro nutrient deficiency                      | Short stature

Endocrine diseases can present as medical emergencies such as diabetic ketoacidosis, thyroid storm, and acute adrenocortical insufficiency. The repeated cycle of bone growth and resorption is driven by the dynamic relationship among osteoclasts, osteoblasts, and an array of hormonal influences. Bone metabolism is regulated by multiple environmental signals, including chemical, mechanical, electrical, and magnetic ones. The cellular compartments of bone respond to these signals, which modulate the balance between new bone formation and bone remodeling. The three cell types, osteoblasts, osteocytes, and osteoclasts, derive from the stem cell lineage and the hematopoietic lineage, which sits between the immune system and bone. Diabetes mellitus (DM) and thyroid dysfunction (TD) often coexist in patients, and hypothyroidism and hyperthyroidism are more common in patients with diabetes. Owing to the reduction in thyroid-stimulating hormone levels and the conversion of T4 to T3 in peripheral tissues, nodule formation increases the size of the goiter. These changes affect glucose, lipid, and protein levels, which reduces the insulin level. Several studies show that DM and TD are closely interlinked disorders in which the central and peripheral control of hormone mechanisms is driven toward metabolic syndrome.
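The T-score categories of Table 1 can be expressed as a small lookup. The following is only an illustrative sketch: the function name `classify_t_score` and the `fragility_fracture` flag are hypothetical, T-scores above +1 are treated as normal, and the severe category is read as a T-score of −2.5 or lower together with a fragility fracture, which is an assumption about row 4 of Table 1 based on the usual WHO convention.

```python
def classify_t_score(t_score, fragility_fracture=False):
    """Map a DXA T-score (in SD units) to a Table 1 category.

    Assumption: 'severe osteoporosis' means T-score <= -2.5 plus at
    least one fragility fracture (hypothetical flag, not in the source).
    """
    if t_score >= -1.0:
        return "Normal"
    if t_score > -2.5:
        return "Low bone mass"
    if fragility_fracture:
        return "Severe osteoporosis"
    return "Osteoporosis"


if __name__ == "__main__":
    for score, fracture in [(0.3, False), (-1.8, False), (-2.7, False), (-3.1, True)]:
        print(score, fracture, classify_t_score(score, fracture))
```

The boundary values follow the table as written, so a T-score of exactly −1 is reported as normal and exactly −2.5 as osteoporosis.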
2 Bone Fracture
Computer-aided diagnosis (CAD) is introduced by Ma and Luo [9] to reduce the workload of doctors and to identify fractures easily. A new classification model, the crack-sensitive convolutional neural network (CrackNet), is designed to identify sensitive fracture lines. The system works in two stages: first, a faster region-based CNN (FR-CNN) detects twenty different bone regions in X-ray images; second, CrackNet identifies the fractured region in the bone from the output of FR-CNN. With a total of 1052 images, the system achieves 90.11% accuracy and 90.14% F-measure. A Deep Neural Network (DNN) classification model is used by Yadav and Rathor [15] to classify fractured and healthy bone. At first, the DNN model overfitted because the dataset was small, so data augmentation was used to increase the size of the dataset. Softmax and the Adam optimizer are used, and the model reaches 92.4% classification accuracy for healthy and fractured bone using fivefold cross-validation [15]. The experiments, run with Python 3.8 in a Jupyter Notebook, vary the percentage of training and testing data and then apply fivefold cross-validation. Hardalaç et al. [4] use different models such as SABL, RegNet, RetinaNet, PAA, Libra R-CNN, FSAF, Faster R-CNN, Dynamic R-CNN, and DCN, with various backbones, in 20 fracture detection procedures on wrist X-ray images. To improve detection, five different ensemble models are combined into a single model named wrist fracture detection combo (WFD-C). Among the 26 fracture detection models evaluated, the best result is a precision of 0.8639.
2.1 Faster R-CNN
The FR-CNN model considers twenty types of bones as the first step of fracture recognition. The flow of bone region detection using FR-CNN is shown in Fig. 1. Object detection usually consists of two distinct tasks, classification and localization. R-CNN stands for region-based convolutional neural network, and the central idea of the R-CNN series is region proposals. Here, bone objects are localized using region proposals, which makes it easier to find the bone mass level and to support early detection of bone damage or bone weakness. The metrics are evaluated over different data distributions (skull, lower trunk, upper trunk, upper limb, and lower limb) with a total count of 3053. The performance of FR-CNN is evaluated in terms of accuracy, precision, sensitivity, specificity, and F-measure, which assess the boxes that the object detection algorithm marks in the original image. These metrics are based on the intersection over union (IoU) of the detection result $D$ and the ground truth $G$:

$$\mathrm{IoU} = \frac{|D \cap G|}{|D \cup G|}$$
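For axis-aligned bounding boxes, the IoU defined above reduces to a few lines of arithmetic. The sketch below is illustrative only and assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates; the function name `iou` is hypothetical and is not taken from the reviewed work.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    detection = (0, 0, 2, 2)
    ground_truth = (1, 1, 3, 3)
    print(iou(detection, ground_truth))
```

A detection is then usually counted as correct when its IoU with a ground-truth box exceeds a chosen threshold; the threshold used in the reviewed study is not stated here.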
Fig. 1 Flow of region localization and bone detection
The region-wise classification performance of CrackNet is estimated using 242 original X-ray images, derived from FR-CNN, to train and test CrackNet. Figure 2 shows the region-wise classification performance reported by Ma and Luo [9]. Biomedical data classification can often be divided into four phases: data collection and segmentation, data preprocessing, feature extraction/dimension reduction, and recognition and classification. Here, data are classified based on their nature, sensitivity, and importance to the organization in the event that they are changed, stolen, or destroyed. This helps a business appreciate the worth of its data, identify its vulnerability, and put controls in place to lessen risks. Bone mass level features are extracted from bone mass index evaluation or bone image analysis. Max pooling is a pooling operation that takes the maximum value of each patch of a feature map to create a down-sampled (pooled) feature map; it is usually applied after convolution layers. The bone images or bone mass features are forwarded to the CNN, and the output features are accumulated to evaluate the features under those layers. The layers are categorized and used for the further prediction and analysis of bone mass level. By evaluating the recognized features, the classification of bone mass level is achieved. The proposed framework is evaluated as a two-stage system on an X-ray dataset and a Radiopaedia dataset. The performance on both datasets is evaluated with Faster R-CNN + ResNet, which helps in comparing the fracture region and the located region in an image, and Fig. 3 shows the comparison chart for the X-ray and Radiopaedia datasets. Hence, the system not only detects the fracture but also localizes the bone in the X-ray images, which improves performance on the bone fracture detection task.

Fig. 2 Region-wise classification comparisons (y-axis: metric values, 0.75 to 1; x-axis: performance estimators; series: Laplace + ResNet, Gabor + ResNet, Schmid + ResNet, Sobel + ResNet, ResNet, CrackNet)

Fig. 3 Comparison of X-rays and radiopaedia (FR-CNN + ResNet) (y-axis: metric values, 0 to 1; x-axis: models, X-Ray (FRCNN + ResNet) and Radiopaedia (FRCNN + ResNet); series: Accuracy, Precision, Recall, F-measure)
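The max pooling operation described above can be illustrated without any deep learning framework. This is a minimal sketch under stated assumptions: the feature map is a plain list of lists, the pooling window is square and non-overlapping (stride equal to window size), and the name `max_pool2d` is hypothetical.

```python
def max_pool2d(feature_map, pool=2):
    """Down-sample a 2D feature map by taking the maximum of each
    non-overlapping pool x pool patch (stride equal to pool size)."""
    rows = len(feature_map) // pool
    cols = len(feature_map[0]) // pool
    pooled = []
    for i in range(rows):
        pooled_row = []
        for j in range(cols):
            # Collect the values of the current patch and keep the largest.
            patch = [
                feature_map[i * pool + a][j * pool + b]
                for a in range(pool)
                for b in range(pool)
            ]
            pooled_row.append(max(patch))
        pooled.append(pooled_row)
    return pooled


if __name__ == "__main__":
    fmap = [
        [1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 9, 8],
        [2, 6, 7, 3],
    ]
    print(max_pool2d(fmap))  # 4x4 input becomes a 2x2 pooled map
```

Rows or columns that do not fill a complete window are dropped in this sketch; frameworks typically offer padding options instead.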
2.2 Fivefold Cross-Validation
The Deep Neural Network (DNN) is developed to distinguish bone at risk of fracture from healthy bone. Here, 100 images of different human bones are taken into consideration. Figure 4 shows the workflow of the DNN with fivefold cross-validation.
Fig. 4 Flow of fivefold cross-validation
The experiment is carried out in three stages of the deep CNN (DCNN). The first stage uses 90% of the data for training and 10% for testing, with 100 epochs and a batch size of 40. Batch normalization is applied to reduce performance fluctuation while the model runs. In the second stage, the same model is run with 80% training data and 20% test data. After 100 epochs in each of these two stages, fivefold cross-validation is finally applied to further assess the system. The experiments are measured using training accuracy (Tr-acc), validation accuracy (VL-acc), training loss (Tr-Loss), and validation loss (VL-Loss), and the results are shown in Fig. 5.

Fig. 5 Evaluation of fivefold cross-validation (y-axis: metric value, 0 to 100%; x-axis: metrics Tr-acc, VL-acc, Tr-Loss, VL-Loss; series: Fold1 to Fold5 and Average)
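The fivefold procedure above can be sketched as an index-splitting routine. The code below is an illustration, not the implementation used in the reviewed study: the function name `k_fold_indices`, the shuffling, and the fixed seed are assumptions, and the model training step is deliberately left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffling and seed are assumptions
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    # 100 images, as in the experiment described above.
    for fold_no, (train, validation) in enumerate(k_fold_indices(100, k=5), start=1):
        print(f"Fold{fold_no}: {len(train)} training, {len(validation)} validation")
```

With 100 images and five folds, each fold trains on 80 images and validates on the remaining 20, and every image is used for validation exactly once; per-fold metrics are then averaged, as in the Average series of Fig. 5.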
2.3 Deep Learning-Based Object Detection Model
The fracture area in wrist X-ray images is detected by different deep learning-based approaches, namely DCN (Faster R-CNN), Dynamic R-CNN, FSAF, Libra R-CNN (RetinaNet), PAA, RegNet, and SABL models. With these object detection models, 20 different fracture detection procedures were performed with and without data augmentation. The deep learning-based object detection models used for fracture detection in wrist X-ray images, and the ensemble models developed from them, are shown in Fig. 6. The two-stage detector used in the proposed model is shown in Fig. 7. The proposed fracture detection is done by an ensemble model based on weighted boxes fusion, whose sub-models include both single-stage and two-stage detectors. The ensemble models WFD-1, 2, 3, 4, and 5 are the model combinations that were tried. The Dynamic R-CNN model produces a high probability value for only a few outputs, as shown in Fig. 8 for threshold values from 0.1 to 0.9.
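The idea behind fusing the boxes of several detectors can be sketched as follows. This is a deliberately simplified, single-class illustration of weighted boxes fusion, not the ensemble code of the reviewed study: boxes are assumed to be `(x1, y1, x2, y2, score)` tuples with positive scores pooled from all sub-models, the IoU threshold of 0.55 is an assumed default, and the per-model weighting and score rescaling of the full method are omitted.

```python
def iou(a, b):
    """IoU of two boxes whose first four entries are x1, y1, x2, y2."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(cluster):
    """Confidence-weighted average of coordinates; mean score of the cluster."""
    total = sum(box[4] for box in cluster)
    coords = [sum(box[i] * box[4] for box in cluster) / total for i in range(4)]
    return coords + [total / len(cluster)]


def weighted_boxes_fusion(boxes, iou_thr=0.55):
    """Greedily group overlapping boxes and replace each group by one fused box."""
    clusters, fused = [], []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        for idx, fused_box in enumerate(fused):
            if iou(fused_box, box) > iou_thr:
                clusters[idx].append(box)
                fused[idx] = fuse_cluster(clusters[idx])
                break
        else:
            # No existing cluster overlaps enough: start a new one.
            clusters.append([box])
            fused.append(fuse_cluster([box]))
    return fused


if __name__ == "__main__":
    predictions = [
        (10, 10, 50, 50, 0.9),      # detector A
        (12, 12, 52, 52, 0.6),      # detector B, same fracture
        (200, 200, 240, 240, 0.8),  # detector A, second fracture
    ]
    for box in weighted_boxes_fusion(predictions):
        print([round(v, 2) for v in box])
```

Unlike non-maximum suppression, which discards all but the highest-scoring box in a group, this kind of fusion uses every overlapping prediction to position the final box.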
3 Bone Metabolism
The properties of bone are determined by osteoblastic and osteoclastic cells, which allow the bone to maintain its tissue under the influence of several factors analyzed by Stagi et al. [11]. They describe the physical and biochemical factors that help build healthy bone. Several biochemicals serve as markers of bone metabolism, the most important being serum alkaline phosphatase (SAP), serum osteocalcin (OC), etc. Bone metabolism is also affected by diet, exercise, and vitamin D deficiency. To ensure bone health, priority is given to bone metabolism when dealing with children and adolescents.
Fig. 6 Different categorizations of models
Fig. 7 Flow of two stages of experiments
Fig. 8 Threshold versus metric value comparing dynamic R-CNN with PAA (panel title: Threshold Relation b/w Bbox; x-axis: threshold value, 0.1 to 0.9; y-axis: metric values, 0 to 5000; series: PAA Bbox, Dynamic R-CNN Bbox)
Cheng et al. [2] study the correlation between metabolic syndrome (MetS) and bone health in association with lifestyle and socio-economic factors, a relationship that remains unclear in the evaluation of bone mineral density. Their machine learning approach predicts bone mass loss from its correlation with metabolic syndrome. Concurrent samples of 23,497 adults in three stages are used to monitor bone density, which is evaluated using a MetS scoring index. Concurrent and non-concurrent prediction are distinguished and evaluated using ML algorithms such as SVM, LR, RF, and XGBoost, with performance measured by F1-score. Shahi et al. [10] discuss the regulation of bone metabolism through the formation of bone by combinations of biochemicals in the endochondral supplements. The different steps in the regulation of bone metabolism are used to formulate the signaling pathways. Bone remodeling under metabolic disorders is also reviewed. The review considers the major factors of bone formation acting through signaling pathways, including growth factors, bone morphogenetic proteins, wingless-type genes, etc. Bone metabolism is the main factor by which bone growth is estimated.
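The F1-score used above to compare the SVM, LR, RF, and XGBoost models is the harmonic mean of precision and recall. The sketch below shows the computation for a binary label such as bone mass loss versus no loss; the function name `f1_score` and the example labels are illustrative and do not come from the reviewed study.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up labels: 1 = bone mass loss, 0 = no loss.
    actual = [1, 1, 1, 0, 0, 1]
    predicted = [1, 0, 1, 0, 1, 1]
    print(f1_score(actual, predicted))
```

Because it ignores true negatives, F1 is often preferred over accuracy when the positive class is comparatively rare.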
3.1 Bone Tissue Growth
Bone growth in childhood and adolescence involves longitudinal growth and changes in the size and shape of the skeleton. The cycle of bone remodeling by osteoclasts and osteoblasts, as described by Stagi et al. [11], is shown in Fig. 9. The daily calcium requirement, which depends on age, is stated in Table 3 and helps in assessing bone metabolism.
Fig. 9 Cycle of bone remodeling
Table 3 Requirement of calcium in daily intake of children and adults (suggested by doctor)
4 BMD and Osteoporosis
Tanphiriyakun et al. [12] develop a computational model from 8981 medical variables to help predict BMD after treatment. Seven ML models are applied to 13,562 instances to build the prediction model. Random forest is chosen as the best algorithm for predicting the response to osteoporosis treatment, with an ROC of 0.70 and an accuracy of 0.69. The newly recommended regimens show a better treatment response than the regimens actually prescribed, 9.54% higher than the original regimens. ML-based decision support can therefore be used to predict BMD response after osteoporosis treatment. Hsieh et al. [6] present an automated tool that identifies fractures, predicts BMD, and evaluates fracture risk using plain radiographs. Here, DXA is the main reference for the identification of fractures. In total, the DXA of 18,175 patients is evaluated to measure the performance of the model. The model is calibrated with a minimal bias of −0.003 to +0.003. The automated tool classifies 5206 patients, with positive and negative predictive values for osteoporosis, and also identifies patients at high risk of bone fracture, compared against the DXA images of 3008 patients. Cruz et al. [3] review AI models for evaluating the risk of fracture or osteoporosis, a set of models with good capability in identifying the most significant distinct factors. Various researchers have identified different AI models to help screen the groups at risk of osteoporosis or fracture. These models are limited to a specific ethnic group, gender, or age. Hence, predictive tools for different populations are used to evaluate the risk factors for fracture. The big challenge identified for AI models is dealing with the data complexity generated by unification and developing evidence-based standards.
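Assuming that the "ROC of 0.70" above refers to the area under the ROC curve, that quantity can be computed directly from predicted scores: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise definition; the function name `roc_auc` and the example data are illustrative only.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts, over all (positive, negative) pairs, how often the positive
    case is scored higher; ties count as half.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up example: 1 = responded to treatment, 0 = did not respond.
    responded = [1, 0, 1, 0]
    predicted_probability = [0.9, 0.8, 0.4, 0.3]
    print(roc_auc(responded, predicted_probability))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which places a reported 0.70 in the range of moderate discrimination.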
5 Bone Mass in Endocrine Disease
The regulation of hormones in the human body is important and plays a main role in estimating the body mass of individuals. Endocrine factors determine bone growth and deficiencies in hormone levels. Higham and Abrahamsen [5] also describe the impact of diabetes and thyroid disorders. Endocrine levels also determine the bone mass and fracture risk of individuals. Hormone levels therefore help in analyzing different health problems, broadly classed as endocrine diseases, including keeping track of the fracture risk of individuals. Clinical endocrinology points the way for future research on bone mass linked with endocrine diseases.
5.1 Diabetes Mellitus
Low bone mineral density and increased fracture risk are observed in patients with type 1 diabetes mellitus. Joshi et al. [7] monitor 86 T1DM patients and 140 healthy controls to measure BMD and fracture risk (FR). BMD and other body composition measures were evaluated using dual-energy X-ray absorptiometry (DXA). T1DM patients have lower BMD than controls at the total body (TB) and lumbar spine (P < 0.05). Linear regression analysis shows that low BMD in T1DM patients is associated with poor glycemic control and less physical activity. Xu and Wu [14] examine bone health among T2DM patients using the continuous National Health and Nutrition Examination Survey. The examination identifies BMD and trends in osteoporosis among T2DM and non-T2DM patients. The comparative study of T2DM and non-T2DM patients shows an increase in osteoporosis. It also shows that osteopenia increases rapidly in both T2DM and non-T2DM patients, with a linear trend (P for linear trend < 0.04). Lin et al. [8] examine the fracture risk of Taiwanese patients with T2D, taking hypertension, hyperglycemia, and hyperlipidemia into account. Here, 1690 male and 1641 female patients from the National Health Insurance Research Database (NHIRD) were followed up for osteoporosis, identified using ICD-9-CM codes together with anatomical therapeutic chemical codes found in the NHIRD. Incidence and cumulative event rates are analyzed using the person-year approach and Kaplan–Meier analysis, and a Cox proportional hazards model is used to calculate the adjusted hazard ratio (HR) for osteoporosis events. The Cox model estimates the relative risk of an event in survival analysis. Like most statistical tests and models, Cox regression relies on background assumptions such as linearity and additivity of predictors. Its basic assumption is that the hazards are proportional (PH), which means that the relative hazards for different predictor or covariate levels are constant over time. The T2D cohort therefore shows a higher osteoporosis risk than the non-T2D cohort.
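The Kaplan–Meier analysis mentioned above estimates the probability of remaining event-free over time while accounting for censored follow-up. The sketch below is a minimal illustration with made-up data; the function name `kaplan_meier` and the input layout (follow-up durations plus a flag that is 1 for an observed osteoporosis event and 0 for censoring) are assumptions, not details of the reviewed cohort study.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed event time.

    S(t) is the product over event times t_i <= t of (1 - d_i / n_i),
    where d_i is the number of events at t_i and n_i the number still at risk.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        occurred = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1.0 - occurred / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up in years; event = 1 (osteoporosis), 0 (censored).
    follow_up = [2, 3, 3, 5, 8]
    event = [1, 1, 0, 1, 0]
    for time, prob in kaplan_meier(follow_up, event):
        print(time, round(prob, 3))
```

Computing one such curve per group (for example T2D and non-T2D) gives the cumulative event rates that are compared; the hazard ratio itself comes from the Cox model, which is not sketched here.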
5.2 Thyroid and Diabetes Mellitus
Biondi et al. [1] study the relationship between thyroid dysfunction and diabetes mellitus and show that the two health problems are related. The association of these common disorders is reviewed in terms of thyroid hormone and glucose and lipid metabolism, which helps in monitoring the thyroid and diabetic status of individuals. The work also discusses the common genes and pathogenetic mechanisms that contribute to diabetic and thyroid disorders. Untreated thyroid dysfunction has been shown to impair the control of diabetes, which is harmful during pregnancy. New algorithms therefore address the major causes of failure in the optimal management of thyroid dysfunction and diabetes mellitus. Xu et al. [13] compare bone metabolism and thyroid function in diabetic patients with and without ketosis. The main comparisons are made across groups defined by glucose control and pancreatic status, which determine bone turnover. The results compare the biochemical mechanisms that significantly change the level of bone metabolism, relating peptide and HbA1c levels to diabetic ketosis (DK) or diabetic ketoacidosis (DKA); DKA is associated with thyroid function (P < 0.05). The findings show that the dramatic changes in the bone metabolism of diabetic patients are related to thyroid function.
6 Summary and Conclusion
The different aspects of bone mineral density and fracture risk are studied and reviewed. Various models that analyze BMD and fracture risk are also compared for future analysis. Their performance metrics are estimated and evaluated for machine learning, deep learning, and Deep Neural Network models. Because BMD and FR are characterized by age, gender, and above all bone metabolism, endocrine diseases such as diabetes mellitus and thyroid disorders can be effectively predicted and analyzed. In some models, bone metabolism is considered together with biochemical markers, and the performance in predicting BMD is measured accordingly. Researchers and clinical experts have shown that different remodeling processes are required to maintain healthy bone. A healthier bone structure can be supported by in-depth analysis using Artificial Intelligence (AI) concepts. AI methods should be improved by evaluating them on bone mineral density and fracture risk. Because the evaluation of bone health for an individual still needs improvement, more effort should be dedicated to incorporating AI models to predict BMD. In future, BMD automation should be introduced. This would further help in designing exoskeleton models for differently abled persons and could also help in defense to protect soldiers. A collaborative research study should be carried out to estimate the level of bone health based on BMD, which may be useful for designing automation tools and robotics in different environments.
References
1. Biondi B, Kahaly GJ, Paul Robertson R (2019) Thyroid dysfunction and diabetes mellitus: two closely associated disorders. Endocrine Rev 40(3):789–824. https://doi.org/10.1210/er.2018-00163
2. Cheng C-H, Lin C-Y, Cho T-H, Lin C-M (2021) Machine learning to predict the progression of bone mass loss associated with personal characteristics and a metabolic syndrome scoring index. Healthcare 9(8):948. https://doi.org/10.3390/healthcare9080948
3. Cruz AS, Lins HC, Medeiros RVA, Filho JMF, da Silva SG (2018) Artificial intelligence on the identification of risk groups for osteoporosis, a general review. BioMed Eng OnLine 17(1):12. https://doi.org/10.1186/s12938-018-0436-1
4. Hardalaç F, Uysal F, Peker O, Çiçeklidağ M, Tolunay T, Tokgöz N, Kutbay U, Demirciler B, Mert F (2022) Fracture detection in wrist X-ray images using deep learning-based object detection models. Sensors 22(3):1285. https://doi.org/10.3390/s22031285
5. Higham C, Abrahamsen B (2022) Regulation of bone mass in endocrine diseases including diabetes. Best Pract Res Clin Endocrinol Metab 36(2):101614. https://doi.org/10.1016/j.beem.2022.101614
6. Hsieh C-I, Zheng K, Lin C, Mei L, Lu L, Li W, Chen F-P et al (2021) Automated bone mineral density prediction and fracture risk assessment using plain radiographs via deep learning. Nature Commun 12(1):5472. https://doi.org/10.1038/s41467-021-25779-x
7. Joshi A, Varthakavi P, Chadha M, Bhagwat N (2013) A study of bone mineral density and its determinants in type 1 diabetes mellitus. J Osteoporos 2013:1–8. https://doi.org/10.1155/2013/397814
8. Lin H-H, Hsu H-Y, Tsai M-C, Hsu L-Y, Chien K-L, Yeh T-L (2021) Association between type 2 diabetes and osteoporosis risk: a representative cohort study in Taiwan. In: Blank RD (ed) PLOS ONE 16(7):e0254451. https://doi.org/10.1371/journal.pone.0254451
9. Ma Y, Luo Y (2021) Bone fracture detection through the two-stage system of crack-sensitive convolutional neural network. Inform Med Unlocked 22:100452. https://doi.org/10.1016/j.imu.2020.100452
10. Shahi M, Peymani A, Sahmani M (n.d.) Regulation of bone metabolism 10
11. Stagi S, Cavalli L, Iurato C, Seminara S, Brandi ML, de Martino M (n.d.) Bone metabolism in children and adolescents: main characteristics of the determinants of peak bone mass. Clin Cases Mineral Bone Metab 8
12. Tanphiriyakun T, Rojanasthien S, Khumrin P (2021) Bone mineral density response prediction following osteoporosis treatment using machine learning to aid personalized therapy. Sci Rep 11(1):13811. https://doi.org/10.1038/s41598-021-93152-5
13. Xu C, Gong M, Wen S, Zhou M, Li Y, Zhou L (2022) The comparative study on the status of bone metabolism and thyroid function in diabetic patients with or without ketosis or ketoacidosis. Diab Metab Syndr Obes Targets Ther 15:779–797. https://doi.org/10.2147/DMSO.S349769
14. Xu Y, Wu Q (2021) Trends in osteoporosis and mean bone density among type 2 diabetes patients in the US from 2005 to 2014. Sci Rep 11(1):3693. https://doi.org/10.1038/s41598-021-83263-4
15. Yadav DP, Rathor S (2020) Bone fracture detection and classification using deep learning approach. In: 2020 international conference on power electronics and IoT applications in renewable energy and its control (PARC). IEEE, Mathura, India, pp 282–825. https://doi.org/10.1109/PARC49193.2020.236611
Evaluation of Deep Learning CNN Models with 24 Metrics Using Soybean Crop and Broad-Leaf Weed Classification
J. Justina Michael and M. Thenmozhi
Abstract Soybean is one of the highest-protein foods consumed globally. It is also used for producing animal feed, biodiesel, crayons, bio-composites, carpets, candles, cleaners, ink, tires, foam, and much more. However, from sowing to harvesting soybean, several difficulties emerge, among which weed removal is the most challenging. Several technologies have evolved to assist agriculturalists. Robotic weeding is one such technology: it eradicates weeds without human intervention, thereby multiplying the crop yield. This paper focuses on building the best soybean crop and broadleaf weed classifier, which helps robots identify and remove weeds by differentiating crops from weeds. Using transfer learning, 16 different CNN-based architectures were built and compared to determine the best classifier. These classifiers were built from 2382 observations (images). To expand this crop-weed dataset, six augmentation methods were applied to the training images. 24 metric values were computed for every classifier for comprehensive result analysis, and the top-3 metrics were selected and analyzed to find the best classifier. Ultimately, MobileNet outperforms all other classifiers with a model accuracy of 99.58%. Results are tabulated and plotted so that the outcome can be understood precisely. Future work in weed identification and removal is also discussed before concluding.
Keywords Soybean crop · Broadleaf weed · Transfer learning · Convolutional Neural Network (CNN) · Image classification
J. Justina Michael Department of Computer Science and Engineering, SRMIST, Kattankulathur campus, Chennai, India e-mail: [email protected] M. Thenmozhi (B) Department of Networking and Communications, SRMIST, Kattankulathur campus, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_6
1 Introduction Agriculture is key to the survival of human life but faces several challenges in providing quality food for the entire world. Weed control is one of the toughest challenges for farmers [1]. Emerging technologies such as Artificial Intelligence (AI) help agriculturalists deal with the growing problem of weeds. This in turn increases crop yields severalfold, which benefits the global economy. Soybean, whose botanical name is Glycine max, is cultivated in almost all parts of the world and has a broad spectrum of uses. Broadleaf weeds are the weeds most commonly found in soybean fields and must be removed during the critical period. The critical period for soybean crop-weed competition begins six to seven weeks after sowing. Weed detection has been addressed most successfully with deep learning algorithms. In [2], weeds are detected in soybean fields using five deep learning models. dos Santos Ferreira et al. [3] solved the same problem using the CaffeNet architecture. This paper proposes a deep learning model that classifies soybean crops and broadleaf weeds efficiently. The rest of the paper is organized as follows: recent research in deep learning, transfer learning, and crop-weed classification is presented in Sect. 2. Section 3 discusses the materials and methods used in this work. Section 4 discusses the experimental results, including metrics tables and graphs. Finally, Sect. 5 concludes the paper and explores future improvements.
2 Related Works Agricultural research and development covers weed classification, identification, detection, weed growth estimation, weed area density calculation, weed growth control, and weed removal techniques. Some recent works are summarized below:
2.1 Weed Detection in the Maize Field Cheng et al. aim to detect weeds from maize fields through a proposed model. In their work, the parameters were reduced, the feature extraction speed was improved, and the training speed was also accelerated. It was proved that the proposed model performs better when compared with YOLOv4 and YOLOv4 Tiny giving an average precision value of 98.98% [4].
2.2 Estimating Weed Growth Mishra et al. have collected weed images from three different places containing different soil textures which include ten crop varieties and ten weed varieties. The growth rate and the impact of the weeds were estimated using CNN. Weed growth is estimated through nutrient-based weed growth estimation (NWGE) based on the type of plant, characteristics of leaf, and weed species. The highest accuracy was found to be 97% [5].
2.3 Multi-classification of Weeds Using HSI Images Soybean plants along with five weeds were captured as Hyper Spectral Images (HSI) and classified using the Partial Least Square Regression (PLSR) classifier. The classifier used spectral information from the images for classifying the six classes. For better classification, eight image pre-processing methods were followed. Weed identification was noticed from the chemical images generated, and the accuracy was 86.2% [6].
2.4 Weed Density Estimation Mishra et al. have considered RGB images of soyabean and its weeds, which were cleaned by separating foreground and background vegetation. Using vegetation segmentation, weed density area was estimated in unlabeled data with the help of Inception V4 architecture with an accuracy of 98.2% [7]. Different unlabeled weed species (4384 weed images) from soyabean crop fields had been considered to generalize the model.
2.5 Detecting Cotton Weeds Using Yolo Object Detector Dang et al. developed several deep learning models for weed detection in cotton fields. Their dataset contains 5648 images covering 12 different weed classes, annotated with 9370 bounding boxes. Eighteen YOLO object detectors were established for detecting weeds, and the YOLOv4 algorithm attained an accuracy of 95.22% [8].
Fig. 1 Module 1—Soybean crop and broadleaf weed data pre-processing
2.6 Weed Detection at an Early Stage Wang et al. have proposed a method for detecting Solanum rostratum Dunal seedlings using YOLO v5. In this method, high-resolution images are sliced to create new datasets, which reduces the information loss caused by compressing high-resolution images. The method achieves precision and recall values of 0.9465 and 0.9017, respectively, and an average precision of 0.9017 [9].
3 Materials and Methods Existing works on weed detection build a classifier from only a limited number of architectures. In the proposed work, by contrast, the best classifier is chosen after building 16 different architectures and comparing them on 24 metrics. The overall workflow of the classifier is split into three modules (module 1, module 2, and module 3), which are depicted in Figs. 1 and 2. Data pre-processing is a mandatory task, as it cleans the data and makes it suitable for the deep learning classifier. The data is augmented so that the classifier is trained on more varied examples. The classifier is evaluated through its performance metrics.
Fig. 2 Module 2—Soybean crop and broadleaf weed CNN model building and Module 3—Soybean crop and broadleaf weed model performance evaluation
Fig. 3 a–c are sample observations from segmented soybean crops and d–f are from segmented broadleaf weeds
3.1 Module 1—Data Pre-processing
Original dataset This project uses the soybean crop and broadleaf weed dataset, which is open source and freely available at https://data.mendeley.com/datasets/3fmjm7ncc6/2. The original dataset contains four classes: broadleaf, grass, soil, and soybean. This work considers two of them for crop-weed classification: soybean, a crop, is taken as class 0, and broadleaf, a weed, is taken as class 1. The images are segmented, lossless Tag Image File Format (TIFF) images.
Balancing the original dataset In the original dataset, the soybean class contains 7376 images and the broadleaf class contains 1191 images, which makes the dataset imbalanced. In this work, the dataset is balanced by selecting the first 1191 images from the soybean class to match the 1191 broadleaf weed images exactly. Figure 3a–c shows sample observations from segmented soybean crops, and Fig. 3d–f shows sample observations from segmented broadleaf weeds.
Table 1 Soybean and broadleaf weed dataset split (image counts)
Class ID   Class name       Train (80%)   Validation (10%)   Test (10%)   Total
Class 0    Soybean crop     953           119                119          1191
Class 1    Broadleaf weed   953           119                119          1191
Total                       1906          238                238          2382
a All images are represented in TIFF
Dataset split ratio The balanced dataset is split into three sets: training, validation, and test. The split ratio used in this work is 80:10:10 [10], which gives 1906 observations for training, 238 for validation, and 238 for testing. Table 1 shows the split counts for each class separately.
Data pre-processing and augmentation All images were resized to 299 × 299 pixels for the InceptionV3, Xception, and InceptionResNetV2 models, and to 224 × 224 pixels for all other models [11]. Randomly chosen images were augmented through rotation, width shift, horizontal flip, height shift, zoom, and brightness changes. After the six transformations were applied, all pixel values were rescaled to the range 0 to 1 with a factor of 1/255.
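The balancing and splitting steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the file names are hypothetical placeholders, and shuffling each class before cutting it is an assumption, since the text does not say how images were assigned to the three sets.

```python
import random

TRAIN_FRAC, VAL_FRAC = 0.8, 0.1   # 80:10:10 split ratio from the text
PER_CLASS = 1191                   # images kept per class after balancing

def split_class(files, seed=0):
    """Cut one class into train/val/test lists (shuffle is an assumption)."""
    files = list(files)
    random.Random(seed).shuffle(files)
    n_train = round(len(files) * TRAIN_FRAC)
    n_val = round(len(files) * VAL_FRAC)
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],
    }

def rescale(pixels, factor=1 / 255):
    """Map 8-bit pixel values into the range 0 to 1."""
    return [p * factor for p in pixels]

# Hypothetical file names; balancing keeps the first 1191 soybean images.
soybean = [f"soybean_{i}.tif" for i in range(7376)][:PER_CLASS]
broadleaf = [f"broadleaf_{i}.tif" for i in range(1191)]

splits = {"soybean": split_class(soybean), "broadleaf": split_class(broadleaf)}
counts = {part: sum(len(s[part]) for s in splits.values())
          for part in ("train", "val", "test")}
print(counts)  # {'train': 1906, 'val': 238, 'test': 238}
```

Splitting each class separately keeps the three sets balanced, which reproduces the per-class counts of 953, 119, and 119 in Table 1.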
3.2 Module 2—Classification of Crop and Weed
Hardware and software configuration The work was carried out on a machine with 8 GB RAM and an Intel(R) Core(TM) i5-1035G4 CPU @ 1.10 GHz 1.50 GHz processor. The TensorFlow and scikit-learn Python libraries were used to implement the models and compute their metrics.
Building the classifier Pre-trained CNN architectures were used to distinguish the soybean crop from the broadleaf weed. While the classifier is trained, crop/weed features are extracted by the CNN, and each image is classified either as a crop or as a weed. The model's hyperparameters are tuned using the 238 validation crop-weed images during the validation phase. Through transfer learning, 16 unique pre-trained models of varying depths were instantiated by loading their already available pre-trained weights. These classifiers were implemented by preserving all layer structures except for the top layer, which is trainable. The pre-trained models on which the crop-weed classifiers are built are VGG16, VGG19, DenseNet121, DenseNet169, DenseNet201, ResNet50, ResNet50V2, ResNet101, ResNet101V2, ResNet152, ResNet152V2, MobileNet, MobileNetV2, Xception, InceptionV3, and InceptionResNetV2.
Model hyperparameters The model was optimized with the Adam optimizer at its default learning rate of 0.001 [12]. Early stopping with a patience value of 5 was used to avoid overfitting. The maximum number of epochs was set to 50, and the model was trained with a batch size of 32. Because the classification is binary, the sigmoid activation function was used to achieve nonlinearity [13]. The model's loss is calculated using the binary cross-entropy loss function.
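To make the roles of the sigmoid activation, the binary cross-entropy loss, and early stopping concrete, the following is a small pure-Python sketch. It is not the TensorFlow implementation used in the paper: the function names are illustrative, and monitoring the validation loss for early stopping is an assumption, since the text does not state which quantity was monitored.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def epochs_run(val_losses, patience=5, max_epochs=50):
    """Number of epochs executed before early stopping ends training.

    Assumption: training stops once the monitored validation loss has
    failed to improve for `patience` consecutive epochs.
    """
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return min(len(val_losses), max_epochs)

# If the best validation loss occurs in epoch 1, training ends after epoch 6.
print(epochs_run([0.5] + [0.6] * 10))  # 6
```

Under this assumption, a run whose best validation loss is reached in the first epoch stops after six epochs, which is the smallest epoch count that a patience of 5 allows.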
3.3 Module 3—Performance Evaluation The classifiers' outputs are analyzed, and their performance is evaluated using 24 metrics. The workflow of module 2 and module 3 is depicted in Fig. 2.
Evaluation metrics for class 0—Soybean crop To analyze accurately how well each classifier differentiates a crop from a weed, 24 metrics were computed for all 16 classifiers. ROC curves were also plotted. The estimated metrics were accuracy, error rate, and loss (aka log loss aka logarithmic loss aka logistic loss) for the training, validation, and test sets; time taken (in seconds) for predicting a single test image; training epochs; True Positives (TP); True Negatives (TN); False Positives (FP); False Negatives (FN); True Positive Rate (TPR aka Recall aka Sensitivity aka Hit rate); True Negative Rate (TNR aka Specificity aka Selectivity); False Positive Rate (FPR aka Fall-Out aka Type I error); False Negative Rate (FNR aka Miss Rate aka Type II error); F1-score (aka F-score); Positive Predictive Value (PPV aka Precision); Negative Predictive Value (NPV); False Discovery Rate (FDR); and Matthews Correlation Coefficient (MCC) (aka phi coefficient).
Criteria for selecting the best classifier Among the 24 metrics, the top-3 metrics were chosen to select the best classifier [14]. Testing accuracy is considered the top-1 metric, followed by the Matthews correlation coefficient as the top-2 metric and log loss as the top-3 metric [15]. Algorithm 1 defines the pseudocode for selecting the best classifier.
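Most of the metrics listed above follow directly from the four confusion-matrix counts. The sketch below uses their standard definitions; the function name is illustrative, and the example counts are the MobileNet test-set values reported later in the paper, with the soybean crop (class 0) treated as the positive class.

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Derive the rate-based metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # guard against empty denominators

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "error_rate": ratio(fp + fn, tp + tn + fp + fn),
        "tpr": ratio(tp, tp + fn),   # recall / sensitivity
        "tnr": ratio(tn, tn + fp),   # specificity
        "fpr": ratio(fp, fp + tn),   # Type I error rate
        "fnr": ratio(fn, fn + tp),   # Type II error rate
        "ppv": ratio(tp, tp + fp),   # precision
        "npv": ratio(tn, tn + fn),
        "fdr": ratio(fp, fp + tp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }

# MobileNet on the 238 test images: TP = 118, TN = 119, FP = 0, FN = 1.
m = confusion_metrics(tp=118, tn=119, fp=0, fn=1)
print(round(m["accuracy"], 4), round(m["mcc"], 4))  # 0.9958 0.9916
```

These two values agree with the accuracy and MCC reported for MobileNet in Table 4.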
4 Experimental Results
4.1 Results and Discussions In existing works, the metrics and architectural designs considered for building the best classifier are insufficient. Hence, in the proposed work, the quantitative and qualitative measurements are made sufficient by considering 24 metrics and 16 architectural designs.
Algorithm 1 Select the Top-1 classifier
Require: bestmodel_i, bestmodel_j
Ensure: i = modelmetrics_i, j = modelmetrics_j
if accuracyscore_i > accuracyscore_j then
    bestmodel ⇐ i
else if accuracyscore_i < accuracyscore_j then
    bestmodel ⇐ j
else
    if mcc_i > mcc_j then
        bestmodel ⇐ i
    else if mcc_i < mcc_j then
        bestmodel ⇐ j
    else
        if loss_i < loss_j then
            bestmodel ⇐ i
        else if loss_i > loss_j then
            bestmodel ⇐ j
        else
            bestmodel ⇐ i, j
        end if
    end if
end if
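Algorithm 1 can be expressed as a short runnable sketch in Python. The names `ModelMetrics`, `select_best`, and `rank` are illustrative and do not come from the paper; the example values are the accuracy, MCC, and log loss figures of five classifiers as reported in Table 4.

```python
from typing import NamedTuple

class ModelMetrics(NamedTuple):
    name: str
    accuracy: float  # top-1 metric, higher is better
    mcc: float       # top-2 metric, higher is better
    loss: float      # top-3 metric, lower is better

def select_best(a, b):
    """Pairwise comparison of Algorithm 1; a full tie returns both models."""
    if a.accuracy != b.accuracy:
        return (a,) if a.accuracy > b.accuracy else (b,)
    if a.mcc != b.mcc:
        return (a,) if a.mcc > b.mcc else (b,)
    if a.loss != b.loss:
        return (a,) if a.loss < b.loss else (b,)
    return (a, b)

def rank(models):
    """Order a list of models by applying the same three criteria as a sort key."""
    return sorted(models, key=lambda m: (-m.accuracy, -m.mcc, m.loss))

models = [
    ModelMetrics("DenseNet201", 0.9874, 0.9748, 0.0799),
    ModelMetrics("ResNet152V2", 0.9874, 0.9751, 0.1006),
    ModelMetrics("InceptionV3", 0.9874, 0.9751, 0.0600),
    ModelMetrics("VGG16", 0.9874, 0.9751, 0.0483),
    ModelMetrics("MobileNet", 0.9958, 0.9916, 0.0430),
]
print([m.name for m in rank(models)])
# ['MobileNet', 'VGG16', 'InceptionV3', 'ResNet152V2', 'DenseNet201']
```

Sorting on the tuple (accuracy, MCC, loss) is equivalent to applying the pairwise rule repeatedly, and it reproduces the order of the first five rows of Table 4.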
Table 2 shows the observed classification accuracy, error rate, and logarithmic loss for the training, validation, and test sets. Testing accuracy is considered to be the model's overall accuracy, and hence Table 2 is sorted by it. Table 3 shows the metric values calculated from the classification report, the confusion matrix, and the metrics described above. Table 3 is sorted by the Matthews Correlation Coefficient (MCC), as it is considered the most critical of the metrics present [16]. Table 4 lists the classifiers and their top-3 metric values in descending order of performance. MobileNet performs best of all and is the top-1 model, with an accuracy of 0.9958 and a minimum loss of 0.0430. Its Type I and Type II errors were also the lowest, at 0.0 and 0.0084, respectively. Precision, recall, and F1-score were 1.0, 0.99, and 1.0, respectively. VGG16, InceptionV3, ResNet152V2, and DenseNet201 perform equally well in terms of model accuracy, at 0.9874. To pick the best among these four classifiers, the Matthews correlation coefficient (MCC), the top-2 metric, is evaluated. Once again, VGG16, InceptionV3, and ResNet152V2 show the same level of performance, with an MCC of 0.9751. To select the best among these three, log loss, the top-3 metric, is evaluated. Ultimately, VGG16 takes the top-2 position with a minimum log loss of 0.0483. InceptionV3 is top-3 with the next lowest log loss of 0.06, followed by ResNet152V2 with a log loss of 0.1. The other models were sorted by the same procedure. VGG19 and InceptionResNetV2 tie in the same way; VGG19 eventually takes the lead with a log loss of 0.05, compared with 0.08 for InceptionResNetV2. The same tie occurs between DenseNet121 and ResNet101V2, where DenseNet121 finally wins. MobileNetV2 and Xception perform equally well in terms of model accuracy but have different MCC values, and hence the former leads. DenseNet169 and ResNet50V2 likewise tie on accuracy and are separated by MCC: DenseNet169 has the higher value, 0.9415, compared with 0.9412 for ResNet50V2.
4.2 Graphs—Confusion Matrix and ROC Curve
Confusion matrix A confusion matrix shows the number of observations that were correctly or incorrectly classified as crops or weeds [17]. To show exactly how many observations each model classified rightly or wrongly, confusion matrices were generated for every classifier and are presented in Figs. 4 and 5. The top-1 model, MobileNet, produces 118 true positives out of 119 class 0 soybean crop images and 119 true negatives out of 119 class 1 broadleaf weed images. In other words, the model predicted one soybean crop image incorrectly as a broadleaf weed, while no broadleaf weed was incorrectly predicted as a soybean crop. In contrast, ResNet101, which gave a model accuracy of 0.9076, predicted 101 true positives and 115 true negatives; that is, it predicted 22 observations incorrectly.
ROC Curve The Receiver Operating Characteristic (ROC) curve was generated for all 16 models and is depicted in Figs. 6 and 7 [18]. The ROC curves of the classifiers are ordered by their performance in Table 4. Each curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) given by the model. From the shape of the curve, we can see how the performance of a model drops as the area under the curve (AUC) gets smaller: the higher the AUC, the better the model performs. MobileNet has the larger area under the curve, with an AUC value of 1.0, whereas ResNet101 has a smaller area, with an AUC value of 0.96.
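The relationship between the ROC curve and the AUC can be illustrated with a small pure-Python sketch. The paper used scikit-learn for its metrics; the functions below are a simplified stand-in, and the labels and scores are hypothetical values chosen only to show the calculation, with label 1 marking the positive class.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical predicted probabilities for six test images.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(round(auc(roc_points(labels, scores)), 4))  # 0.8889
```

In this toy example one positive image is scored below one negative image, so 8 of the 9 positive-negative pairs are ordered correctly and the AUC is 8/9. A classifier that separates the two classes perfectly would reach an AUC of 1.0.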
5 Conclusion and Future Enhancements
5.1 Conclusion Weeds growing in fields must be eliminated, as they reduce crop yield. This work builds the best classifier for differentiating the soybean crop from the broadleaf weed through deep learning. Transfer learning models were chosen
Table 2 Accuracy score, error rate and log loss of the binary classifiers
Model name          Epochs   Training                   Validation                 Test
                             Acc      Error    Loss     Acc      Error    Loss     Acc      Error    Loss
MobileNet           6        0.9848   0.0152   0.1215   0.9916   0.0084   0.1088   0.9958   0.0042   0.043
VGG16               6        0.9575   0.0425   0.0952   0.9916   0.0084   0.0438   0.9874   0.0126   0.0483
InceptionV3         6        0.9628   0.0372   0.1464   0.9958   0.0042   0.0155   0.9874   0.0126   0.06
DenseNet201         6        0.9774   0.0226   0.0838   0.9874   0.0126   0.0337   0.9874   0.0126   0.0799
ResNet152V2         7        0.9759   0.0241   0.1852   0.9958   0.0042   0.0192   0.9874   0.0126   0.1006
VGG19               6        0.9879   0.0121   0.0298   0.9916   0.0084   0.0232   0.9832   0.0168   0.051
InceptionResNetV2   6        0.9565   0.0435   0.4091   0.9958   0.0042   0.0128   0.9832   0.0168   0.0813
DenseNet121         6        0.9045   0.0955   0.4674   1        0        0.0068   0.979    0.021    0.0666
ResNet101V2         9        0.969    0.031    0.3572   0.9958   0.0042   0.0431   0.979    0.021    0.2383
Xception            7        0.9643   0.0357   0.2323   0.9916   0.0084   0.0189   0.9748   0.0252   0.1093
MobileNetV2         6        0.9853   0.0147   0.0942   0.9874   0.0126   0.0448   0.9748   0.0252   0.1699
DenseNet169         6        0.9801   0.0199   0.0824   0.9916   0.0084   0.0104   0.9706   0.0294   0.0653
ResNet50V2          7        0.9654   0.0346   0.2622   1        0        0.004    0.9706   0.0294   0.1025
ResNet152           9        0.8772   0.1228   0.2991   0.9118   0.0882   0.2451   0.9202   0.0798   0.2552
ResNet50            9        0.8232   0.1768   0.4057   0.8824   0.1176   0.2735   0.916    0.084    0.2862
ResNet101           14       0.8687   0.1313   0.3075   0.8763   0.1237   0.3163   0.9076   0.0924   0.274
a Column 2 (Epochs) shows the total training epochs with early stopping with a patience value set to 5
Table 3 Evaluation metrics of the binary classifiers
Model name          Time (s)   TP    TN    FP   FN   PPV    TPR    F1 score   TNR    FPR     FNR    NPV    FDR    MCC
MobileNet           0.08       118   119   0    1    1      0.99   1          1      0       0.01   0.99   0      0.99
VGG16               0.13       116   119   0    3    1      0.97   0.99       1      0       0.03   0.97   0      0.97
InceptionV3         0.09       119   116   3    0    0.97   1      0.99       0.97   0.03    0      1      0.03   0.97
ResNet152V2         0.14       116   119   0    3    1      0.97   0.99       1      0       0.03   0.97   0      0.97
DenseNet201         0.19       118   117   2    1    0.98   0.99   0.99       0.98   0.02    0.01   0.99   0.02   0.97
VGG19               0.18       116   118   1    3    0.99   0.97   0.98       0.99   0.01    0.03   0.97   0.01   0.97
InceptionResNetV2   0.19       118   116   3    1    0.98   0.99   0.98       0.97   0.03    0.01   0.99   0.03   0.96
DenseNet121         0.12       117   116   3    2    0.97   0.98   0.98       0.97   0.03    0.02   0.98   0.02   0.95
ResNet101V2         0.16       117   116   3    2    0.97   0.98   0.98       0.97   0.02    0.02   0.98   0.03   0.95
MobileNetV2         0.08       117   115   4    2    0.97   0.98   0.97       0.97   0.036   0.02   0.98   0.03   0.95
Xception            0.17       116   116   3    3    0.97   0.97   0.97       0.97   0.03    0.03   0.97   0.03   0.95
DenseNet169         0.15       117   114   5    2    0.96   0.98   0.97       0.96   0.04    0.02   0.98   0.04   0.94
ResNet50V2          0.1        115   116   3    4    0.97   0.97   0.97       0.97   0.03    0.03   0.97   0.03   0.94
ResNet152           0.12       106   113   6    13   0.95   0.89   0.92       0.95   0.05    0.11   0.9    0.05   0.84
ResNet50            0.14       109   109   10   10   0.92   0.92   0.92       0.92   0.08    0.08   0.92   0.08   0.83
ResNet101           0.17       101   115   4    18   0.96   0.85   0.9        0.97   0.03    0.15   0.86   0.04   0.82
a Table 3 is sorted by Matthews Correlation Coefficient (MCC)
Fig. 4 Confusion matrix of the soybean crop and broadleaf weed classifiers (Classifier (i) to (viii))
Fig. 5 Confusion matrix of the soybean crop and broadleaf weed classifiers (Classifier (ix) to (xvi))
Fig. 6 ROC curve of the soybean crop and broadleaf weed classifiers (Classifier (i) to (viii))
Fig. 7 ROC curve of the soybean crop and broadleaf weed classifiers (Classifier (ix) to (xvi))
Table 4 Evaluating models based on Algorithm 1
Model name          Accuracy   MCC      Loss
MobileNet           0.9958     0.9916   0.043
VGG16               0.9874     0.9751   0.0483
InceptionV3         0.9874     0.9751   0.06
ResNet152V2         0.9874     0.9751   0.1006
DenseNet201         0.9874     0.9748   0.0799
VGG19               0.9832     0.9665   0.051
InceptionResNetV2   0.9832     0.9665   0.0813
DenseNet121         0.979      0.958    0.0666
ResNet101V2         0.979      0.958    0.2383
MobileNetV2         0.9748     0.9497   0.1699
Xception            0.9748     0.9496   0.1093
DenseNet169         0.9706     0.9415   0.0653
ResNet50V2          0.9706     0.9412   0.1025
ResNet152           0.9202     0.8418   0.2552
ResNet50            0.916      0.8319   0.2862
ResNet101           0.9076     0.8208   0.274
a Table 4 is sorted by Algorithm 1
to obtain better results with generalized performance. Data augmentation methods that are effective for this dataset were chosen to achieve the best result. In total, 16 models were implemented and extensively evaluated with 24 metrics on a dataset of 2382 images of the soybean crop and broadleaf weed. In the final analysis, MobileNet outperforms all other models with a model accuracy of 99.58%. MobileNet is also the fastest-predicting classifier (tied with MobileNetV2 in Table 3), with a prediction time of 0.08 s per image. It yielded 118 true positives and 119 true negatives out of 238 images. The precision, recall, and F1-score values of the MobileNet classifier are 1, 0.99, and 1, and its MCC and loss values are 0.9916 and 0.043.
5.2 Future Enhancements Research on robotic weeding is a boon to agriculturalists. Accordingly, we will extend this work by fine-tuning the hyperparameters of the MobileNet model to achieve generalized results. Moreover, to handle the variability of images, several data pre-processing methods will be considered in our subsequent work. In addition, more weed varieties will be added to the dataset, which will enable multi-class classification of weed species.
References
1. Abouziena HF, Haggag WM (2016) Weed control in clean agriculture: a review. Planta Daninha 34:377–392
2. Razfar N et al (2022) Weed detection in soybean crops using custom lightweight deep learning models. J Agric Food Res 8:100308
3. dos Santos Ferreira A et al (2017) Weed detection in soybean crops using ConvNets. Comput Electron Agric 143:314–324
4. Cheng L, Shi-Quan S, Wei G (2021) Maize seedling and weed detection based on MobileNetv3-YOLOv4. In: 2021 China automation congress (CAC). IEEE
5. Mishra AM et al (2022) A deep learning-based novel approach for weed growth estimation. In: Intelligent automation and soft computing, vol 31(2)
6. Ahmed MR et al. Multiclass classification on soybean and weed species using a novel customized greenhouse robotic and hyperspectral combination system. Available at SSRN 4044574
7. Mishra AM et al (2022) Weed density estimation in soya bean crop using deep convolutional neural networks in smart agriculture. J Plant Diseases Protect 129(3):593–604
8. Dang F et al (2022) Deep cotton weeds (DCW): a novel benchmark of YOLO object detectors for weed detection in cotton production systems. In: 2022 ASABE annual international meeting. American Society of Agricultural and Biological Engineers
9. Wang Q et al (2022) A deep learning approach incorporating YOLO v5 and attention mechanisms for field real-time detection of the invasive weed Solanum rostratum Dunal seedlings. Comput Electron Agric 199:107194
10. Nguyen QH et al (2021) Influence of data splitting on performance of machine learning models in prediction of shear strength of soil. Math Probl Eng 2021:1–15
11. Zhongzhi H (2019) Computer vision-based agriculture engineering. CRC Press
12. Jais IKM, Ismail AR, Nisa SQ (2019) Adam optimization algorithm for wide and deep neural network. Knowl Eng Data Sci 2(1):41–46
13. Farhadi F (2017) Learning activation functions in deep neural networks. Ecole Polytechnique, Montreal (Canada)
14. Hossin M, Nasir Sulaiman Md (2015) A review on evaluation metrics for data classification evaluations. Int J Data Mining Knowl Manage Process 5(2):1
15. Chicco D, Tötsch N, Jurman G (2021) The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining 14(1):1–22
16. Chicco D, Jurman G (2020) The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21:1–13
17. Sammut C, Webb GI (eds) (2011) Encyclopedia of machine learning. Springer Science & Business Media
18. Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn 30(7):1145–1159
Bluetooth Controlled Integrated Robotic Arm with Temperature and Moisture Sensor Modules
K. C. Sriharipriya, R. Shivani, K. Sai Ragadeep, and N. Sangeetha
Abstract Robotic arms are automated machines that are equipped with a range of sensors that allow them to interact with the environment. They can be used in a variety of applications such as manufacturing, medical, and laboratory tasks. A robotic arm with a temperature sensor is especially useful for applications that require precise temperature control. The temperature sensor is typically mounted on the robotic arm and it sends a signal back to the controller which can be used to regulate the temperature of the space. For example, in a manufacturing plant, the robotic arm can be programmed to turn on a cooling fan when the temperature of the environment exceeds a certain threshold. The temperature sensor can also be used for safety purposes, e.g., if the temperature of the environment is too high, the robotic arm can be programmed to shut down the system. This ensures that the system does not overheat and that the workspace remains safe for the people working in it. In addition, the temperature sensor can be used to detect changes in the environment, e.g., if the temperature is rising, the robotic arm can be programmed to open a door or window to allow more air circulation. This can help maintain a comfortable working environment and can also prevent the system from overheating. In conclusion, a robotic arm with a temperature sensor is an invaluable asset for any industry or research lab. It can help to regulate the environment and prevent the system from overheating, as well as providing safety features. The temperature sensor can also be used to detect changes in the environment, allowing the robotic arm to adjust the system accordingly.
K. C. Sriharipriya · R. Shivani (B) · K. Sai Ragadeep · N. Sangeetha School of Electronics Engineering, VIT University, Vellore, India e-mail: [email protected] K. C. Sriharipriya e-mail: [email protected] K. Sai Ragadeep e-mail: [email protected] N. Sangeetha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_7
Keywords Internet of Things · Robotic arm · Degree of freedom · Bluetooth module · Arduino · Industrial robots · Programmable logic controller · Actuation and joint mechanism · Dexterous manipulation · Field robotics · Physically assistive devices · Robotics in hazardous fields · Sensor-based control
1 Introduction Robotic arms are one of the most important advances in modern technology. They are used in a variety of industrial, medical, and military applications and have revolutionized the way we work and live. Robotic arms are used in factories to assemble products, in medical facilities to perform surgeries, and in the military to operate weapons systems. Robotic arms are made up of several different components that work together in order to function. The main components are the actuators, sensors, controllers, and end effectors. The actuators are responsible for moving the arm and providing the necessary power to do so. Sensors monitor the environment and provide feedback information to the controller. The controller is responsible for interpreting the information and deciding how to move the arm. The end effectors are the tools on the end of the arm, such as a gripper or a drill. Robotic arms are capable of performing a wide variety of tasks with precision, accuracy, and speed. They can be programmed to perform precise and complex movements, such as welding, painting, and drilling. They are also able to work in hazardous environments, such as nuclear power plants or in space exploration, where human operators cannot. They have even been used to perform delicate surgeries, such as suturing, and in some cases, even replacing human surgeons. Robotic arms are also incredibly versatile, as they can be programmed to perform a variety of tasks. They are used to assemble products in the automotive industry, to paint and weld in the manufacturing industry, and even to explore space. They can be used to perform dangerous tasks, such as welding in hazardous environments, or to perform delicate tasks, such as inserting small components into an electronic device. The use of robotic arms has revolutionized the way we work and live. They are more efficient, cost-effective, and precise than ever before. 
The development of robotic arms has improved the speed, accuracy, and quality of many industrial and medical tasks, and has allowed us to explore and utilize new technologies. They are an invaluable part of modern technology and will continue to be used for years to come.
2 Literature Survey
• Szabo and Lie [1] present an application to sort colored objects with a robotic arm. It uses image recognition and a webcam to identify colors, with the arm moving in a PTP trajectory. To completely replace humans with robots, decisions must be made by the arm. A good example is the sorting of objects by color, useful in factories like pencil factories.
• Kruthika and team in [2] have designed and developed a robotic arm with 5 degrees of freedom (DOF) for feeding the elderly and physically disabled people. The robotic arm is simulated through MATLAB software and is controlled with the help of Arduino MEGA2560 I/O board. This robotic arm can be applied in other areas too.
• The objective of Gautam and team in [3] was to develop a light-weight robotic arm using materials like aluminum and carbon fiber, along with advancements in stepper motors to provide a better alternative for the conventional robotic arm. Also, the paper aims to reduce the wear and tear of the components so that there is efficient cost cutting in maintenance of the robot.
• This bio-mimicking idea is inspired by the intelligent nature of octopus arms. In [4], Wu and team use reconfigurable Kresling units to magnetically control the robotic arm’s movements such as stretching, folding, omnidirectional bending, and twisting. A basic structure is provided by assembling four Kresling units, and it is further scaled up to meet the required complexity by adding on more units. This is highly adaptable but not cost-efficient.
• Robots controlled by the Internet aim to bridge the gap between industrial robots and domestic robots. In [5], the robots are controlled using Arduino UNO and interfaced with the Internet using Arduino Ethernet Shield. The accuracy of such robots is high, and the operational tests were successful.
• Humanity has always aspired to provide life-like traits and characteristics to its products to discover a replacement for itself. To make a comeback into a normal lifestyle, robots are used for better treatment of patients. Dutta and team in [6] have designed and developed a low-cost 3D printed humanoid robotic arm which can be controlled and operated wirelessly. It works on wireless technology, providing six degrees of freedom and can be used for hazardous conditions, traffic management systems, and smart cities.
• The use of robotic arms to help the elderly and vulnerable increases gradually. Motion planning for robotic arms must be more stringent to ensure safer and more reliable Human–Robot interaction and its wider adoption. Yang et al. [7] present a motion planning method based on human arm physics and reinforcement learning. Data from a VICON motion capture system is used to extract motion rules. Reward functions are then proposed and training of the robotic arm is done using deep deterministic policy gradient and hindsight experience replay algorithms. Experiments verify the feasibility and effectiveness of the proposed approach in planning the humanoid motion of the robot arm.
K. C. Sriharipriya et al.
• Gungor and Kiyak [8] deploy a robotic arm that detects the presence of an autonomous vehicle by reading the vehicle’s QR code. This reduces the need for manual labor at charging stations. Because the system relies only on image processing, it avoids the cost of proximity, pressure, and similar sensors, which lowers the cost of charging the vehicle.
• Krishnaraj Rao et al. [9] aim to create an automated robotic system that completes pick-and-place tasks efficiently and effectively. The arm is designed using machine learning algorithms so that it can complete these tasks with greater accuracy and speed than traditional robotic systems. The system can also learn and adapt to its environment, which allows it to complete tasks more quickly and reliably while reducing the cost associated with traditional robotic systems.
• The Sample Caching Subsystem (SCS) of the Mars 2020 Perseverance rover can switch drill bits, collect and process rock and regolith samples, and interface the Corer with the Bit Carousel for bit exchange, as explained in [10]. The docking assembly consists of four alignment cones and a rotating ring. It uses force-corrected docking to align the Corer and the Bit Carousel, and has been tested thousands of times and demonstrated in flight.
• Mourtzis et al. [11] explore the means and methodologies of interaction between humans and robots. They develop Human–Robot interfaces (HRI) using a closed-loop framework with technologies such as mixed reality (MR). This provides insight into how to operate the machine from a safe distance, with precise coordination between machine and operator, using the digital twin methodology. The digital twin is modeled with the Robot Operating System (ROS), with data handled in the cloud and mixed reality easing the Human–Robot interface.
• The COVID-19 pandemic has caused a surge in medical waste, risking the safety of front-line health workers and disposers. In [12], a system that uses a robotic arm and voice commands to segregate medical waste is built with YOLOv3 and ROS; it achieves 94% training accuracy, 82.1% automated accuracy, and 82.5% manual accuracy after 30 trials.
• Technology is growing fast to meet the needs of the industrial revolution, and humans alone cannot cope. The current Industry 4.0 revolution is being driven by robotic technology. Robotic arms are needed for routine tasks such as moving objects and welding. These arms are controlled by microcontrollers and have 5–7 degrees of freedom. An end effector is attached to the end of the arm, like a finger, and its performance depends on the program. Rooban et al. [13] discuss the design of a robot arm in CoppeliaSim to pick and place objects with accuracy and reliability.
• Babu et al. [14] describe a low-cost prosthetic robotic arm built with a 3D hand and a patient monitoring system for disabled children. It has a force sensor to detect muscle flex activity and a heartbeat tracker to measure pulse, temperature, and
Bluetooth Controlled Integrated Robotic Arm with Temperature …
location. It can lift objects of up to 1 kg, and the monitoring system reports a temperature of 32.87 °C and a pulse of 71 bpm.
• Soft body dynamics can be used as a computational resource in physical reservoir computing platforms, in which an arm-like structure is used to simulate a physical system. The study in [15] examines how arm length and environmental conditions affect information processing capacity. The results show that memory capacity is higher in water than in air and that a weak echo state property is observed in air. The study thus demonstrates how the body and its environment can affect information processing.
• Flying manipulators are aerial drones equipped with robotic arms that can be used for a variety of applications. The model proposed by Szasz et al. [16] demonstrates that these platforms can leverage the compliant behavior of the arm while retaining maneuverability. A hierarchical model-based controller is implemented to control the arm and is tested in simulation to evaluate its effectiveness. Flying manipulators could open up fields of application that were not possible before.
• Soft robotic arms are designed to perform tasks safely and dexterously in cluttered and confined spaces, which calls for compliance and a high number of degrees of freedom. Accurate operation requires closed-loop control, which in turn requires proprioceptive sensing. Ouyang et al. [17] present a pneumatically actuated soft robotic arm that is modular, with fast-assembly connectors and a tactile sensing array embedded in each joint. k-nearest neighbors regression maps the stress distribution measured by the array to the posture of the actuator. Experiments show that the tip location of a three-segment soft arm has a mean error of 5.64 mm.
• Yuvaraj et al. [18] develop a low-cost, highly adaptable, and easy-to-control robotic arm based on an Arduino UNO to aid physically challenged people. The arm is controlled by voice commands, which the authors report to offer higher precision and efficiency than mechanical control. The speech commands are converted to writing using speech recognition software.
• Wegrowski et al. [19] analyze a prototype robotic arm for a quadcopter (drone). The arm is a modular folding mechanism that extends from the drone body and, when fitted with a gripper, can retrieve or sample objects. It uses one actuator and can extend to six times its original length. The arm offers advantages over autonomous ground vehicles (AGVs) for sampling areas that are difficult or impossible to reach. A SolidWorks model was created, and a prototype was built and tested; it performed as expected.
• In [20], a robotic arm with one actuator is designed and analyzed as part of an advanced grasping system for UAVs. The system includes a foldable arm, case, gripper, and vision system. A SolidWorks model, motion analysis, and a prototype were developed and tested. The design can be scaled and attached to UAVs of different sizes.
• Wu et al. [21] propose robotic arm trajectory planning that expresses emotions, using a fuzzy reasoner and kinematic feature mapping. The method generates motion trajectories based on kinematic characteristics and was simulated in MATLAB for the emotional robot WEBO. It expresses emotions better and improves Human–Robot interaction (HRI).
• Robotic arms have been used in industry for more than 50 years but can be costly for small-scale applications. Ndambani et al. [22] propose a cost-effective autonomous robotic arm system that uses IoT devices for object recognition, motion compensation (using a Kalman filter), and real-time object sorting, with 95% accuracy and 1.1 mm mean error.
• Oikonomou et al. [23] present a new approach to controlling a two-module soft robotic arm, inspired by biology. The method uses probabilistic movement primitives (ProMPs), a type of model learning, to program the arm’s movements. The technique also includes a procedure for re-planning the ProMPs’ activation asynchronously. Tests demonstrate that it simplifies the control of robots with complex and unmodeled dynamics.
• Triboelectric nanogenerator-based (TENG-based) sensors are highly sensitive, reactive, and self-powered. In [24], Ji et al. use a PolyJet 3D printer to fabricate a flexible, two-axis force TENG (TF-TENG) that senses and collects mechanical energy. It is made from Agilus30, digital acrylonitrile butadiene styrene (ABS), nitrile rubber, and stainless steel. The researchers attach the TF-TENG to the robot arm as a skin to detect and avoid collisions with other objects.
• Dai et al. [25] review existing technologies for spatial robotic arms and present ways to plan the trajectory of the arm’s motion. The merit of a technology is measured by whether the operation is completed successfully. Since the use of robotic arms in space is relatively new, the paper focuses on only a few areas, such as avoiding space obstacles along the arm’s trajectory.
• Chen et al. [26] use RGB-D images and a modified machine learning network architecture to predict the gripping posture for successfully grasping an object. A 5-Fin gripper was tested to show that it can perform delicate missions better than 2- or 3-Fin grippers. Experiments with a 6-DOF robot arm and 2-Fin/5-Fin grippers produced an automated system with a gripping success rate of over 90%.
• Huang and Chang [27] develop a prototype wire-driven robotic arm that addresses insufficient torque and overshoot in soft robotic arms by combining multiple solid polygons and spherical joints. Its motion resembles muscle contraction; a servo control and look-up-table-based method rotates the motor and drives the arm to a specific point. The motion model has been verified in MATLAB Simulink.
• In [28], Yaacob et al. train and deploy a robotic arm with three degrees of freedom (3 DoF) that can be operated in two modes, tele-operated and semi-autonomous. In the tele-operated mode, the arm follows the commands of the user, while in the semi-autonomous mode it operates from the
trained intelligence it acquired from the teaching pendant. An Arduino UNO is used to program the robot. The consistency and accuracy of the two modes are evaluated by performing various tasks.
• In [29], a 2-DOF robot manipulator is studied using sliding mode control (SMC) for stability, accuracy, and robustness. Lyapunov stability criteria are used to analyze the controllers, and the results show improvements of 40.5% and 36.7% for joints 1 and 2, respectively. This indicates that SMC is an effective tool for controlling robot manipulators.
• Marnell et al. [30] design and manufacture an Android-controlled robotic arm for picking and placing objects in constrained environments. The research assesses the feasibility of controlling complex real-time operating systems using a handheld device, with potential for controlling larger-scale operations. The arm is produced using rapid prototyping and controlled by an Arduino-based board and a Bluetooth app. Experiments show that the system can lift weights of up to 80 g with a current draw of 970 mA, and runs for up to 12 min before encountering bugs or overheating. The 3D-printed PETG wears over time, which suggests that a more suitable material should be used.
• Non-destructive testing (NDT) such as ultrasonic testing (UT) is often used to inspect non-working equipment for remanufacturing. Most UT is carried out manually, but an automatic NDT inspection built on a robotic arm control platform is proposed in [31]. A simulation and a ROS-based control platform were developed to support the implementation. A monocular camera was used to reconstruct the object and plan paths for the robotic arm, which was tested in simulation and then implemented in the real world.
• The robotic arm system designed by Gadikar et al. [32] for contactless surgery during the coronavirus pandemic consists of two robotic arms, a camera, AutoCAD design, 3D printing, and wireless communication. It is intended to reduce the risk of infection and enable more efficient operations with less blood loss and faster recovery. The two arms allow precise and accurate surgical movements, the camera and AutoCAD design provide a 3D visualization of the surgical site, and 3D printing is used to fabricate customized surgical tools. Wireless communication allows remote control of the arms, so that doctors can perform contactless surgery while monitoring the procedure from a safe distance.
• Soft robots are flexible and adaptable, unlike their rigid counterparts, which implies lower force exertion and lower control complexity. They can be deployed in fields such as the space industry, where available volume is restricted and weight reduction is important. In [33], the researchers develop a soft robot called POPUP, with inflatable links and rigid joints: a hybrid structure with lightweight deployable parts, a simple control mechanism, and low energy consumption.
• Montoya Angulo et al. [34] outline an assisted operation system for a robotic arm to position it near an explosive device. The system uses two cameras mounted in a camera-in-hand configuration, using the CAMSHIFT algorithm for object tracking and feature matching. Inverse kinematics is implemented to easily grab the grenade, and tests have verified the effectiveness of the system.
3 Methodology
In this research work, we plan to build a robotic arm that is controlled remotely via Bluetooth and commanded via the Internet. As a first step, we reviewed around 34 papers to establish how to implement the robotic arm. Next, we designed a prototype of a robotic arm with 5 to 6 degrees of freedom in Tinkercad. We then drafted the algorithm for the arm’s microcontroller, to be implemented in the Arduino IDE. The Tinkercad prototype of the robotic arm simulator is shown in Fig. 1. Further, we will build the physical prototype of the robotic arm and connect it to the gesture control glove. To program the microcontroller board (Arduino UNO) and integrate it with the other modules, such as the HC-05 (Bluetooth module) and the PCA9685 (servo motor driver module), we used the Arduino IDE, which uses a version of C++ and includes the header files required to access the module functions. We use the following modules and sensors to implement the robotic arm.
a. An Arduino microcontroller board is used as the main control unit in both the glove (master) and the robotic arm (slave).
Fig. 1 Prototype of the robotic arm simulator, designed using Tinkercad
b. The PCA9685 driver is the module used to control the servo motors in the robotic arm. We use it for its 60 Hz refresh rate and the 12-bit resolution of each output.
c. The A4988 micro driver is used to control bipolar stepper motors. Its advantage over other drivers is that it needs only two control pins: one sets the rotation direction and the other receives the step pulses.
d. The MPU6050 accelerometer detects the motion of the user’s wrist. We use it because it determines the amount of tilt along each Cartesian axis and because its sensitivity can be set to suit the user.
e. The HC-05 Bluetooth module provides wireless communication between the glove (master) and the robotic arm (slave). It operates at about 2.45 GHz, in the 2.4 GHz band, and can maintain a connection over up to 10 m.
f. Flex sensors in the glove detect the movement of the fingers. The advantage of a flex sensor is that its resistance changes even at the slightest bend in the plastic, which makes the degree of bending easy to read. We use a 2.2-inch flex sensor in this work.
g. A temperature and humidity sensor placed on the robotic arm measures the temperature and humidity around it. A continuous feed of these readings is sent to the monitor, so that the necessary actions can be taken.
We 3D print the parts of the robotic arm and assemble them as required. Then, we fit the sensors, program the arm through the Arduino IDE, and establish a connection between the robotic arm and the glove. A snippet of the code in the Arduino IDE is shown in Fig. 2.
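The 12-bit resolution of the PCA9685 outputs and the tilt sensing of the accelerometer can be illustrated with a little arithmetic. The following is a minimal sketch in plain C++, not the code shown in Fig. 2: it converts a servo pulse width to a 12-bit tick count at a given refresh rate and estimates roll and pitch from accelerometer readings. The function names, the 500–2500 µs pulse range, and the axis convention are assumptions made for illustration; actual servos and sensor mounting may differ.

```cpp
#include <math.h>
#include <stdint.h>

// Assumed servo pulse range (hypothetical; check the servo's datasheet).
const uint32_t kMinPulseUs = 500;   // pulse width at 0 degrees
const uint32_t kMaxPulseUs = 2500;  // pulse width at 180 degrees

// Convert a pulse width in microseconds to a 12-bit tick count (0..4095)
// for a PWM period of 1/freqHz seconds, rounding to the nearest tick.
uint16_t pulseUsToTicks(uint32_t pulseUs, uint32_t freqHz) {
    uint64_t ticks =
        ((uint64_t)pulseUs * freqHz * 4096ULL + 500000ULL) / 1000000ULL;
    if (ticks > 4095) ticks = 4095;
    return (uint16_t)ticks;
}

// Map a servo angle (clamped to 0..180 degrees) linearly onto the pulse range.
uint32_t angleToPulseUs(int angleDeg) {
    if (angleDeg < 0) angleDeg = 0;
    if (angleDeg > 180) angleDeg = 180;
    return kMinPulseUs + (kMaxPulseUs - kMinPulseUs) * (uint32_t)angleDeg / 180;
}

struct Tilt {
    double rollDeg;
    double pitchDeg;
};

// Estimate static tilt from accelerometer readings (any consistent unit).
// Only meaningful while gravity is the dominant acceleration on the sensor.
Tilt tiltFromAccel(double ax, double ay, double az) {
    const double kRadToDeg = 180.0 / 3.14159265358979323846;
    Tilt t;
    t.rollDeg = atan2(ay, az) * kRadToDeg;
    t.pitchDeg = atan2(-ax, sqrt(ay * ay + az * az)) * kRadToDeg;
    return t;
}
```

At 60 Hz the PWM period is about 16.67 ms, so one of the 4096 ticks lasts roughly 4.07 µs and a 1500 µs pulse corresponds to 369 ticks under these assumptions.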
3.1 Implementation Strategy
1. Design the robotic arm: We start by designing the robotic arm, including its size, shape, number of joints, and materials. We also plan the sensors and the mechanisms that control the arm, such as the servo motors that produce the movements and the servo motor driver that controls them.
2. Assembly of parts: Once the design is complete, we build the arm itself. This involves assembling the components, connecting the joints, and attaching the sensors.
3. Uploading the code from Arduino IDE: Once the arm is built, we program it. The gestures to be performed by the arm are captured by the accelerometer and flex sensors in the glove. The changes are detected and processed, and the resulting signals are sent via Bluetooth to the robotic arm for execution.
Fig. 2 Snippet of a code in Arduino IDE
4. Testing the robotic arm: Once the arm is programmed, we test it for accuracy, reliability, and durability.
5. Adding the temperature and moisture sensor: Once the arm is tested and working properly, we add the temperature and moisture sensors. This involves connecting the sensors to the arm and programming it to interpret their data.
Several factors affect the implementation of the robotic arm, including the durability of the 3D-printed parts, the accuracy and precision of the sensors on the arm, and the customized values for the MPU6050 that let us control the arm comfortably and conveniently.
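The role of customized sensor values in step 3 can be sketched as follows. This is an illustrative example under stated assumptions, not the project's actual firmware: the calibration structure, the raw ADC values, and the dead band are hypothetical names and numbers that would be tuned per user and per glove.

```cpp
#include <math.h>

// Linear map with clamping; also works when inMin > inMax (reversed sensor).
long mapClamped(long x, long inMin, long inMax, long outMin, long outMax) {
    if (inMin == inMax) return outMin;  // degenerate calibration
    long lo = inMin < inMax ? inMin : inMax;
    long hi = inMin < inMax ? inMax : inMin;
    if (x < lo) x = lo;
    if (x > hi) x = hi;
    return (x - inMin) * (outMax - outMin) / (inMax - inMin) + outMin;
}

// Hypothetical per-user calibration for one flex sensor (raw ADC counts).
struct FlexCalibration {
    int adcStraight;  // reading with the finger straight
    int adcBent;      // reading with the finger fully bent
};

// Convert a raw flex reading to a servo angle in 0..180 degrees.
int flexToServoAngle(int adc, const FlexCalibration& cal) {
    return (int)mapClamped(adc, cal.adcStraight, cal.adcBent, 0, 180);
}

// Map wrist tilt (-90..90 degrees) to a servo angle (0..180 degrees).
// Tilts smaller than deadbandDeg are treated as "hand at rest" (90 degrees).
int tiltToServoAngle(double tiltDeg, double deadbandDeg) {
    if (fabs(tiltDeg) < deadbandDeg) return 90;
    if (tiltDeg < -90.0) tiltDeg = -90.0;
    if (tiltDeg > 90.0) tiltDeg = 90.0;
    return (int)lround(tiltDeg + 90.0);
}
```

A wider dead band makes the arm steadier when the hand trembles, at the cost of ignoring small intentional movements; this is one of the values a user would adjust for comfort.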
3.2 Block Diagram
The block diagram of the process flow is shown in Fig. 3, and the control flow of the robotic arm follows it. First, the user performs the gesture that the robotic arm is to mimic. The gesture is detected by the MPU6050 accelerometer, which senses the motion of rotary joints such as the wrist and the elbow from their axial movement, and by the flex sensors, which sense the motion of the fingers from the degree of bending. The signals are passed to the Arduino on the glove. This signal is then relayed to the HC-05 Bluetooth module that acts as the master, which in turn transmits the commands to the HC-05
Fig. 3 Block diagram explaining the flow of the project
Bluetooth module that acts as the slave on the robot. The received signals are then relayed to the Arduino microcontroller on the robot. The Arduino directs the PCA9685 servo motor driver to pass the signals to the corresponding motors, completing the action initiated by the user.
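The master-to-slave hand-off described above needs some agreed message format. The paper does not specify one, so the sketch below assumes a hypothetical 4-byte frame (start marker, servo channel, angle, XOR checksum) purely for illustration; the names and the start byte are not taken from the project.

```cpp
#include <stdint.h>

const uint8_t kStartByte = 0xA5;   // arbitrary marker chosen for this sketch
const uint8_t kNumChannels = 16;   // the PCA9685 provides 16 PWM outputs

struct ServoCommand {
    uint8_t channel;  // driver output, 0..15
    uint8_t angle;    // target angle, 0..180 degrees
};

// Glove side: pack a command into 4 bytes (start, channel, angle, checksum).
void encodeFrame(const ServoCommand& cmd, uint8_t out[4]) {
    out[0] = kStartByte;
    out[1] = cmd.channel;
    out[2] = cmd.angle;
    out[3] = (uint8_t)(out[0] ^ out[1] ^ out[2]);
}

// Arm side: validate and unpack a frame; returns false if it must be dropped.
bool decodeFrame(const uint8_t in[4], ServoCommand& cmd) {
    if (in[0] != kStartByte) return false;
    if ((uint8_t)(in[0] ^ in[1] ^ in[2]) != in[3]) return false;
    if (in[1] >= kNumChannels || in[2] > 180) return false;
    cmd.channel = in[1];
    cmd.angle = in[2];
    return true;
}
```

On the glove, the four bytes would be written to the serial link that feeds the master HC-05; on the arm, any frame that fails `decodeFrame` is discarded, so a corrupted byte does not move a servo.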
4 Discussion
This work introduces a new module for the robotic arm, consisting of a temperature sensor and a moisture sensor connected to the device through an Internet of Things (IoT) application such as Blynk. This module will enable us to observe and understand the environment surrounding the robotic arm. This is particularly beneficial in areas with high temperatures or radioactivity, where remote control of the arm may be necessary. The data collected from the temperature and moisture sensors will provide insight into the conditions of the environment, allowing us to make better decisions about the arm’s operation. Finally, we will integrate the robotic arm with a web application and an Android application, to let users control the arm via Bluetooth. We also plan to integrate the arm with speech recognition systems so that users can control it through voice commands, remotely and without any physical intervention. As part of the future scope, the arm could also be integrated with a machine learning algorithm so that it learns to perform certain tasks automatically. Implementing an integrated robotic arm with a temperature and moisture sensor module poses a few challenges:
• Sensor accuracy and reliability: One of the main challenges in integrating temperature and moisture sensors with a robotic arm is ensuring the accuracy and reliability of the sensor readings. The sensors must be calibrated properly and must withstand different environmental conditions, such as changes in temperature, humidity, and pressure.
• Real-time data processing: Another challenge is processing the sensor data in real time to provide the robotic arm with accurate information. This requires a fast and efficient processing system that can handle large amounts of data.
• Sensor placement: The placement of the temperature and moisture sensors is critical for accurate readings. The sensors must be placed where they can capture data from the environment with minimal interference from the robotic arm.
• Integration with the robotic arm control system: Integrating the sensor module with the robotic arm control system is a significant challenge. The sensor data must be used to make real-time decisions about the arm’s movements, such as adjusting the grip strength based on the moisture content of an object.
• Power management: The sensor module and robotic arm require significant power to operate. The challenge is to optimize the power management system so that the available energy is used efficiently while the performance of the system is maintained.
5 Results
The circuit in Fig. 4 is the prototype of the robotic arm, with six servo motors connected to an Arduino board via a 6-pin DIP switch (used as a manual substitute for the PCA9685 servo motor driver). In practical applications, the servo motors turn in response to commands from the motor driver, which in turn receives them via Bluetooth from the gestures recorded on the glove.
6 Conclusion
From the papers we reviewed, we draw the following inferences:
1. Most of the models in the papers we reviewed have restricted degrees of freedom (DoF), ranging from 3 to 5, whereas our model will have 6 DoF.
2. We will make this model easy to handle and highly portable, which makes it easier to use than existing models.
3. Our model will use Bluetooth to establish the connection. This is wireless, in contrast to a few previous models that use wired connections between the master and slave components.
Fig. 4 Circuit diagram of robotic arm prototype
4. Existing models either operate on pre-programmed commands or are controlled through a feedback loop mechanism, whereas our model will incorporate live gesture-based control.
5. Although some existing models can be scaled up as required by stacking and connecting similar component modules, our model will be rigid and non-scalable, since a scalable design would compromise structural rigidity and hence the functionality of the model under extreme conditions.
References
1. Szabo R, Lie I (2012) Automated colored object sorting application for robotic arms. In: 2012 10th international symposium on electronics and telecommunications (ISETC), pp 95–98. https://doi.org/10.1109/ISETC.2012.6408119
2. Kruthika K, Kiran Kumar BM, Lakshminarayanan S (2016) Design and development of a robotic arm. In: 2016 International conference on circuits, controls, communications and computing (I4C). IEEE, pp 1–4
3. Gautam R, Gedam A, Zade A, Mahawadiwar A (2017) Review on development of industrial robotic arm. Int Res J Eng Technol (IRJET) 4(03)
4. Wu S, Ze Q, Dai J, Udipi N, Paulino GH, Zhao R (2021) Stretchable origami robotic arm with omnidirectional bending and twisting. Proc Natl Acad Sci 118(36):e2110023118
5. Kadir WMHW, Samin RE, Ibrahim BSK (2021) Internet controlled robotic arm. Proc Eng 41:1065–1071
6. Dutta A, Tannivar M, Hande B, Sengupta D, Maji A, Shree O, Sarkar M (2021) Robotic arm utilization in global Covid-19 management. In: 2021 5th International conference on electronics, materials engineering & nano-technology (IEMENTech), pp 1–7. https://doi.org/10.1109/IEMENTech53263.2021.9614905
7. Yang A, Chen Y, Naeem W, Fei M, Chen L (2021) Humanoid motion planning of robotic arm based on human arm action feature and reinforcement learning. Mechatronics 78. https://doi.org/10.1016/j.mechatronics.2021.102630
8. Gungor AK, Kiyak I (2021) Automatic charging of electric driverless vehicles with robotic arm. In: 2021 Innovations in intelligent systems and applications conference (ASYU), pp 1–6. https://doi.org/10.1109/ASYU52992.2021.9598962
9. Krishnaraj Rao NS, Avinash NJ, Rama Moorthy H, Karthik K, Rao S, Santosh S (2021) An automated robotic arm: a machine learning approach. In: 2021 IEEE international conference on mobile networks and wireless communications (ICMNWC), December 3, pp 1–6. https://doi.org/10.1109/ICMNWC52512.2021.9688512
10. Brooks S, Townsend J, Collins C, Carsten J, Frost M, Reid J, Robinson M, Warner A (2022) Docking the Mars 2020 Perseverance robotic arm. In: 2022 IEEE aerospace conference (AERO), March 5, pp 1–12. https://doi.org/10.1109/AERO53065.2022.9843517
11. Mourtzis D, Angelopoulos J, Panopoulos N (2022) Closed-loop robotic arm manipulation based on mixed reality. Appl Sci 12(6):2972
12. Tasnim S, Zahid Hasan Md, Ahmed T, Tanvir Hasan Md, Uddin FJ, Farah T (2022) A ROS-based voice controlled robotic arm for automatic segregation of medical waste using YOLOv3. In: 2022 2nd international conference on computer, control and robotics (ICCCR), March 18, pp 81–85. https://doi.org/10.1109/ICCCR54399.2022.9790152
13. Rooban S, Joseph SIT, Manimegalai R, Sai Eshwar IV, Uma Mageswari R (2022) Simulation of pick and place robotic arm using CoppeliaSim. In: 2022 6th International conference on computing methodologies and communication (ICCMC), March 29, pp 600–606. https://doi.org/10.1109/ICCMC53470.2022.9754013
14. Babu D, Nasir A, Farag M, Muhammad Sidik MH, Rejab SBM (2022) Development of prosthetic robotic arm with patient monitoring system for disabled children; preliminary results. In: 2022 9th International conference on electrical and electronics engineering (ICEEE), March 29, pp 206–212. https://doi.org/10.1109/ICEEE55327.2022.9772565
15. Kagaya K, Yu B, Minami Y, Nakajima K (2022) Echo state property and memory in octopus-inspired soft robotic arm. In: 2022 IEEE 5th international conference on soft robotics (RoboSoft), April 4, pp 224–230. https://doi.org/10.1109/RoboSoft54090.2022.9762119
16. Szasz R, Allenspach M, Han M, Tognon M, Katzschmann RK (2022) Modeling and control of an omnidirectional micro aerial vehicle equipped with a soft robotic arm. In: 2022 IEEE 5th international conference on soft robotics (RoboSoft), April 4, pp 1–8. https://doi.org/10.1109/RoboSoft54090.2022.9762161
17. Ouyang W, He L, Albini A, Maiolino P (2022) A modular soft robotic arm with embedded tactile sensors for proprioception. In: 2022 IEEE 5th international conference on soft robotics (RoboSoft), April 4, pp 919–924. https://doi.org/10.1109/RoboSoft54090.2022.9762156
18. Yuvaraj S, Badholia A, William P, Vengatesan K, Bibave R (2022) Speech recognition based robotic arm writing. In: Proceedings of international conference on communication and artificial intelligence. Springer, Singapore, pp 23–33
19. Wegrowski P, Thomas W, Lemrick J, Deemyad T (2022) Advanced folding robotic arm for quadcopters. In: 2022 Intermountain engineering, technology and computing (IETC), May 1, pp 1–6. https://doi.org/10.1109/IETC54973.2022.9796687
20. Thomas W, Wegrowski P, Lemirick J, Deemyad T (2022) Lightweight foldable robotic arm for drones. In: 2022 Intermountain engineering, technology and computing (IETC), May 1, pp 1–6. https://doi.org/10.1109/IETC54973.2022.9796899
21. Wu K, Chen L, Wang K, Wu M, Pedrycz W, Hirota K (2022) Robotic arm trajectory generation based on emotion and kinematic feature. In: 2022 International power electronics conference (IPEC-Himeji 2022-ECCE Asia), May 15, pp 1332–1336. https://doi.org/10.23919/IPEC-Himeji2022-ECCIE53331.2022.9807205
22. Ndambani MA, Fang T, Saniie J (2022) Autonomous robotic arm for object sorting and motion compensation using Kalman filter. In: 2022 IEEE international conference on electro information technology (EIT), May 19, pp 488–491. https://doi.org/10.1109/eIT53891.2022.9814075
23. Oikonomou P, Dometios A, Khamassi M, Tzafestas CS (2022) Reproduction of human demonstrations with a soft-robotic arm based on a library of learned probabilistic movement primitives. In: 2022 International conference on robotics and automation (ICRA), May 23, pp 5212–5218. https://doi.org/10.1109/ICRA46639.2022.9811627
24. Ji S, Shin J, Yoon J, Lim K-H, Sim G-D, Lee Y-S, Kim DH, Cho H, Park J (2022) Three-dimensional skin-type triboelectric nanogenerator for detection of two-axis robotic-arm collision. Nano Energy 97:107225
25. Dai Y, Xiang C, Zhang Y, Jiang Y, Qu W, Zhang Q (2022) A review of spatial robotic arm trajectory planning. Aerospace 9(7):361
26. Chen C-S, Li T-C, Hu N-T (2022) The gripping posture prediction of eye-in-hand robotic arm using Min-Pnet. In: 2022 International conference on advanced robotics and intelligent systems (ARIS), August 24, pp 1–5. https://doi.org/10.1109/ARIS56205.2022.9910442
27. Huang CC, Chang CL (2022) Design and implementation of wire-driven multi-joint robotic arm. In: 2022 International conference on advanced robotics and intelligent systems (ARIS), August 24, pp 1–6. https://doi.org/10.1109/ARIS56205.2022.9910455
28. Zainuddin Yaacob MS, Sadun AS, Abang Zulkarnaini AMZ, Jalani J (2022) Development of a low-cost teleoperated and semi-autonomous robotic arm. Indonesian J Electr Eng Comput Sci 27(3):1338–1346. https://doi.org/10.11591/ijeecs.v27.i3.pp1338-1346
29. Aboud WS, Aljobouri HK, Al-Amir HSA (2022) A robust controller design for simple robotic human arm. Indonesian J Electr Eng Inform 10(3):655–667. https://doi.org/10.52549/ijeei.v10i3.3895
30. Marnell A, Shafiee M, Sakhaei AH (2022) Designing and manufacturing an android-controlled robotic arm using rapid prototyping. In: 2022 27th International conference on automation and computing (ICAC), September 1, pp 1–6. https://doi.org/10.1109/ICAC55051.2022.9911120
31. Wang Z, Zhang M, Xu Y (2022) Development of a robotic arm control platform for ultrasonic testing inspection in remanufacturing. In: 2022 27th International conference on automation and computing (ICAC), September 1, pp 1–6. https://doi.org/10.1109/ICAC55051.2022.9911174
32. Gadikar N, Naik I, Joshi S, Jain T, Swaroop Y, Panchal M, Sharma S (2022) Development of twin surgical robotic arm system for invasive surgery. In: 2022 1st International conference on technology innovation and its applications (ICTIIA), September 23, pp 1–6. https://doi.org/10.1109/ICTIIA54654.2022.9935952
33. Palmieri P, Melchiorre M, Mauro S (2022) Design of a lightweight and deployable soft robotic arm. Robotics 11(5). https://doi.org/10.3390/robotics11050088
34. Montoya Angulo A, Pari Pinto L, Sulla Espinoza E, Silva Vidal Y, Supo Colquehuanca E (2022) Assisted operation of a robotic arm based on stereo vision for positioning near an explosive device. Robotics 11(5). https://doi.org/10.3390/robotics11050100
Image Dehazing Using Generic Model Agnostic Convolutional Neural Network Gurditya Khurana, Rohan Garodia, and P. Saranya
Abstract Fog is the most frequent environmental factor that degrades image quality and hampers image analysis. This study proposes a generative approach to image dehazing: a fully convolutional neural network is constructed to recognize the haze pattern in the input image and restore a clear, fog-free image. The method is model-agnostic because it does not rely on the atmospheric scattering model. Surprisingly, even on SOTS outdoor images, which are synthesized using atmospheric scattering models, it outperforms current state-of-the-art dehazing techniques. Many modern applications use visual data analysis to find patterns and make decisions. Intelligent monitoring and control systems are examples where clear images are crucial for precise outcomes and dependable performance. Environmentally generated distortions, most frequently haze and fog, can have a considerable impact on such systems. The challenge of recovering high-quality images from their hazy counterparts has therefore received attention in the vision community, where it is commonly called the dehazing problem. This work proposes a dehazing neural network that focuses solely on producing a haze-free version of the input image, offering a novel and more flexible approach to the problem. It builds an encoder-decoder architecture using recent developments in deep learning to recover the clear image while entirely avoiding the parameter estimation problem. The method may also capture intricate haze patterns that are present in the training data but are missed by the atmospheric scattering model. The proposed network, the generic model-agnostic convolutional neural network (GMAN), is an end-to-end deep learning system for removing haze and restoring clear images, built on the encoder-decoder networks used for image denoising. Keywords Image dehazing · Deep learning · Convolutional neural network
G. Khurana · R. Garodia · P. Saranya (B) Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_8
G. Khurana et al.
1 Introduction
Many modern applications use visual data analysis to find patterns and make decisions. Intelligent monitoring, tracking, and control systems require sharp images for accurate and reliable results. We take advantage of recent methods in machine learning to build an encoder-decoder architecture trained to recover sharp images while completely eliminating the parameter estimation problem [1]. The proposed method may also be able to detect complex haze patterns that were present in the training data but missed by the atmospheric scattering model. Haze-generating models are widely applied in computer vision and image processing, where they are often used to synthesize images under adverse weather conditions. Atmospheric particles range in size from 1 to 10 µm, and the presence of these particles in aerosols affects image quality. The quantity of particles in the atmosphere is influenced by the weather. Measuring the particles responsible for these visual effects has taken considerable work [2]. Weather conditions are therefore commonly divided into two categories: constant and dynamic. For images degraded by weather, mainstream image processing applications give mediocre results. As a result, dehazing algorithms are becoming increasingly important in applications such as aerial photography, object detection, image retrieval, and object analysis. Image acquisition in outdoor scenes is also degraded under severe meteorological conditions such as haze, fog, and smoke, or by other media, which causes problems for systems such as automated monitoring [3]. Digital images taken outdoors are easily degraded by haze, which reduces the information they convey. Haze is a natural phenomenon that obscures the landscape, limits visibility, and alters colors.
Image dehazing is a technique of growing popularity for restoring images of natural scenes that have been degraded by low-visibility weather, dust, and other factors. Advances in autonomous systems and platforms have increased the demand for low-complexity, high-performing dehazing solutions. Contemporary learning-based dehazing systems have recently improved dehazing performance, but usually at the cost of added complexity, so prior-based approaches remain in use despite their poorer performance [4]. One single-image dehazing approach uses a color attenuation prior based on haze-lines and has been applied to both synthetic and real hazy images. The degradation caused by haze differs from pixel to pixel and depends on the distance of the scene point from the camera. Removing haze from a single image, the task addressed by DehazeNet, is difficult. Existing approaches use many constraints and priors to obtain a plausible haze removal solution. To achieve haze removal, it is essential to estimate the medium transmission map of the hazy input image. This model performs poorly where the airlight estimate is even slightly inaccurate. The rest of the paper is structured as follows. Section 2 surveys related work on image dehazing. Section 3 presents the proposed methodology, including the system architecture and design. Section 4 reports the results and discussion, and Sect. 5 concludes the work.
2 Related Work
The GMAN proposed in this article explores a new approach to the haze removal problem. Using a fully convolutional encoder-decoder architecture, GMAN learns to capture haze structures in images and recover the clear ones without using an atmospheric scattering model. GMAN is an end-to-end generative network that employs an encoder-decoder architecture with a down- and upsampling factor of 2. Its first two layers are convolution blocks with 64 channels. They are followed by two downsampling layers with stride 2 that encode the input image into a 56 × 56 × 128 volume. Moreover, it avoids estimating the parameters A and t, which are deemed superfluous. Experimental findings demonstrated the ability of the CNN (GMAN) to produce haze-free images and showed that it can avoid several typical flaws of state-of-the-art approaches, for example darkened colors and excessively sharp edges. The generic architecture of GMAN may also open up new avenues for research into generic image restoration. Indeed, we anticipate that the network will generalize with further training and design optimization. The current model not only provides a better solution to the haze removal problem but is also a step toward a general image restoration model [5]. Yang and Evans [1] improved single-image dehazing methods for resource-constrained platforms. Image dehazing, which counters the degrading effects of bad weather, dust, and other factors on images of natural scenes, is becoming more popular, and the development of autonomous systems and platforms has increased the demand for well-performing dehazing solutions. This approach, however, requires an autonomous system. Wang [13] proposed single-image dehazing using a color attenuation prior based on haze-lines. The model provides a dehazing technique for both synthetic and real hazy images.
In this model, the constant scattering coefficient is replaced by a dynamic scattering coefficient, an exponential function of image depth. According to experimental findings, the dehazed images produced by the algorithm are clearer and more realistic than those produced by the earlier color attenuation prior method, and the dehazing effect is effectively improved. However, the model is ineffective in the presence of extremely thick haze. Cai [2] introduced DehazeNet. Removing haze from a single image is a difficult task. Existing approaches use many constraints and priors to obtain a plausible haze removal solution. To achieve haze removal, it is essential to estimate the medium transmission map of the hazy input image. This model performs poorly where the airlight estimate is even slightly inaccurate. Berman [17] proposed non-local image dehazing. Haze limits visibility and reduces image contrast in outdoor photography. The degradation differs from pixel to pixel and depends on the distance of the scene point from the camera. This dependence is expressed as a transmission coefficient, which controls the attenuation of the scene
and the amount of haze for each pixel. This model fails where the airlight is much brighter than the scene. These models thus face several challenges: they fail where the airlight is very bright, they require an autonomous system for the experiments, and they are sometimes ineffective in thick haze. To address these problems, we propose an image dehazing model based on GMAN.
3 Proposed Methodology
Since single-image haze removal is an ill-posed problem, a deep neural network (DNN) built from convolutional, residual, and deconvolutional blocks is designed and trained to take the hazy image and restore the haze-free image. Dehazing method: Haze turns a colored image into a whitish one and can result in loss of image detail and reduced contrast. Haze likewise poses problems for various applications, including direct monitoring, indirect detection, tracking, and measurement. Dehazing removes haze from images, improves the visibility of the scene, and enhances the overall visual effect [5]. Haze removal is challenging because the problem is mathematically ambiguous. Nevertheless, image dehazing is essential for computer vision applications, and many researchers have tackled it with various dehazing algorithms. Dehazing methods fall into three categories: image enhancement, image fusion, and image restoration. Image restoration is further divided into single-image dehazing, which requires only one image as input, and multi-image dehazing, which requires two or more images of the same scene. Both approaches are subdivided further. Single-image haze removal algorithms can be grouped into three main types, among them algorithms based on priors or hypotheses. This kind of method removes the haze while estimating the parameters of the haze imaging model and yields satisfactory results [6]. In the fusion-based technique, two inputs derived from the original image are weighted by three weight maps (luminance, saturation, and enhancement) and blended in a multiscale fashion to remove haze effects. This method has recently attracted the interest of researchers. The flow of the proposed GMAN model is shown in Fig. 1.
Conv2D is frequently used to detect features, for example in the encoder part of an autoencoder model, and it may shrink the spatial size of its input. Conv2DTranspose, in contrast, is used to create features, for example in the decoder part of an autoencoder model that generates images, and it may enlarge its input. Image loading comprises loading the images as tensors, a function that builds the path of each individual image from folders such as clear image, hazy image, and hazy clear image, and loading the tensor image data in batches. This function includes shaping the image and window size, normalizing the filter of the image, also
Fig. 1 Flow of the proposed GMAN model
calculating the weighting function by using rows, columns, and num filters, and precomputing the constants that are later needed in the optimization steps. The model employs an encoder-decoder-based end-to-end generative method to address the dehazing problem. The first two layers that an input image encounters are convolution blocks with 64 channels. They are followed by two downsampling blocks (encoders) with stride 2. The encoded image then passes through a layer of four residual blocks, each containing a shortcut connection (as in ResNets) [7]. This residual layer helps the network learn the haze structure. Following these, the upsampling or deconvolutional (decoder) layers reconstruct the output of the residual layer. The last two layers (convolutional blocks) turn the upsampled feature map into an image, which is combined with the input image (global residual layer) to produce the dehazed image. This global residual layer helps capture the boundary properties of objects at various depths in the scene. The encoder component of the architecture reduces the image's dimensions before supplying the downsampled image to the residual layer, which extracts the image's features. The decoder component then learns to recreate the lost data of the haze-free image. The network is then pruned, starting from the pre-trained network. When stripping the pruned model, the wrappers that pruning added to the network must be removed. Performance evaluation includes importing the required modules, checking whether the weights have been pruned correctly, checking all parameters and the window size, comparing the sizes, and checking the folder path of the images. We used naturally hazy images taken on our own campus; if there is a problem with the size of an image, it is resized according to the parameters. The boundary size is calculated as a final training constraint. Based on the original training function, we refine the estimate of the transmission [8].
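The shortcut links and the global residual connection described above can be illustrated with a small, framework-free sketch. It is not the authors' implementation: images are assumed to be flat lists of pixel values in [0, 1], `toy_block` is a hypothetical stand-in for a learned convolutional block, and the final clipping step is an added assumption.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, block: Callable[[Vector], Vector]) -> Vector:
    """Local residual learning: output = x + F(x), as in a ResNet shortcut."""
    fx = block(x)
    return [a + b for a, b in zip(x, fx)]


def global_residual(x: Vector, network: Callable[[Vector], Vector]) -> Vector:
    """Global residual: add the input image to the network output.

    Clipping to [0, 1] is an assumption made for this sketch.
    """
    out = network(x)
    return [min(1.0, max(0.0, a + b)) for a, b in zip(x, out)]


def toy_block(x: Vector) -> Vector:
    """Hypothetical learned block: predicts a small negative correction."""
    return [-0.1 * v for v in x]


def toy_network(x: Vector) -> Vector:
    """Four residual blocks in sequence; returns only the predicted correction."""
    h = x
    for _ in range(4):
        h = residual_block(h, toy_block)
    return [hv - xv for hv, xv in zip(h, x)]


hazy = [0.9, 0.8, 1.0]
dehazed = global_residual(hazy, toy_network)
```

In this toy setting each block scales its input by 0.9, so the sum operator simply returns the input attenuated four times; in a trained network the correction would be learned from hazy and clear image pairs.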
A fully convolutional neural network (CNN) is proposed and used to remove haze from the input image. It is an end-to-end generative network with an encoder-decoder structure that downsamples and upsamples by a factor of 2 [9]. The first two layers consist of 64-channel convolution blocks. They are followed by two downsampling layers that encode the input image into a 56 × 56 × 128 volume. After encoding, the image is passed to a residual layer consisting of four residual blocks, each with a shortcut connection. The subsequent deconvolutional layer upsamples the output of the residual layer and reconstructs a new 224 × 224 × 64 volume for the subsequent convolution, marking the transition from encoding to decoding. The proposed architecture of the GMAN model is shown in Fig. 2. The proposed GMAN architecture is based on a well-known encoder-decoder design used to address the denoising problem. It consists of three components: encoder, hidden layer, and decoder. This architecture enables deep network training while reducing the size of the data. Since haze can be regarded as a type of noise, the encoded output is downsampled and sent to the residual layer to extract crucial features. The network discards the noise while preserving the best aspects of the original image. During the decoding phase, the decoder component is expected to learn to reconstruct the missing data of the haze-free image, taking into account the statistical distribution of the input data [10].
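The volumes quoted above (56 × 56 × 128 after encoding, 224 × 224 × 64 after upsampling) can be checked with a short shape trace. The sketch below assumes a 224 × 224 × 3 input and 'same' padding, for which a strided convolution yields ceil(size / stride) and a transposed convolution yields size × stride. The channel counts of the intermediate `down1` and `up1` stages are assumptions, since the text only states the final volumes.

```python
import math
from typing import List, Tuple

Shape = Tuple[int, int, int]


def conv_out(size: int, stride: int) -> int:
    """Spatial size after a 'same'-padded convolution."""
    return math.ceil(size / stride)


def deconv_out(size: int, stride: int) -> int:
    """Spatial size after a 'same'-padded transposed convolution."""
    return size * stride


def trace_shapes(height: int = 224, width: int = 224) -> List[Tuple[str, Shape]]:
    """List the feature-map shape after each stage of the encoder-decoder."""
    shapes: List[Tuple[str, Shape]] = [("input", (height, width, 3))]
    h, w = height, width
    for i in (1, 2):  # two 64-channel convolution blocks, stride 1
        h, w = conv_out(h, 1), conv_out(w, 1)
        shapes.append((f"conv{i}", (h, w, 64)))
    for i in (1, 2):  # two stride-2 downsampling layers (128 channels assumed for both)
        h, w = conv_out(h, 2), conv_out(w, 2)
        shapes.append((f"down{i}", (h, w, 128)))
    shapes.append(("residual_x4", (h, w, 128)))  # residual blocks keep the shape
    for i in (1, 2):  # two stride-2 transposed convolutions (64 channels assumed for both)
        h, w = deconv_out(h, 2), deconv_out(w, 2)
        shapes.append((f"up{i}", (h, w, 64)))
    shapes.append(("output", (h, w, 3)))  # final convolution back to three channels
    return shapes


for name, shape in trace_shapes():
    print(f"{name:12s} {shape}")
```

The trace makes the role of each layer type explicit: the stride-2 convolutions halve the resolution twice, and the transposed convolutions restore it before the final three-channel output.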
Fig. 2 Proposed architecture of GMAN model
This network employs residual learning at both the local and global levels. A local residual layer is constructed from the residual blocks of the hidden layer immediately after downsampling. For ease of training, we rely on the hypothesized and empirically proven capacity of the residual block [11]. Residual learning also characterizes the GMAN model as a whole in recognizing the hazy image: the input image is passed to a sum operator together with the output of the final convolutional layer, forming a single global residual block. A well-known two-component loss function is used to train the proposed GMAN. The first component measures how well the result matches the ground truth, and the second helps to produce a visually pleasing image. The most popular way of demonstrating an algorithm's efficacy is to compare the output image with the ground-truth image using the peak signal-to-noise ratio (PSNR). The mean squared error (MSE) is therefore chosen as the first component of the loss. The second component is a perceptual loss. MSE loss is used in many well-known image restoration problems to evaluate the quality of the resulting image; it is, however, not necessarily a reliable indicator of visual quality. Conv2D and Conv2DTranspose layers were used in the implementation [12]. The first network is GMAN, which has 64 filters in every layer except the encoding layers, which have 128, and the final output layer, which has three channels. The second, the parallel network (PN), is a convolutional network made up entirely of dilated layers; it has 64 filters per layer except for the final layer, which has three channels, and is modeled after GMAN [13]. The most important practical lesson was how to custom-train a model. Prediction tends to attract the attention, but the training methodology is where the actual work lies. Every epoch has a training loop and a validation loop. In the training loop, the training loss and its gradients are computed from the training data.
In the validation loop, the validation loss for that epoch is computed on the validation data and the output is inspected (using the display img function). Lastly, we reset the loss metrics and save the model (weights, variables, etc.) from that epoch. The kernel weights are not initialized to zero; random normal initialization produces better results. To reduce overfitting, an L2 regularizer with a weight decay of 1e-4 is also used. Note that not every layer uses the same kernel initialization [14].
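The loss terms mentioned above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the training code: predictions, targets, and weights are flat lists, the perceptual term is passed in as a precomputed number because its feature extractor is not specified here, and the unweighted sum of the components is an assumption. The L2 penalty follows the usual form of weight decay times the sum of squared weights.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between the restored and ground-truth pixels."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def l2_penalty(weights: Sequence[float], weight_decay: float = 1e-4) -> float:
    """L2 regularization term: weight_decay * sum of squared kernel weights."""
    return weight_decay * sum(w * w for w in weights)


def total_loss(
    pred: Sequence[float],
    target: Sequence[float],
    weights: Sequence[float],
    perceptual: float = 0.0,  # placeholder for the second loss component
    weight_decay: float = 1e-4,
) -> float:
    """MSE + perceptual term + L2 penalty (unweighted sum, an assumption)."""
    return mse(pred, target) + perceptual + l2_penalty(weights, weight_decay)


loss = total_loss(pred=[0.5, 0.5], target=[0.0, 1.0], weights=[1.0, 2.0, 3.0])
```

With a weight decay of 1e-4 the penalty stays small relative to the data term, which is consistent with using it only to discourage large kernel weights.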
4 Results and Discussions
In terms of performance, the proposed GMAN outperforms many state-of-the-art methods. On the outdoor dataset it outperforms all the competitors considered. In addition, as shown in the figure, GMAN prevents object edges from being too sharp and image tints from being too dark. On the other hand, GMAN performs well in regions of high depth values in the target image but performs poorly in regions of moderate depth. This is due to the dynamic method, which attenuates the light intensity of the dehazed image and causes color distortion in areas with high depth values (such as the sky). As a result, the GMAN model can address these issues and produce a clearer picture. We also tested the network on an indoor dataset. Here the result is less remarkable: GMAN ranks behind DehazeNet, the gated fusion network (GFN), and the all-in-one dehazing network (AOD-Net). Nevertheless, the great potential of model-independent dehazing approaches can already be seen on indoor datasets. Among these networks, GFN ranks first for the structural similarity index (SSIM) and second for PSNR, almost on par with DehazeNet, which ranks first. Our preliminary results suggest that, by integrating and generalizing the underlying concepts of GMAN, it may be possible to outperform competing methods on both the outdoor and the indoor Synthetic Objective Testing Set (SOTS) datasets. Follow-up investigations will clarify this issue. The model was trained successfully on pairs of clear and hazy images (a dataset from Kaggle). Once trained, the model is applied to our test image dataset, hazy_test_images, to dehaze these test images.
4.1 Running Time Comparison
For this comparison, all models were run on 50 images from the test set on a MacBook Pro. Table 1 lists the average running time per image for each model. In Table 1 the proposed GMAN has the lowest average running time of the compared models while also giving the best dehazing results, so it may be usable for real-time dehazing. Taken as a whole, the algorithm could therefore be applied to video defogging [15].
Table 1 Runtime comparison
Models                   Run time
DCP                      0.92
DEHAZENET                0.51
MSCNN                    0.47
NIN-DEHAZENET            0.39
GMAN (proposed model)    0.38
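An average running time per image, as reported in Table 1, can be measured with a small harness like the following. It is only a sketch: `model` stands for any callable that dehazes one image, and the identity function used in the example is a hypothetical placeholder, not one of the compared models.

```python
import time
from typing import Callable, Iterable


def average_runtime(model: Callable[[object], object], images: Iterable[object]) -> float:
    """Return the mean wall-clock time in seconds that `model` needs per image."""
    total = 0.0
    count = 0
    for image in images:
        start = time.perf_counter()
        model(image)  # run one dehazing pass
        total += time.perf_counter() - start
        count += 1
    if count == 0:
        raise ValueError("at least one image is required")
    return total / count


# Placeholder model and 50 dummy inputs, mirroring the 50-image test described above
seconds_per_image = average_runtime(lambda image: image, range(50))
```

Timing each call separately and averaging keeps image loading out of the measurement, provided the images are loaded before the loop.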
Table 2 PSNR value comparison
Models                   PSNR value
DCP                      17.81
DEHAZENET                18.03
MSCNN                    18.25
NIN-DEHAZENET            18.48
GMAN (proposed model)    20.53
Fig. 3 a Runtime comparison. b PSNR value comparison
4.2 PSNR Value Comparison
The proposed algorithm is evaluated against the most advanced dehazing techniques using the peak signal-to-noise ratio (PSNR). Table 2 shows that the proposed strategy performs better than the others in terms of PSNR [15]. Figure 3a and b show the runtime comparison and the PSNR comparison of all the models. The GMAN proposed in this study explores a novel strategy for the dehazing problem. Owing to its fully convolutional encoder-decoder design, GMAN learns to capture haze structures in images and restore the clear ones without requiring the atmospheric scattering model. Experimental results have demonstrated the ability of GMAN to produce haze-free images and to avoid some common downsides of cutting-edge approaches, such as color darkening and excessive edge sharpening. Furthermore, the general architecture of GMAN might serve as a starting point for future research on general image restoration. With further training and a few design tweaks, our network should be able to generalize to capture other types of visual noise and distortions [16].
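The PSNR used in this comparison is derived from the mean squared error as PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. A minimal sketch follows, assuming 8-bit images flattened into sequences of numbers; the function name and arguments are illustrative.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels between two equally sized images."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


clear = [100.0, 100.0, 100.0, 100.0]
close = [105.0, 105.0, 105.0, 105.0]
far = [110.0, 110.0, 110.0, 110.0]
print(psnr(clear, close) > psnr(clear, far))
```

Because the MSE sits in the denominator, a restored image that is closer to the ground truth receives a higher PSNR, which is why larger values in Table 2 indicate better dehazing.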
5 Conclusion
In this work, GMAN investigates a new approach to the haze removal problem. Owing to its fully convolutional encoder-decoder design, GMAN learns to capture haze structures in images and restore the clear ones without computing the parameters A and t(x) of the atmospheric scattering model, which are considered unnecessary. Experimental results confirm GMAN's ability to produce fog-free images while avoiding some common shortcomings of state-of-the-art techniques, for example darkened colors and over-sharpened edges. Additionally, the generic architecture of GMAN may serve as a starting point for future research on generic image restoration. Indeed, with training and some design modifications, we expect the network to generalize to a wide range of visual noise and distortions. In this regard, the current research not only provides a more effective haze removal solution but also advances the development of general image restoration techniques. The primary takeaways from this work are the GMAN architecture, research into the dehazing domain, and guidance on handling inputs where both the features and the labels are images. Further research will be done to enhance the performance of the proposed model and to combine it with an object detection component using Deep SORT so that traffic video can be dehazed as well.
References
1. Yang G, Evans AN (2021) Improved single image dehazing methods for resource-constrained platforms. J Real-Time Image Proc 18:2511–2525
2. Cai B, Xu X, Jia K, Qing C, Tao D (2016) DehazeNet: an end-to-end system for single image haze removal. IEEE Trans Image Process 5187–5198
3. Tang H, Li Z, Zhong R, Zhang J, Fang X (2021) Sky-preserved image dehazing and enhancement for outdoor scenes. In: 2021 IEEE 4th international conference on electronics technology (ICET). Chengdu, China, pp 1266–1271
4. Tang S, Meng Z (2022) Positive-and-negative learning for single image dehazing. In: 2022 7th International conference on intelligent computing and signal processing (ICSP). Xi’an, China, pp 1879–1883
5. Shu Q, Wu C, Xiao Z, Liu RW (2019) Variational regularized transmission refinement for image dehazing. In: 2019 IEEE international conference on image processing (ICIP). Taipei, Taiwan, pp 2781–2785
6. Zhang H, Li J, Li L, Li Y, Zhao Q, You Y (2011) Single image dehazing based on detail loss compensation and degradation. In: 2011 4th international congress on image and signal processing. Shanghai, China, pp 807–811
7. Ren X, Tang C, Wang B, Su H, Li X (2020) Single image with large sky area dehazing based on structure-texture decomposition. In: 2020 IEEE 6th international conference on computer and communications (ICCC). Chengdu, China, pp 415–419
8. Yeh CH, Huang CH, Kang LW, Lin MH (2018) Single image dehazing via deep learning-based image restoration. In: 2018 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). Honolulu, HI, USA, pp 1609–1615
9. Liu Y, Rong S, Cao X, Li T, He B (2020) Underwater image dehazing using the color space dimensionality reduction prior. In: 2020 IEEE international conference on image processing (ICIP). Abu Dhabi, United Arab Emirates, pp 1013–1017
10. Min X et al (2019) Quality evaluation of image dehazing methods using synthetic hazy images. IEEE Trans Multimedia 21(9):2319–2333
11. Wang W, Yuan X (2017) Recent advances in image dehazing. IEEE/CAA J Autom Sinica 4:410–436
12. Huang Y, Chen X (2021) Single remote sensing image dehazing using a dual-step cascaded residual dense network. In: 2021 IEEE international conference on image processing (ICIP). Anchorage, AK, USA, pp 3852–3856
13. Wang Q, Zhao L, Tang G, Zhao H, Zhang X (2019) Single-image dehazing using color attenuation prior based on haze-lines. In: 2019 IEEE international conference on big data (big data). Los Angeles, CA, USA, pp 5080–5087
14. Zheng M, Qi G, Zhu Z, Li Y, Wei H, Liu Y (2020) Image dehazing by an artificial image fusion method based on adaptive structure decomposition. IEEE Sens J 20(14):8062–8072
15. Dong XM, Hu XY, Peng SL, Wang DC (2010) Single color image dehazing using sparse priors. In: 2010 IEEE international conference on image processing. Hong Kong, China, pp 3593–3596
16. Kudo Y, Kubota A (2018) Image dehazing method by fusing weighted near-infrared image. In: 2018 international workshop on advanced image technology (IWAIT). Chiang Mai, Thailand, pp 1–2
17. Berman D, Treibitz T, Avidan (2016) Non-local image dehazing 1674–1682
A Novel Approach to Build Privacy and Trust in Vehicle Sales Using DID Aju Mathew Thomas, K. V. Lakshmy, R. Ramaguru, and P. P. Amritha
Abstract Decentralized identity solutions, built on the concept of Self-Sovereign Identity (SSI), have gained a competitive edge over existing identity management (IM) systems. This paper discusses the significance of decentralized identifier (DID) and verifiable credential (VC) in a peer-to-peer application for selling and managing pre-owned vehicles. The application uses Ethereum’s ERC-1056 lightweight DID standard. We aim to comply with the general data protection regulation (GDPR) requirements of the European Union. We propose implementing JSON Web Tokens (JWT) to store the user encoded information locally and a private interplanetary file system (IPFS) to maintain encrypted and encoded vehicle data information for improved privacy. Additionally, the ERC-721 standard is used to tokenize the vehicle to create the digital twin. Finally, we add the VC to the digital twin of the vehicle to increase the trust in the proposed model. The results demonstrate that our proposed solution offers more trust between the users and privacy of user and vehicle data. Furthermore, we also compute and compare the average cost of user DID creation using ERC-1056 and ERC-725, and the proposed solution is more cost-effective than the existing solutions. Keywords Decentralized identity · Verifiable credential · Blockchain technology · Ethereum · Interplanetary file system · ERC-1056 · ERC-725 · ERC-721
A. M. Thomas (B) · K. V. Lakshmy · R. Ramaguru · P. P. Amritha TIFAC-CORE in Cyber Security, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] K. V. Lakshmy e-mail: [email protected] R. Ramaguru e-mail: [email protected] P. P. Amritha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_9
A. M. Thomas et al.
1 Introduction Personal data is regarded as an essential component in the current digital era. Recently, digital surveillance and security breaches have increased, highlighting the need for improved privacy and security, particularly with regard to users’ personal information. Blockchain and distributed ledger technologies (DLT) enable the protection of data from manipulation via decentralized identity and other privacy mechanisms. Blockchain technology is a decentralized computation and distributed ledger platform that uses a rational decision-making process among multiple parties in an open and public system to immutably store transactions in a verifiable manner. Blockchain-based IM solutions are seen as the next big revolution in addressing the problems that the government faces in disbursing schemes to qualified beneficiaries. The identity and specific attributes are stored in the blockchain at this point, and these attributes can be verified using a VC [1]. Decentralized identifiers (DIDs) [2] are a revolutionary new concept in the realm of IM. DIDs are globally unique and persistent identifiers that provide a decentralized and verifiable digital identity. It is a World Wide Web Consortium-developed standard (W3C). DIDs are a critical component of Self-Sovereign Identity (SSI) since they empower us to manage our digital identities independently of a central authority. A DID corresponds to a document containing a sequence of assertions concerning the user’s identity. The zero-knowledge protocol (ZKP) enables speedier verification, privacy protection, and selective dissemination of information. A verifiable credential (VC) is information about an entity’s cryptographically trustworthy history, such as its name, government identification number, home address, email address, and academic degree. This is accomplished through the use of a digital signature that can be validated using the Issuer DID’s public key. 
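To make the structure of a DID concrete, the sketch below splits an identifier of the general W3C form `did:<method>:<method-specific-id>` into its parts. It is a simplified check rather than a full implementation of the DID syntax, and the identifier used is the generic `did:example` placeholder, since the text does not give a concrete DID for the proposed system.

```python
import re
from typing import NamedTuple


class DID(NamedTuple):
    method: str      # DID method name, lowercase letters and digits
    identifier: str  # method-specific identifier


# Simplified pattern: "did:", a lowercase method name, ":", then a non-empty identifier
_DID_PATTERN = re.compile(r"^did:([a-z0-9]+):(.+)$")


def parse_did(did: str) -> DID:
    """Split a DID string into its method and method-specific identifier."""
    match = _DID_PATTERN.match(did)
    if match is None:
        raise ValueError(f"not a valid DID: {did!r}")
    return DID(method=match.group(1), identifier=match.group(2))


parsed = parse_did("did:example:123456789abcdefghi")
print(parsed.method, parsed.identifier)
```

A resolver for the chosen DID method would then map the parsed identifier to the DID document that carries the public keys used to verify credentials.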
DID in conjunction with a VC is gaining traction, and many businesses have begun implementing it. The concept of DID can be used in a variety of real-world scenarios, including medical records [3], academia [4], and vehicle management. For example, in the case of vehicle management, we can use a DID to identify each vehicle and track its use, maintenance, and ownership. In conjunction with blockchain technology, DID can be used to track the details of used vehicles. It instills participants with trust, transparency, and auditability in the vehicle-related ecosystem. By leveraging the blockchain properties, customers no longer worry about getting unreliable vehicle condition information from used vehicle dealers. It enables customers to obtain authentic vehicle information while ensuring that the system is transparent to all stakeholders and is impenetrable to tampering. Other Ethereum-based identity standards include ERC-725 [1], ERC-1484 [5], and ERC-1207 [6]. ERC-725 is a DID-compliant standard that enables users on the Ethereum network to be identified. These criteria may also be applied in real-world situations, such as the insurance industry. However, the high gas costs associated with implementing an identity smart contract for each user make this method uneconomical, and it has been deprecated. ERC-1484 is a digital IM standard that is DID compatible and permits the aggregation of digitally identifiable information. Due to its DID compliance, it may be used in conjunction with the ERC-725 and ERC-1056
A Novel Approach to Build Privacy and Trust in Vehicle Sales …
frameworks without the need to link to a resolver smart contract. ERC-1207 is based on the DAuth Access Delegation protocol, which enables identity delegation between smart contracts without requiring the user’s private key.
The existing vehicle management system has several inherent issues. The usual process begins when a customer seeking to sell a vehicle registers on a portal, provides information about the vehicle, and requests an appointment for a physical examination by the portal’s team. Cars24.com, OLX, and auto-portal are a few of the major players in India that focus on pre-owned automobile sales. The entire verification process can take anywhere from a few days to several months. Currently, all of these organizations keep information in a centralized manner. The primary concern with centralized systems is a lack of openness and trust. Customers in the used car market are frequently unsure whether the information provided by the used vehicle dealer accurately describes the vehicle’s actual condition. The recorded data may be erroneous and unreliable, as numerous organizations maintain it in their own databases, and the business process is inefficient and time-consuming [7]. As a result, the validity, correctness, and fairness of used automobile data are worth addressing. Data housed in centralized systems is also vulnerable to cyber attacks, which can result in data breaches [8]. Finally, centralized systems raise concerns about data privacy violations, manipulation, and infringement, and they constitute a single point of failure [9]. This study aims to demonstrate the feasibility of using DIDs and VCs to improve the trust and reliability of vehicle sales data.
Due to the ongoing concerns about data privacy in existing centralized systems, this study develops an alternative solution for vehicle sales built on the Ethereum blockchain, which enhances data immutability and reliability. Our paper addresses the challenges associated with vehicle management by using the Ethereum blockchain to eliminate the need for a third party, DIDs to give users control over their data, and VCs to foster user trust. We use the ERC-721 [10] standard to identify each vehicle and transfer its ownership. DIDs for users and their vehicles are generated with the ERC-1056 standard. The paper also proposes decoupling business logic and data. The vehicle-related data is encoded as a JWT and encrypted with the AES-256 algorithm before being stored in a private IPFS network. The business logic is stored as a smart contract on the Ethereum blockchain. This separation is intended to make the proposed solution GDPR compliant.
The rest of the paper is structured as follows. Section 2 addresses related work, focusing on blockchain-based vehicle applications that have been proposed and are in use, and outlines several DID solutions that have been suggested over the years. Section 3 describes our proposed work, which employs the ERC-1056 standard for creating DIDs and issuing credentials, the ERC-721 standard for creating tokens, and a private IPFS network for secure distributed information storage. Section 4 discusses and analyzes the implementation results. Finally, Sect. 5 summarizes and concludes the paper.
A. M. Thomas et al.
2 Related Work
This section explores well-known research on DID solutions that have gained traction in recent years and their application in real-world scenarios. The increase in the number of Web applications and services has raised concerns about privacy and the protection of personally identifiable information (PII). We also discuss research projects that have proposed blockchain frameworks for vehicle management.
VCs received increasing attention during the COVID-19 pandemic, when Eisenstadt et al. [11] developed a mobile application that issues a COVID-19 Antibody Test/Vaccination Certificate (CAT/VC), which could be used as an immunity passport to enter workplaces. The goal was to create a tamper-proof and privacy-protected credential about COVID-19 test results that any verifier could instantly verify. The application is built on Solid, a decentralized personal data platform, and incorporates DID and VC features. The credentials are stored on the end-user device; only the reference hashes are stored on an Ethereum consortium blockchain for faster verification. However, the data stored in the Solid platform is unencrypted. The user’s photo is also embedded in the credential, eliminating the need for physical verification in the future. This design can lead to the disclosure of sensitive data in the application. Another implementation of DIDs and VCs is described by Lagutin et al. [12], who built a framework on Hyperledger Indy using DIDs, VCs, and an OAuth-based authorization server to provide authentication and authorization for resource-constrained IoT devices. The authors also proposed a practical use case in which a VC is issued to a visiting lecturer to access a university’s infrastructure, such as a printer. The goal was to eliminate reliance on X.509 certificates, which are costly and have privacy implications.
Blockchain can revitalize the automotive industry by ensuring secure transactions, fostering trust, reducing fraud, and storing immutable records in a decentralized manner. Syed et al. [13] implemented a blockchain-based framework on Hyperledger Fabric that provides comprehensive solutions for vehicle life cycle tracking, from registration to scrap. The framework primarily focuses on insurance issuance, used vehicle transfer, and accident and violation management, and it stores vehicle-related data in CouchDB. The application is intended to involve all stakeholders typically associated with the life cycle of a vehicle. Valaštín et al. [14] introduced a peer-to-peer car-sharing solution using two distinct ERC-721 tokens and one ERC-20 token. The first ERC-721 token represents car assets, while the second represents the unlock token for the car. ERC-20 tokens are used to reward users for taking advantage of discounts based on their usage. The solution uses IPFS to store the image of the car token to reduce the cost of Ethereum transactions, and the Pinata platform as a gateway to IPFS to improve overall service speed. Aswathy et al. [8] proposed and created a blockchain vehicle database (BVD) that combines vehicle registration, traffic violation recording, and vehicle activity monitoring on a single blockchain platform by bringing together entities such as the Regional Transport Office (RTO), Police, and People. The approach does not use any distributed storage such as IPFS for storing the vehicle registration information and instead relies on the Ethereum blockchain. In this case, gas fees will be higher, and they will rise in proportion to the amount of data stored on the chain.
Combining blockchain with the Internet of Vehicles (IoV) is another interesting use case in the automotive domain. Ramaguru et al. [15] proposed VAHAAN-NamChain, a real-time blockchain network that provides solutions for the security, safety, and privacy of the IoV ecosystem. BigchainDB stores vehicle details as unique assets, along with communication between vehicles and other stakeholders. Vehicle logs are kept using IPFS. The framework is designed on top of a peer-to-peer network layer using Libp2p. It also uses a distributed identifier to configure the vehicle’s communication with other vehicles, the Internet, roadside infrastructure, and other IoT devices in the ecosystem. It supports a native cryptocurrency token called Naanayam, equivalent to ERC-721 tokens, for incentivizing and availing services from the service provider.
3 Proposed Work
The architecture of our proposed solution is shown in Fig. 1. Our solution is built on top of the Ethereum blockchain. Ethereum is a permissionless, open-source, public distributed computing platform that uses blockchain technology to support smart contracts and secure cryptocurrency transactions without the involvement of a third party. A smart contract is a self-executing and immutable computer program stored on the blockchain and executed to carry out a specific transaction or task when a specific condition is met. Separate smart contracts have been developed and deployed on the Ethereum blockchain network for each functionality, including the creation of user DIDs, vehicle registration, vehicle DID creation, and the transfer of vehicles from seller to buyer.
ERC-1056 [16] is an Ethereum and DID-compliant standard for creating and updating lightweight digital identities with minimal use of blockchain resources. Compared to other Ethereum identity standards such as ERC-725, it is scalable, GDPR compliant, and free of per-identity deployment fees, as the contract needs to be deployed onto the blockchain only once. We used ERC-1056 to create the DID for the user through the MetaMask wallet. MetaMask is a well-known cryptocurrency wallet that also serves as a gateway to Ethereum decentralized applications (DApps). Additionally, we used JWT [17], an open standard for securely transferring information between two entities, to encode the user PII and to issue and verify credentials. Each JWT is digitally signed, so it can be trusted and validated. The information stored in IPFS is encrypted using AES-256 symmetric encryption to provide an additional layer of protection. Instead of storing this encrypted data on-chain, we used a private IPFS network to store it off-chain, which provides enhanced privacy and reduced transaction costs.
The InterPlanetary File System (IPFS) [18] is a protocol and peer-to-peer network that facilitates the creation of a distributed file system for storing and sharing data. It supports versioning by identifying each file in the global namespace using content addressing. We designed a private IPFS network using the libp2p network instead of the public IPFS system, as the former can provide security and restrict the flow of information to the network’s peer nodes. This private IPFS network helps the proposed solution meet the GDPR specifications. Finally, we used the ERC-721 standard to uniquely identify each vehicle asset as an NFT on-chain.
Fig. 1 Architecture of our proposed solution
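To illustrate the content addressing that IPFS relies on, the following minimal Python sketch stores records under the hash of their bytes. It is not the IPFS implementation: a plain SHA-256 hex digest stands in for a real CID (which uses additional multihash and multibase encodings), and ContentStore, content_address, and the sample records are hypothetical names introduced only for this example.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a SHA-256 hex digest, used here as a simplified stand-in for a CID."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy in-memory store that keys every block by the hash of its bytes."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on read so that corrupted or swapped content is detected.
        if content_address(data) != address:
            raise ValueError("stored content does not match its address")
        return data


store = ContentStore()
record_v1 = b'{"vin": "DEMO-VIN-0001", "color": "red"}'
record_v2 = b'{"vin": "DEMO-VIN-0001", "color": "blue"}'
addr_v1 = store.put(record_v1)
addr_v2 = store.put(record_v2)
```

Because the address is derived from the content, any change to a record yields a different address, which is what allows a hash stored on-chain to detect modification of data kept off-chain.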
3.1 Creation of User DID
The application generates DIDs from hierarchical deterministic (HD) Ethereum wallet addresses. The wallet provides the user with a secret recovery phrase comprising twelve seemingly random words. This backup phrase acts as the seed from which the wallet’s keys are generated, so it can restore the wallet and recover the private key if it is lost, and the user should store it securely. The Ethereum wallet address serves as the ERC-1056-compliant DID address, and we created the DIDs using the ethr-did library. Every buyer, seller, and issuer must create a profile before using the application. If the user credentials are not available, the profile modal opens automatically, resulting in the creation of the user DID. Figure 2 shows the user DID that is generated using an Ethereum wallet address.
JWT is used to encode the metadata entered by the user as a compact string with three parts: header, payload, and signature. The header includes the “typ” and “alg” fields. The typ field indicates the type of token, which is JWT. The alg field specifies the algorithm used to create and validate the signature. We used ECDSA over secp256k1 with an appended recovery bit (ES256K-R). The recovery bit can be used to recover the signer’s public key from the signature. The header JSON object is base64url-encoded to form the first part of the token. The payload comprises the metadata entered by the user on the application, which is encoded in the same way. The signature is computed over the encoded header and payload using the private key held in the MetaMask wallet, and it establishes the user’s authenticity. Because the signature covers the other contents of the JWT, it protects them from tampering: if the information is altered, the computed signature no longer matches. The JWT containing the user’s PII is saved locally. Figure 3 illustrates the details of the DID generated for the user, as well as the metadata encoded using JWT.
Fig. 2 DID created using ERC-1056 standard
Fig. 3 Console window containing user JWT details
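The JWT structure described above can be sketched with the Python standard library. This is a simplified illustration under stated assumptions, not the application’s code: HMAC-SHA256 (HS256) with a shared demo key stands in for the ES256K-R signature produced with the wallet’s private key, because the standard library has no secp256k1 support. The key, the DID string, and the payload fields are hypothetical placeholders.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_segment(obj: dict) -> str:
    """Serialize a JSON object compactly and base64url-encode it."""
    return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def sign_jwt(payload: dict, key: bytes) -> str:
    # Assumption: HS256 replaces ES256K-R purely to keep the sketch self-contained.
    header = {"typ": "JWT", "alg": "HS256"}
    signing_input = encode_segment(header) + "." + encode_segment(payload)
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_jwt(token: str, key: bytes) -> dict:
    """Return the payload if the signature matches, otherwise raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = header_b64 + "." + payload_b64
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("signature does not match header and payload")
    return json.loads(b64url_decode(payload_b64))


demo_key = b"hypothetical-demo-key"
user_did = "did:ethr:0x" + "ab" * 20  # placeholder address, not a real account
token = sign_jwt({"iss": user_did, "name": "Demo User"}, demo_key)

# Swap in a different payload while keeping the original signature.
header_part, _, signature_part = token.split(".")
forged_payload = encode_segment({"iss": user_did, "name": "Mallory"})
tampered = ".".join([header_part, forged_payload, signature_part])
```

Calling verify_jwt on the original token returns the payload, while the tampered token is rejected because the recomputed signature no longer matches. With ES256K-R the verifier would instead check the signature against the public key recovered for the signer’s DID.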
3.2 Creation of Vehicle DID and ERC-721 Token
Table 1 lists the data that the user must enter into the application regarding their vehicle. When the user clicks the submit button, the information entered is sent to IPFS, and a content identifier (CID) is generated in response. The CID refers to a JWT that has been signed with the user’s DID. The vehicle information entered on the application is embedded in the payload section of the signed JWT. Additionally, the payload is encrypted with AES-256 in cipher block chaining (CBC) mode. CBC is a widely used mode for encrypting messages longer than one block: each plaintext block is combined with the previous ciphertext block before encryption, so identical plaintext blocks do not produce identical ciphertext. To encrypt the data, we used the crypto library and generated a random 256-bit key. The payload also contains the DID address used to enter the information and the timestamp at which the signature was created.
To improve processing speed and privacy, we used a private IPFS network rather than the public one. The private IPFS network is built using libraries such as js-ipfs and libp2p. A swarm key is generated, which is later used to establish the membership of an IPFS node in the private network. In addition, the bootstrap nodes are changed in the IPFS configuration. Because the chassis number and engine number of each vehicle are unique, we validate these fields in our application to prevent duplicate record entry.
The registration is divided into two transactions. When the first transaction completes successfully, the contract address of the vehicle DID contract is used to generate the vehicle’s DID. The vehicle DID holds the IPFS hash of the signed JWT, which contains the vehicle details and the user DID. Once the vehicle DID is generated, the second transaction is triggered to generate an ERC-721-compliant asset token for the vehicle using the vehicle identification number (VIN). After the two transactions, the vehicle details are saved and made available to the issuer for issuing the credential. Figure 4 shows the vehicle details displayed on the application.
Table 1 List of vehicle attributes
Attributes
VIN (token ID)            Chassis number       Date of registration
Manufacturer              Model                Variant
Manufacturing year        Transmission type    Engine number
State                     RTO code             Total distance driven
Vehicle color             Seating capacity     Mileage
Pollution certificate?    Insurance?           Cases?
Last service date         Vehicle photo        Price
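The duplicate check and the two-step registration flow can be modeled in a few lines of Python. This is an in-memory sketch under stated assumptions rather than the deployed smart contracts: VehicleRegistry, the did:example DID format, and the sample identifiers are hypothetical, and the two steps that the paper performs as separate Ethereum transactions are reduced here to two dictionary updates.

```python
class VehicleRegistry:
    """Toy in-memory model of the two-step vehicle registration."""

    def __init__(self):
        self._chassis_seen = set()
        self._engine_seen = set()
        self.vehicle_did = {}  # VIN -> {"did", "ipfs_hash", "owner"}
        self.token_owner = {}  # VIN (token ID) -> owner DID

    def register(self, owner_did, vin, chassis_no, engine_no, ipfs_hash):
        # Reject duplicates before any state is written.
        if chassis_no in self._chassis_seen or engine_no in self._engine_seen:
            raise ValueError("chassis or engine number already registered")
        if vin in self.token_owner:
            raise ValueError("VIN already has a token")

        # Step 1: create the vehicle DID record that points at the signed JWT.
        did = "did:example:vehicle:" + vin  # hypothetical DID format
        self.vehicle_did[vin] = {"did": did, "ipfs_hash": ipfs_hash, "owner": owner_did}

        # Step 2: mint the asset token keyed by the VIN.
        self.token_owner[vin] = owner_did

        self._chassis_seen.add(chassis_no)
        self._engine_seen.add(engine_no)
        return did


registry = VehicleRegistry()
first_did = registry.register("did:example:alice", "VIN001", "CH-1", "EN-1", "hash-1")
```

A second registration that reuses the same chassis or engine number raises an error and leaves the registry unchanged, which mirrors the validation described above.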
3.3 Credential Issuance by the Issuer
After physically inspecting the vehicle and cross-checking the information provided by the user during the vehicle DID creation process, the issuer issues an off-chain credential to the user as a JWT and updates the vehicle verification status. The JWT credential is stored in IPFS, and the resulting hash is stored on the blockchain. If the physical inspection reveals that the user has created a vehicle DID for a stolen vehicle, the issuer can identify and block the vehicle DID. The seller’s registration for the vehicle is made available only to the issuer. Vehicles are not listed on the application until the VC is issued. Only verified vehicles are made available for sale via the application portal, and they bear a verified symbol.
The VC, issued as a JWT, is associated with the DID of the vehicle and includes fields such as the credential verification status (CVS), issued at time (IAT), expire time stamp (ETS), and the issuer address (ISS), as shown in Fig. 5. CVS takes an integer value between 0 and 2: 0 indicates that the vehicle has not been verified, 1 indicates that the vehicle has been verified, and 2 indicates a discrepancy in the information provided. IAT specifies the time at which the credential was issued. ETS specifies the number of days for which the credential is valid, which we set to 100. After 100 days, the user must request a new VC from the issuer. The ISS field contains the issuer’s DID address and provides additional trust to sellers by indicating that a trusted authority issued the credential.
Fig. 4 Vehicle details page
Fig. 5 Console window containing credential details
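As a worked example of how the CVS, IAT, and ETS fields could gate a listing, the following Python sketch checks that a credential is verified and still within its validity window. The lowercase field names, the is_listable helper, and the sample credential are assumptions made for illustration; the paper only specifies the meaning of the fields and the 100-day validity.

```python
from datetime import datetime, timedelta, timezone

# Credential verification status values as described in the text.
CVS_UNVERIFIED = 0
CVS_VERIFIED = 1
CVS_DISCREPANCY = 2


def is_listable(credential: dict, now: datetime) -> bool:
    """True if the vehicle is verified and the credential has not expired."""
    issued_at = datetime.fromtimestamp(credential["iat"], tz=timezone.utc)
    # ETS is interpreted as a validity period in days counted from IAT.
    expires_at = issued_at + timedelta(days=credential["ets"])
    return credential["cvs"] == CVS_VERIFIED and issued_at <= now < expires_at


issued = datetime(2022, 8, 20, tzinfo=timezone.utc)
credential = {
    "cvs": CVS_VERIFIED,
    "iat": int(issued.timestamp()),
    "ets": 100,
    "iss": "did:example:issuer",  # hypothetical issuer DID
}
```

Under these assumptions the vehicle is listable on day 99 after issuance but not on day 100, at which point the user would need to request a new VC.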
3.4 Vehicle Transfer to Buyer
The buyer verifies the VC issued by the issuer using the hash value stored on the blockchain, and can thereby verify the authenticity of the vehicle and its ownership details. The buy button enables the buyer to purchase the vehicle in a two-step process. The first step invokes the ERC-721 contract’s transfer function, which transfers the vehicle token from the current owner to the new owner upon payment of the required amount in ether as specified on the application. The second step invokes the ERC-721 contract’s setApprovalForAll function, enabling an operator to transfer the sender’s tokens on their behalf. This step is triggered once the buyer has been charged the required amount of ether. We have also integrated the burn function of the ERC-721 standard, which allows vehicle owners to remove vehicle details from the application if they no longer wish to sell the vehicle.
Fig. 6 Pollution issuance
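The ownership changes involved in a sale can be illustrated with a small in-memory model. This Python sketch is not the Solidity contract used in the application and does not reproduce the ERC-721 interface; VehicleTokenLedger and its methods are hypothetical names, the operator approval step is omitted, and payment is reduced to a numeric comparison.

```python
class VehicleTokenLedger:
    """Toy model of token ownership: one token per VIN, with sale and burn."""

    def __init__(self):
        self._owner = {}
        self._price = {}

    def mint(self, vin, owner, price):
        if vin in self._owner:
            raise ValueError("token already exists")
        self._owner[vin] = owner
        self._price[vin] = price

    def exists(self, vin):
        return vin in self._owner

    def owner_of(self, vin):
        return self._owner[vin]

    def buy(self, vin, buyer, payment):
        """Transfer the token to the buyer once the asking price is covered."""
        if payment < self._price[vin]:
            raise ValueError("payment below asking price")
        seller = self._owner[vin]
        self._owner[vin] = buyer
        return seller

    def burn(self, vin, caller):
        """Remove the token; only the current owner may do so."""
        if self._owner.get(vin) != caller:
            raise PermissionError("only the current owner can burn the token")
        del self._owner[vin]
        del self._price[vin]


ledger = VehicleTokenLedger()
ledger.mint("VIN001", "alice", price=10)
previous_owner = ledger.buy("VIN001", "bob", payment=10)
```

After the purchase, the token for VIN001 belongs to the buyer, and only the new owner can burn it to withdraw the vehicle from the application.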
3.5 Pollution Certificate Issuance
We have added a feature for issuing a pollution clearance credential. To obtain pollution clearance, the user requests a clearance credential from the Pollution Control Board (PCB). Using the ERC-721 token (VIN), the PCB can obtain the vehicle’s details for verification. Unlike the vehicle registration process, the pollution clearance process does not require complete vehicle information. Instead, it relies on selective disclosure of information such as the vehicle DID, chassis number, registration date, manufacturer, model name, variant, transmission, state of registration, RTO code, vehicle color, total distance traveled, and last service date. The pollution clearance issuer has the authority to approve or deny the clearance. Once approved, the vehicle pollution clearance status is updated, and the credential is added to the vehicle DID as a JWT. Figure 6 shows the pollution clearance window of a particular vehicle.
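Selective disclosure of this kind amounts to sharing only an allow-listed subset of the vehicle record. The following Python sketch shows the idea with a plain dictionary filter; the key names, the disclose helper, and the sample record are hypothetical, and the sketch does not involve any cryptographic selective-disclosure scheme.

```python
# Fields shared with the pollution clearance issuer, following the list in the text.
POLLUTION_CLEARANCE_FIELDS = (
    "vehicle_did", "chassis_number", "registration_date", "manufacturer",
    "model", "variant", "transmission", "state", "rto_code",
    "color", "distance_driven", "last_service_date",
)


def disclose(record: dict, allowed_fields=POLLUTION_CLEARANCE_FIELDS) -> dict:
    """Return only the allow-listed attributes that are present in the record."""
    return {name: record[name] for name in allowed_fields if name in record}


vehicle_record = {
    "vehicle_did": "did:example:vehicle:VIN001",  # hypothetical DID
    "chassis_number": "CH-1",
    "engine_number": "EN-1",
    "manufacturer": "Demo Motors",
    "model": "Demo",
    "color": "red",
    "price": 10,
}
shared = disclose(vehicle_record)
```

In this example the engine number and price stay with the owner, while the chassis number, manufacturer, model, and color are shared for the clearance check.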
4 Results and Discussions
In this section, we discuss the results of our implementation of DIDs using ERC-1056 and compare it with an alternative Ethereum DID standard, ERC-725 [1]. We compared the storage and computational cost associated with DID creation in terms of gas used. Figure 7 shows the average gas cost of creating a DID under the two standards. Averaged over five transactions, ERC-1056 used 55165.2 gas, whereas ERC-725 used 3104140.6 gas, which is approximately 56 times more. Another notable aspect is that the ERC-1056 contract has already been deployed on the Ethereum Testnet by its creator, so application developers are not required to deploy it themselves. In contrast, ERC-725 requires the developer to deploy a separate contract for each identity account, incurring additional costs. According to our analysis, ERC-1056 is more cost-effective than ERC-725 on the Ethereum blockchain in terms of gas consumption at the time of writing.
Another critical point is how the VC is verified. On-chain verification provides immutability, but it may raise privacy concerns and impose a cost. Off-chain verification, on the other hand, is relatively cheap and preserves data privacy and integrity by referencing the hash value of the off-chain data on-chain. The requirements of the application use case determine whether on-chain or off-chain verification is appropriate. The use of VCs in conjunction with the DIDs created for both user and vehicle ensures that the vehicle can be reliably associated with its owner during off-chain claim verification. The VC provides information about the vehicle’s DID, which is in turn linked to the user’s DID.
Fig. 7 Comparison chart
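The gas comparison can be reproduced from the two reported averages. The short Python calculation below uses only the figures stated above; the variable names are introduced for this example.

```python
# Average gas per DID creation over five transactions, as reported above.
ERC1056_AVG_GAS = 55165.2
ERC725_AVG_GAS = 3104140.6

# How many times more gas ERC-725 consumes than ERC-1056.
ratio = ERC725_AVG_GAS / ERC1056_AVG_GAS

# Fraction of gas saved by using ERC-1056 instead of ERC-725.
saving = 1 - ERC1056_AVG_GAS / ERC725_AVG_GAS

print(f"ERC-725 uses {ratio:.1f}x the gas of ERC-1056")
print(f"ERC-1056 saves {saving:.1%} of the gas")
```

The ratio comes out at roughly 56, which corresponds to a gas saving of a little over 98% for DID creation with ERC-1056.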
5 Conclusion
This paper describes a peer-to-peer distributed vehicle management application that makes use of DIDs and VCs. We discussed issues and challenges in IM, such as the role of intermediaries, data infringements, and mutability that occur in a traditional centralized system, and how blockchain technology can be used to address these issues. We have demonstrated through our proposal that DIDs can give users control and ownership over their data. ERC-1056, a lightweight standard, provides DID functionality at a lower gas cost than ERC-725. Nonetheless, even this lower cost of DID creation remains expensive because of the Ethereum blockchain’s high gas price. Ethereum is, however, on the verge of launching Ethereum 2.0, a set of interconnected upgrades to the network that aims to make it more scalable, secure, and sustainable by migrating from the proof of work (PoW) to the proof of stake (PoS) consensus model. The upgraded platform is expected to process more transactions at an affordable gas price. DIDs are critical in applications that deal with physical assets. Combining them with NFT standards such as ERC-721 enables precise mapping between a DID and the NFTs it holds. Adhering to regulatory privacy requirements, utilizing a private IPFS network, and encrypting stored data enhance information security and help the overall application comply with GDPR requirements. Our findings indicate that the addition of DIDs and VCs improves the trust, privacy, and scalability of a peer-to-peer application built on Ethereum. The results suggest that the proposed solution offers a more seamless and rapid user experience than a centralized vehicle buying and selling application. The prototype can be expanded by incorporating other vehicle lifecycle management modules such as insurance, vehicle registration at the regional transport office, and the motor vehicles department.
References
1. Thomas AM, Ramaguru R, Sethumadhavan M (2022) Distributed identity and verifiable claims using Ethereum standards. In: Ranganathan G, Fernando X, Shi F (eds) Inventive communication and computational technologies. Lecture notes in networks and systems, vol 311. Springer, Singapore. https://doi.org/10.1007/978-981-16-5529-6_48
2. Avellaneda O et al (2019) Decentralized identity: where did it come from and where is it going? IEEE Commun Stand Mag 3(4):10–13. https://doi.org/10.1109/MCOMSTD.2019.9031542
3. Anjum S, Ramaguru R, Sethumadhavan M (2021) Medical records management using distributed ledger and storage. In: Singh M, Tyagi V, Gupta PK, Flusser J, Ören T, Sonawane VR (eds) Advances in computing and data sciences. ICACDS 2021. Communications in computer and information science, vol 1441. Springer, Cham. https://doi.org/10.1007/978-3-030-88244-0_6
4. Sivadanam YL, Ramaguru R, Sethumadhavan M (2022) Distributed ledger framework for an adaptive university management system. In: Chaki N, Devarakonda N, Cortesi A, Seetha H (eds) Proceedings of international conference on computational intelligence and data engineering. Lecture notes on data engineering and communications technologies, vol 99. Springer, Singapore. https://doi.org/10.1007/978-981-16-7182-1_24
5. Ethereum. ERC-1484: Digital Identity Aggregator. Issue #1495. Ethereum/EIPs. GitHub, https://github.com/ethereum/EIPs/issues/1495. Last accessed: 20 Aug 2022
6. Ethereum. ERC-1207 DAuth Access Delegation Standard. Issue #1207. Ethereum/EIPs. GitHub, https://github.com/ethereum/EIPs/issues/1207. Last accessed: 20 Aug 2022
7. Admin. Managing the lifecycle of a car with blockchain technology. Cardossier, 11 Aug 2022. https://cardossier.ch
8. Aswathy SV, Lakshmy KV (2019) BVD - a blockchain based vehicle database system. In: Thampi S, Madria S, Wang G, Rawat D, Alcaraz Calero J (eds) Security in computing and communications. SSCC 2018. Communications in computer and information science, vol 969. Springer, Singapore. https://doi.org/10.1007/978-981-13-5826-5_16
9. Mahankali S (2020) Blockchain for non IT professionals: an example driven, metaphorical approach. Notion Press
10. NamChain-Open-Initiative-Research-Lab. GitHub - NamChain-open-initiative-research-lab/non-fungible-tokens: a detailed study and research on NFTs trends, marketplace, governance and future. GitHub, https://github.com/NamChain-Open-Initiative-Research-Lab/Non-Fungible-Tokens. Last accessed: 20 Aug 2022
11. Eisenstadt M, Ramachandran M, Chowdhury N, Third A, Domingue J (2020) COVID-19 antibody test/vaccination certification: there’s an app for that. IEEE Open J Eng Med Biol 1:148–155. https://doi.org/10.1109/OJEMB.2020.2999214
12. Lagutin D et al (2019) Enabling decentralised identifiers and verifiable credentials for constrained IoT devices using OAuth-based delegation. In: Proceedings of the workshop on decentralized IoT systems and security (DISS 2019), in conjunction with the NDSS symposium, vol 24. San Diego, CA, USA. https://doi.org/10.14722/diss.2019.230005
13. Syed TA, Siddique MS, Nadeem A, Alzahrani A, Jan S, Khattak MAK (2020) A novel blockchain-based framework for vehicle life cycle tracking: an end-to-end solution. IEEE Access 8:111042–111063. https://doi.org/10.1109/ACCESS.2020.3002170
14. Valaštín V, Košt’ál K, Bencel R, Kotuliak I (2019) Blockchain based car-sharing platform. Int Symp ELMAR 2019:5–8. https://doi.org/10.1109/ELMAR.2019.8918650
15. Ramaguru R, Sindhu M, Sethumadhavan M (2019) Blockchain for the internet of vehicles. In: Singh M, Gupta P, Tyagi V, Flusser J, Ören T, Kashyap R (eds) Advances in computing and data sciences. ICACDS 2019. Communications in computer and information science, vol 1045. Springer, Singapore. https://doi.org/10.1007/978-981-13-9939-8_37
16. Braendgaard P, Torstensson J (2018) EIP-1056: Ethereum lightweight identity [DRAFT]. Ethereum improvement proposals, https://eips.ethereum.org/EIPS/eip-1056
17. Auth0.com. JWT.IO - JSON web tokens introduction. JSON Web Tokens - Jwt.Io, https://jwt.io/introduction. Last accessed: 20 Aug 2022
18. IPFS Powers the Distributed Web. IPFS powers the distributed web, https://ipfs.io/#why. Last accessed: 20 Aug 2022
Bipolar Disease Data Prediction Using Adaptive Structure Convolutional Neuron Classifier Using Deep Learning
M. Ramkumar, P. Shanmugaraja, B. Dhiyanesh, G. Kiruthiga, V. Anusuya, and B. J. Bejoy
Abstract The symptoms of bipolar disorder include extreme mood swings. It is among the most common mental health disorders and is often overlooked in all age groups. Bipolar disorder is often inherited, but not all siblings in a family will have it. In recent years, the clinical diagnosis and treatment of bipolar disorder have been unsatisfactory. Relapse rates and misdiagnosis are persistent problems with the disease, and bipolar disorder has yet to be precisely characterised. To overcome this issue, this work proposes the Adaptive Structure Convolutional Neuron Classifier (ASCNC) method to identify bipolar disorder. Imbalanced Subclass Feature Filtering (ISF2) is used to visualise bipolar data; it is intended to extract and communicate meaningful information from complex bipolar datasets in order to predict and improve day-to-day analytics. Scaled Features Chi-square Testing (SFCsT) extracts the maximum dimensional features in the bipolar dataset and assigns weights to them. To select the features with the largest Chi-square scores, the Chi-square value between each feature and the target is calculated. Before features are extracted for training and testing, the Softmax neural activation function is evaluated to compute the average weight of the features. Diagnostic criteria for bipolar disorder are discussed as an assessment strategy that helps diagnose the disorder. The paper then discusses appropriate treatments for children and their families. Finally, it presents some conclusions about managing people with bipolar disorder.
Keywords Bipolar disorder · Neural network · Features testing · Feature · Weights · Visualise data · Chi-square score
M. Ramkumar, CSBS, Knowledge Institute of Technology, Salem, India
P. Shanmugaraja, IT, Sona College of Technology, Salem, India
B. Dhiyanesh (B), CSE, Dr. N.G.P. Institute of Technology, Coimbatore, India
G. Kiruthiga, CSE, IES College of Engineering, Thrissur, India
V. Anusuya, CSE, Ramco Institute of Technology, Rajapalayam, India
B. J. Bejoy, CSE, CHRIST (Deemed to be University), Bangalore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_10
1 Introduction
Bipolar disorder (BD), also known as an affective disorder, is a recurring illness characterised by mood swings ranging from extreme agitation or mania to severe depression. It is frequently associated with disturbed thinking, impaired social functioning, and persistent psychosis with clinical features. Epidemiological survey data show that the prevalence of BD is increasing year over year, yet the overall recognition rate and treatment rate of the disease remain low. Patients often wait a long time between their first symptoms and a formal diagnosis. First, although the specific aetiology of BD is unclear, it is a complex illness influenced by environmental and genetic factors. Second, current treatment strategies for BD have limitations, and the misdiagnosis rate is high. The diagnosis and classification of BD have long been a hotly debated issue in research on this illness, and many controversies remain. The current phenotypic definitions of bipolar disorder depend exclusively on clinical features, which contributes to high misdiagnosis rates. Unlike most physical illnesses, BD lacks validated diagnostic tests, so it ought to be examined from a new perspective. Unlike conventional machine learning methods, which require explicitly defined features, deep learning can use deep network models to extract and optimise features. Machine learning has been widely used in image recognition, speech recognition, natural language processing, and other fields and has achieved remarkable results. Because of its powerful feature extraction, complex model generation, and image-processing capabilities, it has also gradually been used to analyse massive biomedical bipolar data.
For BD, researchers have used a support vector machine (SVM) to develop a classification model based on neuroimaging data that distinguishes bipolar disorder from major depressive disorder. A research group from Taiwan used random forests to build a genetic risk prediction model for bipolar disorder and to predict the risk of the illness. However, results from studying bipolar disorder with neural networks combined with genetic data have not yet been reported. There is just a
Bipolar Disease Data Prediction Using Adaptive Structure …
single ASCNC-based BD classification model. In light of recent advances in the genetics of BD and the use of machine learning and deep learning in BD research, SFCsT screens simple samples that encode genetic data mathematically rather than cardinally and aggregates them. We demonstrate the use of adaptive neural networks to distinguish bipolar disorder cases from healthy subjects. This is a genetic-level investigation of variables related to the phenotypic symptoms of bipolar disorder. We hope that our study will spark new ideas and improve the precision of existing BD treatments and detection techniques.
2 Literature Survey
In this section, we take a closer look at how previous authors have addressed bipolar disorder (BD) and how their work can help society and future research. The methods and the data they use are summarised below.
Lee et al. [1] propose, using data science techniques, that BD-I and BD-II can be classified more accurately by identifying SNPs with little individual impact on classification. Developing a set of complementary diagnostic classifiers improves the diagnostic process, but accuracy is low due to a lack of focus on feature selection. Demir et al. [2] aim to identify characteristics of cerebral white matter that are unique to bipolar disorder. Like local diffusion methods, their method detects and reports changes in white matter regions. Further advances in early diagnosis and prospective diagnostic studies through imaging procedures are still necessary. Zhang et al. [3] present a highly effective discriminative model based on a whole-brain "higher-order functional connectivity (HOFC)" network and machine learning. HOFC captures temporal coherence in a dynamic FC time series. Liu et al. [4] propose an ensemble hybrid feature selection approach for classifying neuropsychiatric disorders. It is a robust method that combines DenseNet and XGBoost, extracting phenotypic features from phenotypic records and image features from MRI images.
Büttenbender et al. [6] define the Eigen-routines method. It is based on concepts such as the accessibility of mobile devices, identifying trends in learning usage, and creating opportunities for managing neuropsychiatric disorders. Sakthivel [5] describes an approach that consists of three steps: it determines the mechanism, calculates the extraction, and calculates the limits. Baki et al. [7] develop a multimodal inference system based on acoustic, linguistic, and visual patterns found in patient charts for mania classification, and evaluate it on the Turkish Bipolar Disorder Corpus. Hong et al. [8] develop a method that uses the facial expressions of healthy controls, bipolar disorder (BD) patients, and unipolar
M. Ramkumar et al.
depression (UD) patients, elicited through video clips, to classify mood disorders. Criscuolo et al. [10] describe a non-invasive technique for therapeutic drug monitoring (TDM) in sweat in individuals with BD on lithium. Kasthuripriya et al. [9] describe a wearable electrochemical sensing platform created by combining paper fluidics with a reliable reference electrode (RE). Ranjan et al. [11] study cerebellar transcranial direct current stimulation (CTDCS), which facilitates voluntary movement when the cerebellum is functionally connected. Postural control was modulated in chronic stroke survivors with CTDCS of the dentate nucleus and lobules VII–IX.
Abaeikoupaei et al. [12] present a stacked ensemble classifier that classifies all combined features after feature selection. Convolutional neural networks (CNNs) and a multilayer perceptron (MLP) implement the first and second layers, respectively, of a three-layer stacked ensemble classifier. Using reinforcement learning, the networks and their hyperparameters can be further improved. Liu et al. [14] use allele-skewed DNA modification (ASM-SNP) data. Their approach combines manifold learning, data-driven feature selection, and novel pathway analysis to investigate psychiatric disorders. Sakthivel [13] is cited for a data-driven feature selection (DDFS) algorithm based on nonnegative singular value approximation (NSVA) that underlies the new manifold learning analysis. Jie et al. [15] develop SVM-FoBa, a support vector machine-based feature selection strategy for selecting features against healthy control structures.
Palmius et al. [16] report that recordings collected from participants with BD and from healthy controls (HC) can be distinguished using anonymised geographic location. The aim was to detect depression in a prospective community study of individuals with bipolar disorder (BD) using geospatial movements recorded from mobile phones. Zhang et al. [18] present a novel patient-specific method for predicting seizures in epilepsy patients based on one or two single-channel or bipolar-channel intracranial or scalp electroencephalography (EEG) recordings. Sakthivel [17] is cited for calculating ratios from the extracted spectral power features. Matsubara et al. [19] propose a deep neural generative model based on resting-state fMRI data. According to Bayes' rule, the posterior probability of a subject's condition can be estimated from the imaging data, and the subject's condition is then decided on that basis. Arribas et al. [20] describe an automatic and reliable classification method that uses brain imaging data to discriminate among healthy controls, patients with bipolar disorder, and patients with schizophrenia. However, identifying affected subjects takes a long time. Amiriparian et al. [21] use the CapsNet technique to classify patients with BD following a manic episode into three categories: remission, hypomania, and mania. Capsules are applied to overcome specific limitations of convolutional neural networks (CNNs) by retaining the key spatial hierarchies (KSH) between features extracted from images of the audio data.
Valenza et al. [22] propose identifying four possible clinical mood states among bipolar patients by using advanced biosignal processing methods and ad hoc methodologies. Alimardani et al. [23] design a steady-state visual evoked potential (SSVEP) paradigm to investigate whether EEG responses at specific frequencies induced by external stimulation can be used to distinguish patients with BD from patients with schizophrenia. Rosa et al. [24] describe a low-power, flexible device with an electrocardiogram (ECG) channel, a galvanic skin response (GSR) channel, a temperature channel, and a biomotion detection (BMD) channel. Together with psychiatric assessment, it makes it possible to quantify the daily functioning of mentally ill patients.
3 Proposed Method
The bipolar disease dataset is first taken as input for analysis so that the disorder can be detected accurately. Processing starts with a preprocessing stage that removes irrelevant data. Features are then extracted from the preprocessed data based on the maximum number of targets and the closest score. Patients' EEGs were recorded using modulated visual stimuli at specific frequencies to see whether they could be used to classify bipolar disorder (BD). The results are evaluated using the Softmax activation function in the neural network. Finally, classification with the Adaptive Structure Convolutional Neuron Classifier (ASCNC) improves accuracy and predicts results better than previous approaches. The proposed architecture is depicted in Fig. 1. The dataset values are first preprocessed with ISF2 to filter out irrelevant data, and features are extracted based on the largest feature weights and the closest data values. Before classification, the training and test data are passed through the Softmax neural activation function to improve accuracy. Finally, ASCNC classifies the results to predict the disease, improving classification performance.
3.1 Bipolar Data Preprocessing Using ISF2
Applying data preprocessing techniques improved the predictive accuracy of the weak classifier and gave satisfactory performance in determining the risk of bipolar disorder. Data preprocessing is a set of strategies for enhancing data quality, such as missing value handling and feature type transformation. Imputation of missing values by ISF2: the method finds the most similar cases in the dataset and computes distances or similarities between them to impute missing values.
Fig. 1 Proposed architecture
$$ s_i(x, y) = \sqrt{\sum_{a=1}^{|s|} (x_a - y_a)^2} \tag{1} $$
where x_a denotes the data values, y_a the predicted values, and s_i the similarity value used to remove irrelevant values.
Algorithm steps: ISF2
For each feature training set F(x1, x2, …, xn)
  Input the parameters a = ∅, b = ∅
  Select the number of features
  Compute the maximum weight constant
  Examine the closest feature from feature set a
  For each bipolar dataset fi from a
    Feature term x = decision node extracted from Di
    $$ \mathrm{ISF2} = \left( \sum_{n=1}^{\mathrm{size}(fs)} \text{feature value} \in f_i + a \right) $$
  End for
Repeat
  Find the nearest relative feature weight from the max weight
  Get the class (fi)
  Split the average weight function according to class weights
  Reduce irrelevant values
The above methodology was developed to analyse the significance and relevance of the results. Preference studies can be interpreted as either positive or negative outcome categories.
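The imputation idea behind Eq. (1) can be illustrated with a minimal sketch. It assumes that ISF2 fills a missing value from the most similar complete record, with similarity measured by the Euclidean distance of Eq. (1) over the features that are present. The function names and the sample records are hypothetical and are not taken from the authors' implementation.

```python
import math


def euclidean_distance(x, y):
    """Eq. (1): square root of the summed squared differences."""
    return math.sqrt(sum((xa - ya) ** 2 for xa, ya in zip(x, y)))


def impute_nearest(records):
    """Fill None entries of each record from its most similar complete record.

    Distances are computed only over the features the incomplete record has.
    """
    complete = [r for r in records if None not in r]
    if not complete:
        raise ValueError("need at least one complete record")
    filled = []
    for row in records:
        if None not in row:
            filled.append(list(row))
            continue
        known = [i for i, v in enumerate(row) if v is not None]
        nearest = min(
            complete,
            key=lambda c: euclidean_distance(
                [row[i] for i in known], [c[i] for i in known]
            ),
        )
        filled.append([nearest[i] if v is None else v for i, v in enumerate(row)])
    return filled


# Hypothetical records: the third one is missing its second feature.
records = [
    [1.0, 2.0, 3.0],
    [8.0, 9.0, 10.0],
    [1.1, None, 2.9],
]
imputed = impute_nearest(records)
print(imputed[2])
```

In this example the incomplete record is closest to the first record, so its missing feature is copied from there.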
3.2 Scaled Features Chi-Square Testing (SFCsT)
Feature extraction is performed using two types of data: time and frequency. Time data were derived directly from the time-dependent depression data, which were collected from subject activity via actigraphy. Frequency data, on the other hand, are obtained by the Fourier transform of the time-dependent depression data. Feature selection eliminates irrelevant and noisy data and selects a representative subset of all data to reduce the complexity of the classification process. Although the Scaled Features Chi-square Testing (SFCsT) method shows good results, it still has some limitations. If SFCsT selects the top 20 train sets, the number of features in each category will change accordingly. The SFCsT statistical formula is closely related to the feature selection function in feature information (f_z).
$$ \mathrm{SFCsT}(f_z) = \frac{N (x - y)^2}{(x + y)} \tag{2} $$
where N is the number of features, x is the number of terms in features, and y is the number of values.
Algorithm steps
Chi-square Testing values for the feature and class
Chi-square Testing = ϕ
For x = 1 to N do
  Chi-square Testing = Terms (fi, Ti)
  Join (i, Chi_terms) to Testing features
End for
Store first feature values Chi_terms to choose
Return
If values != Null then
  While length = choose (0)
    Add the features to Chi_terms
  For f = 0 to length to find do
    Take features f0 to fn from row to combine
    If f0 is not in Fchoose do
      Join the f0 to Fchoose
      Delete f0 from Choose
    End if
  End for
Return
Here f0 to fn are the features. The chi-square method computes the chi-square value for each feature f[i] and provides an ordered list of selected feature codes.
The modified chi-square algorithm uses fChoose in the first step, as in the original method. The first feature code is then selected from fChoose. The next step is to find the first feature in the combined set. It is saved in fChoose and then removed from the candidates.
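The scoring and selection steps above can be sketched as follows. This is only an illustration under stated assumptions: the score applies Eq. (2) literally, the selection keeps the k highest-scoring features without duplicates (mirroring the role of fChoose), and the feature names and counts are hypothetical.

```python
def sfcst_score(n_features, x, y):
    """Eq. (2) as written: N * (x - y)^2 / (x + y)."""
    if x + y == 0:
        return 0.0  # assumption: a feature with no counts carries no evidence
    return n_features * (x - y) ** 2 / (x + y)


def select_top_features(rows, k):
    """Rank (name, x, y) rows by score and keep the k best distinct names."""
    n = len({name for name, _, _ in rows})
    ranked = sorted(rows, key=lambda r: sfcst_score(n, r[1], r[2]), reverse=True)
    chosen = []
    for name, _, _ in ranked:
        if len(chosen) == k:
            break
        if name not in chosen:  # skip features that were already selected
            chosen.append(name)
    return chosen


# Hypothetical (feature, x, y) counts for three candidate features.
rows = [
    ("sleep_hours", 30, 10),
    ("activity_mean", 22, 18),
    ("activity_peak_freq", 40, 5),
]
top = select_top_features(rows, 2)
print(top)
```

With these counts the scores are 30.0, 1.2, and about 81.7, so the two selected features are `activity_peak_freq` and `sleep_hours`.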
3.3 Softmax Neural Activation Function
The Softmax function is an activation function used in neural computing to compute a probability distribution from a vector of real numbers. It outputs values between 0 and 1 whose sum is 1. For multi-class classification tasks, Softmax is applied to the outputs of a large number of neurons.
$$ F(a_x) = \frac{\exp(a_x)}{\sum_{y} \exp(a_y)} \tag{3} $$
The Softmax function is used in multiclass models to return the probability of each class, with the target class having the highest probability. It is primarily used in the output layers of almost all deep learning frameworks. The sigmoid function is used for binary classification tasks, while Softmax is used for disease prediction classification tasks.
$$ \mathrm{softmax}(a^x) = X(a = j \mid a^x, \theta) = \frac{\exp(\theta_j^x a^x)}{\sum_{i=0}^{n} \exp(\theta_i^x a^x)} \tag{4} $$
where softmax(a^x) is the value for the features and θ denotes the classifier's parameters. We evaluate the features' values based on the maximum and the closest weights. Each neuron calculates the terms of the features by evaluating the weights.
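Equation (3) can be checked numerically with a short sketch. The version below subtracts the maximum score before exponentiating, a common stability measure that leaves the result of Eq. (3) unchanged. The input scores are arbitrary illustrative values.

```python
import math


def softmax(scores):
    """Eq. (3): exp(a_x) / sum_y exp(a_y), shifted by the maximum for stability."""
    shift = max(scores)
    exps = [math.exp(a - shift) for a in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)       # three probabilities, largest for the largest score
print(sum(probs))  # sums to 1 up to floating-point rounding
```

The shift matters only for large inputs: without it, `math.exp(1000.0)` would overflow, whereas `softmax([1000.0, 1000.0])` here returns two equal probabilities.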
3.4 Classification Based on Adaptive Structure Convolutional Neuron Classifier (ASCNC)
Classification is the process of identifying the class to which a new observation belongs, based on a training dataset. The ASCNC model is used for facial feature extraction from the bipolar data values. To address the BD condition, a feature sequence is added to the classification model. Seven layers, including input, convolution, pooling, and activation layers, are constructed during ASCNC training. Weights and biases are updated at each layer, and the modified parameters are used as input to the next layer. As a final step, the dataset was divided into training, validation, and test sets in a 6:2:2 ratio, with each group containing the same number of BD data samples.
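A 6:2:2 split of this kind can be sketched as below. The sketch reads "the same number of BD samples" as "the same share of BD samples in each set" and therefore splits each class separately. That reading, the function name, the fixed seed, and the balanced 500/500 label vector are all assumptions made for illustration.

```python
import random


def split_622(labels, seed=0):
    """Return train/validation/test index lists in a 6:2:2 ratio per class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = len(indices) * 6 // 10
        n_val = len(indices) * 2 // 10
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Assumed class balance: 500 BD records (1) and 500 controls (0).
labels = [1] * 500 + [0] * 500
train, val, test = split_622(labels)
print(len(train), len(val), len(test))
```

Splitting per class keeps the BD share identical across the three sets, which avoids a validation or test set that happens to contain very few positive cases.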
Algorithm steps
Input: bipolar disease dataset ds
Output: prediction accuracy Pa
Start
Read ds, Pa
Initialise neural network ASCNC
For all layers (l)
  For all neurons (N)
    Evaluate feature score Fs
    Estimate each feature's weight
  End
End
For each feature (F)
  Estimate neuron score
  Estimate weighted measure
End
Accuracy (A) = choose the best features with a higher value
Generate prediction accuracy
Stop
The proposed ASCNC-based prediction algorithm computes the feature score for different feature values. The method then computes the weighted score using the convolutional neural score.
4 Result and Discussion
For the results, an algorithm using ASCNC and Keras tensors was implemented, and it performs better than previous algorithms. The main aim is to classify bipolar disorder. Table 1 lists the simulation parameters: the experiments use Anaconda and Python on a dataset of 1000 records with 12 features, of which 700 are used for training and 300 for testing.
Table 1 Simulation parameters
Parameter                  Value
Simulation tool            Anaconda
Language                   Python
Number of bipolar data     1000
Training bipolar data      700
Testing bipolar data       300
Figure 2 shows the accuracy analysis for the proposed Adaptive Structure Convolutional Neuron Classifier (ASCNC) algorithm. The existing convolutional neural network (CNN) reaches 52% and the multilayer perceptron (MLP) 64%, while the proposed ASCNC reaches 95%.
Fig. 2 Analysis of accuracy performance (accuracy in % versus number of data records, 100 to 300; series: SVM, MLP, ASCNC)
Figure 3 shows the prediction performance of the proposed ASCNC algorithm. The existing CNN reaches 76% and the MLP 84%, while the proposed ASCNC reaches 90%.
Fig. 3 Analysis of prediction performance (prediction in % versus number of data records, 100 to 300; series: SVM, MLP, ASCNC)
Figure 4 shows the performance of the proposed ASCNC algorithm on the different measures. The existing CNN has a sensitivity of 55%, a specificity of 51%, and an F-measure of 60%. The MLP has a sensitivity of 76%, a specificity of 70%, and an F-measure of 66%. The proposed ASCNC has a sensitivity of 87%, a specificity of 80%, and an F-measure of 71%.
Fig. 4 Analysis of precision recall and F-measure performance (performance in % for sensitivity, specificity, and F-measure; series: SVM, MLP, ASCNC)
Figure 5 shows the time complexity performance of the proposed ASCNC. The existing CNN takes 35 ms and the MLP takes 30 ms, while the proposed ASCNC has the lowest time, 25 ms.
Fig. 5 Attacking time complexity (time in ms versus number of data records, 100 to 300; series: CNN, MLP, ASCNC)
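For reference, the measures reported above are conventionally derived from the confusion matrix of a binary classifier. The sketch below uses the standard definitions. The counts are hypothetical and chosen only to make the arithmetic easy to follow, so they do not reproduce the reported results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard accuracy, sensitivity, specificity, precision, and F-measure."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on BD cases
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on controls
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical confusion counts for a test set of 100 records.
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because the F-measure is the harmonic mean of precision and sensitivity, it always lies between those two values.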
5 Conclusion
Adaptive neural networks can be used to differentiate between BD patients and healthy individuals using molecular genetic markers from genome-wide association analyses, with the genetic data modelled in an image format suitable for the ASCNC. Binary data processing is simulated to reduce complexity. Among all the models tested, ours was the most accurate, at 92%. The experimental results show that the SFCsT method is effective for feature selection. On the test set, the model's accuracy is generally low, indicating that its generalisation capacity needs further improvement. By using several methods, such as normalisation, dropout, and binomial data imputation, the model minimises the risk of overfitting. Physical challenges associated with bipolar disorder diagnosis may necessitate greater precision. Different models have been developed for complex diseases because the labels assigned to patterns do not always reflect the true complexity of a disease. Although bipolar disorder is a complex disorder influenced by genetic and environmental factors, we used only bipolar disorder data in our study.
References
1. Lee C-Y, Zeng J-H, Lee S-Y, Lu R-B, Kuo P-H (2021) SNP data science for classification of bipolar disorder I and bipolar disorder II. IEEE/ACM Trans Comput Biol Bioinf 18(6):2862–2869
2. Demir A, Özkan M, Uluğ AM (2021) A macro-structural dispersion characteristic of brain white matter and its application to bipolar disorder. IEEE Trans Biomed Eng 68(2):428–435
3. Zhang H et al (2022) Divergent and convergent imaging markers between bipolar and unipolar depression based on machine learning. IEEE J Biomed Health Inform 26(8):4100–4110
4. Liu L, Tang S, Wu F-X, Wang Y-P, Wang J (2022) An ensemble hybrid feature selection method for neuropsychiatric disorder classification. IEEE/ACM Trans Comput Biol Bioinf 19(3):1459–1471
5. Sakthivel S (2016) UBP-trust: user behavioral pattern based secure trust model for mitigating denial of service attacks in software as a service (SaaS) cloud environment. J Comput Theoretical Nanosci 13(10):7649–7654(6)
6. Büttenbender PC, Neto EGA, Heckler WF, Barbosa JLV (2022) A computational model for identifying behavioral patterns in people with neuropsychiatric disorders. IEEE Latin Am Trans 20(4):582–589
7. Baki P, Kaya H, Çiftçi E, Güleç H, Salah AA (2022) A multimodal approach for mania level prediction in bipolar disorder. IEEE Trans Affect Comput 1–13
8. Hong Q-B, Wu C-H, Su M-H, Chang C-C (2021) Exploring macroscopic and microscopic fluctuations of elicited facial expressions for mood disorder classification. IEEE Trans Affect Comput 12(4):989–1001
9. Kasthuripriya S et al (2016) LFTSM-local flow trust-based service monitoring approach for preventing the packet during data transfer in cloud. Asian J Inf Technol 15(20):3927–3931
10. Criscuolo F, Cantù F, Taurino I, Carrara S, De Micheli G (2021) A wearable electrochemical sensing system for non-invasive monitoring of lithium drug in bipolar disorder. IEEE Sens J 21(8):9649–9656
11. Ranjan S, Rezaee Z, Dutta A, Lahiri U (2021) Feasibility of cerebellar transcranial direct current stimulation to facilitate goal-directed weight shifting in chronic post-stroke hemiplegics. IEEE Trans Neural Syst Rehabil Eng 29:2203–2210
12. Abaeikoupaei N, Al Osman H (2020) A multi-modal stacked ensemble model for bipolar disorder classification. IEEE Trans Affect Comput 1–10
13. Sakthivel S (2016) F2C: a novel distributed denial of service attack mitigation model for SaaS cloud environment. Asian J Res Soc Sci Human 6(6):192–203
14. Liu W, Li D, Han H (2020) Manifold learning analysis for allele-skewed DNA modification SNPs for psychiatric disorders. IEEE Access 8:33023–33038
15. Jie N-F et al (2015) Discriminating bipolar disorder from major depression based on SVM-FoBa: efficient feature selection with multimodal brain imaging data. IEEE Trans Auton Ment Dev 7(4):320–331
16. Palmius N et al (2017) Detecting bipolar depression from geographic location data. IEEE Trans Biomed Eng 64(8):1761–1771
17. Sakthivel S (2015) Secure data storage auditing service using third party auditor in cloud computing. Int J Appl Eng Res 10(37)
18. Zhang Z, Parhi KK (2016) Low-complexity seizure prediction from iEEG/sEEG using spectral power and ratios of spectral power. IEEE Trans Biomed Circuits Syst 10(3):693–706
19. Matsubara T, Tashiro T, Uehara K (2019) Deep neural generative model of functional MRI images for psychiatric disorder diagnosis. IEEE Trans Biomed Eng 66(10):2768–2779
20. Arribas JI, Calhoun VD, Adali T (2010) Automatic Bayesian classification of healthy controls, bipolar disorder, and schizophrenia using intrinsic connectivity maps from fMRI data. IEEE Trans Biomed Eng 57(12):2850–2860
21. Amiriparian S et al (2019) Audio-based recognition of bipolar disorder utilising capsule networks. In: 2019 international joint conference on neural networks (IJCNN), pp 1–7
22. Valenza G et al (2014) Wearable monitoring for mood recognition in bipolar disorder based on history-dependent long-term heart rate variability analysis. IEEE J Biomed Health Inform 18(5):1625–1635
23. Alimardani F, Cho J-H, Boostani R, Hwang H-J (2018) Classification of bipolar disorder and schizophrenia using steady-state visual evoked potential based features. IEEE Access 6:40379–40388
24. Rosa BMG, Yang GZ (2019) A flexible wearable device for measurement of cardiac, electrodermal, and motion parameters in mental healthcare applications. IEEE J Biomed Health Inform 23(6):2276–2285
Oceanographic and Hydrological Study of the Moroccan Atlantic Coast: Focus on the Upwelling
Kamal Hammou Ali, Aziz Bourass, Khalid Fariri, Jalal Ettaki, Thami Hraira, Khalid Doumi, Sima Boulebatt, Manal Maaroufi, Ahmed Talbaoui, Ali Srairi, Abdelmajid Dridi, Khadija Elkharrim, and Driss Belghyti
Abstract Morocco's maritime coastline extends over about 3500 km. The Moroccan coastal areas are known for the upwelling of cold waters rich in nutrient salts. Sea surface temperature data and the coastal upwelling index show relatively strong upwelling activity in the four upwelling areas of the Atlantic coast. Over the period 2002–2014, upwelling activity showed a decreasing trend during 2004–2007 and 2009–2010 and an increasing trend in 2011 and 2012. In terms of upwelling activity, 2014 is considered a relatively intense year. On the Atlantic coast, upwelling is seasonal north of 26°30' N, where it occurs between March and August. South of this latitude and down to 20° N, it is almost permanent, with maximum activity during spring. The upwelling is a highly productive system, rich in organic matter and nutrients, which supports a large biomass. Thus, 80% of the national catches consist of pelagic resources fished in these areas.
Keywords Ocean · Atlantic · Temperature · Salinity · Oxygen · Upwelling · Morocco
K. H. Ali · A. Bourass · K. Fariri · J. Ettaki · T. Hraira · K. Doumi · S. Boulebatt · M. Maaroufi · K. Elkharrim · D. Belghyti Department of Biology, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco A. Talbaoui (B) Department of Biology, Faculty of Sciences Rabat, Rabat, Morocco e-mail: [email protected] A. Srairi · A. Dridi I. N. R. H, Bd Sidi Abderrahmane Ain Diab Casablanca, Casablanca, Morocco e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_11
1 Introduction
Morocco has a maritime coastline of about 3500 km. Geographically, it is characterized by a long mountainous strip running from east to west and then giving way in the south to sand as far as the border with Mauritania. The Moroccan coastal zone includes, according to the extensive definition, about 66,000 km2 of territorial waters and 1.1 Mkm2 of exclusive maritime economic zone. The continental part of the coastal zone represents an area equivalent to 1/7 of the territory, and its population amounts to a third of the country's population. The Moroccan population is largely urban and concentrated on the coastal strip. The political and economic capitals of the country are located on the Atlantic coast, together with a set of cities with commercial, industrial, tourist, and service functions of the highest order. The nerve centre of the country lies in the central part of this coast, from El Jadida to Kenitra. The coast is characterized by the resurgence of deep waters, which supplies a considerable quantity of mineral elements that promote photosynthesis in surface waters and, in turn, production throughout the food web. The hydrology of the Moroccan Atlantic coast is described in [1], whose author was the first to treat the phenomenon of cold-water resurgence specifically. More recent oceanographic studies [2–11] have shown that a permanent coastal upwelling phenomenon is significantly present in this area. Its latitudinal and seasonal variability and its intensity increase regularly from north to south and are closely related to the direction and intensity of the trade winds.
2 Study Area
The coastal upwelling phenomenon is one of the main characteristics of the Moroccan Atlantic zone (Fig. 1). This phenomenon results from ocean currents and winds [12–15].
3 Methodology of Study
The oceanographic study is carried out in collaboration between INRH and the Laboratory of Oceanology and Marine Biology of the Faculty of Science of Kenitra. It monitors seasonal temperature and salinity during campaigns carried out in spring–summer and in autumn. During these campaigns, the network of oceanographic stations is sampled along the course of the research vessels, following radials perpendicular to the coast.
Fig. 1 Moroccan map and geographical position of Atlantic coast
The oceanographic parameters collected are temperature, salinity, dissolved oxygen, turbidity, and fluorescence. The measurements are made with a "Sea Bird Electronic" CTD911 multiprobe on board the research vessels "Al Amir Moulay Abdellah" and "Atlantida", and with a "RINKO" CTD multiprobe. The study area is considered one of the four areas distinguished worldwide by the upwelling of cold waters, which makes them rich in fishery resources [9].
4 Results and Discussion
The marine environment is characterized by the superposition of two interacting systems. The surface system, very productive and rich in organic matter due to its permanent exposure to light (photosynthesis), exports its excess production by sedimentation to the other system located below, rich in mineral matter due to the
mineralization process. The phenomenon of upwelling is expressed by a sequence of physico-chemical and biological processes. The physico-chemical parameters (such as temperature, salinity, and nutrient salts) and planktonic components (phytoplankton, zooplankton, and ichthyofauna) play a decisive role in the dynamics of the marine environment, the fertilization of the environment, the structuring of trophic chains, the interaction with the climate, and the regulation of biogeochemical cycles. A coastal upwelling is defined as a dynamic system which, under the action of the wind, creates a vertical upward flow at the coast. This flow originates along the continental slope and is directed toward the surface. It brings onto the continental shelf waters of subsurface origin, which are cold and rich in nutrients. The nutrient salts brought into the euphotic layer allow the development and maintenance of strong biological production in the coastal zone. This productive potential is much greater than that of oceanic zones, where most of the nutrient salts come from the regeneration of organic matter [16]. Figure 2 shows that the phenomenon studied is significantly present in the south of Morocco (21–34° N).
Fig. 2 Locality of upwelling in Morocco
4.1 Hydrology and Variation of the Surface Water Masses' Temperature
The surface water masses show a marked spatiotemporal evolution of temperature and strong upwelling activity. The phenomenon studied is particularly pronounced in summer (Figs. 3 and 4). From one station to another, the thermal differences are of the order of 3.5 °C. These describe a thermal gradient of 17–21 °C from the coast to offshore. This gradient persists in November with a thermal difference reduced to 2 °C, reflecting the persistence of the upwelling throughout the year. North of Dakhla, the thermal variations in the coastal–offshore direction are not as significant as in the southern part. During 2009, the radials in the northern part did not show significant variations. In November 2009, offshore temperatures were similar to coastal temperatures.
Fig. 3 Hovmöller diagram representing the upwelling index [10]
Fig. 4 Variation of surface temperatures during the period June 2007
In the spring–summer season of 2014, the surface distributions of temperature and salinity indicate a horizontal stratification of surface waters. Measured temperatures ranged between 19.5 and 21 °C and salinities exceeded 36.5 g/L offshore of the study area. This season also shows active centers of resurgence located at Cape Juby (28° N). The upwelled waters are cold (16 °C), less salty (< 36.2 g/L), and originate at 250 m depth. In addition, temperature and salinity show an active center of resurgence confined to Cape Boujdour (25°–26° N), marked by a patch of cold (16.07 °C) and less salty (36 g/L) water. Offshore, temperatures are around 23.1 °C and salinities around 36.6 g/L, with averages in the southern area of around 20.1 °C and 36.3 g/L for temperature and salinity, respectively. Low salinities (35.9 g/L) were observed at Cabo Blanco, but temperatures were higher (21.3 °C), indicating the absence of upwelling activity on the one hand and the presence of warm and less salty South Atlantic Central Waters on the other (Figs. 4, 5 and 6).
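The coast-to-offshore thermal differences discussed above can be computed with a very small sketch. It assumes a simple thermal index, offshore minus coastal sea surface temperature, which is not necessarily the upwelling index plotted in Fig. 3. The temperatures are values quoted in this section, but pairing them site by site is an illustrative assumption, and the function name is hypothetical.

```python
def cross_shore_sst_difference(sst_offshore_c, sst_coast_c):
    """Offshore minus coastal sea surface temperature, in degrees Celsius.

    Larger positive values mean colder water at the coast.
    """
    return sst_offshore_c - sst_coast_c


# (offshore SST, coastal SST) in degrees Celsius; the pairing is assumed.
sites = {
    "Cape Juby, spring-summer": (21.0, 16.0),
    "Cape Boujdour, spring-summer": (23.1, 16.07),
    "Cape Draa, autumn": (23.5, 17.5),
}
differences = {
    name: round(cross_shore_sst_difference(offshore, coast), 2)
    for name, (offshore, coast) in sites.items()
}
for name, diff in differences.items():
    print(f"{name}: {diff:.2f} °C")
```

Under these assumed pairings, the largest difference is found at Cape Boujdour, consistent with the cold patch described in the text.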
Fig. 5 Variation of surface temperatures during the period June 2008–July 2009
In the autumn season, the area located to the south (Cape Juby–Cape Boujdor) is characterized by homogeneous warm (22–23.5 °C) and saltier (36.7 g/L) surface waters. The active center of resurgence, marked by a minimum temperature of 17.5 °C, is located at Cape Draa (28° 30' N) and coincides with a minimum salinity of 36.3 g/L. Toward the open ocean, temperatures and salinities reach their maxima, 23.5 °C and 36.9 g/L, respectively (Fig. 7). This situation is reflected at the surface by cold, less salty waters. In the area north of Dakhla, particularly the Cape Boujdor region, warmer and saltier waters have invaded the continental shelf up to the coast. Sea surface temperatures were higher in October 2015 than in May 2016 (Fig. 8).
Fig. 6 Variation of surface temperatures during the period November 2007–November 2008
Fig. 7 Spatial and temporal evolution of surface temperatures during the year 2014
4.2 Salinity of Atlantic Surface Water Bodies
Salinity values along the entire Atlantic coast show slight variations between the summer and winter seasons (Fig. 9). From north to south, values vary between 36.9 and 35.7 g/L. According to [17], the spatial distribution of temperature and salinity shows considerable variability on either side of Cabo Blanco and Cabo Boujdor. The salinity reveals a horizontal stratification of surface waters from the coast to the open sea (Fig. 10). The distribution of salinity (g/L) indicates a seasonal variability marked by warmer water in October than in May. In addition, a surface water mass with relatively low salinity is observed at the entrance to the bay (Fig. 11).
4.3 Dissolved Oxygen
The surface distributions of dissolved oxygen show that surface waters are well oxygenated (8 mg/L), unlike the upwelled waters, which are less saturated with dissolved oxygen and do not exceed 5.5 mg/L (Fig. 12). South of Cape Boujdor, the hydrological situation is more strongly marked by resurgence activity, located south of Dakhla and at Cape Blanc, where this activity was more accentuated. This situation is reflected at the surface by cold water that is less salty and less saturated with dissolved oxygen.
Fig. 8 Spatiotemporal evolution of surface temperatures (October 2015 and May 2016)
4.4 Description of the Upwelling Process on the Moroccan Atlantic Coast by Remote Sensing
The state of the upwelling is monitored continuously throughout the year. Figure 13 shows the monthly sea surface temperature (SST) fields averaged over the 22 years from 1985 to 2006. The marine environment of the southern zone is favorable to several species [18] owing to the influence of deep-water resurgences. To characterize the upwelling activity over the period 2002–2014, a weekly coastal upwelling index is calculated from the surface temperatures of Atlantic water between the coast and the open sea along the Moroccan Atlantic coast [8, 11]. In 2014, the upwelling activity was strong both in the northern areas (26–29° N and 29–33° N) and in the southern areas (21–24° N and 24–26° N) of the Moroccan Atlantic coast (Fig. 14). In terms of interannual variability over the period 2002–2014, the upwelling activity generally shows a downward trend during 2004–2007 and 2009–2010 and an upward trend from 2011 to early 2012, a period of high activity. In 2013, the upwelling activity was average along the areas studied (Fig. 15). Oceanographic surveys conducted during 2014 indicate seasonal variability in deep-water upwelling at the three resurgence areas [15, 19].
Fig. 9 Spatial distribution of surface salinity during 2007–2009
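As a rough illustration of how an SST-based index of this kind can be computed, the sketch below takes the difference between offshore and coastal temperature for each week in one latitude band. It is a simplified thermal-contrast index, not the improved index of [8]; the function name and all sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def thermal_upwelling_index(offshore_sst, coastal_sst):
    """Offshore minus coastal SST (°C); larger values suggest stronger upwelling."""
    if len(offshore_sst) != len(coastal_sst):
        raise ValueError("both series must have the same length")
    return [round(off - coast, 2) for off, coast in zip(offshore_sst, coastal_sst)]

# Hypothetical weekly SST values (°C) for one latitude band
offshore = [21.0, 21.5, 22.0, 23.1]
coastal = [17.5, 17.0, 16.5, 16.1]

weekly_index = thermal_upwelling_index(offshore, coastal)
mean_index = mean(weekly_index)  # simple average over the weeks considered
print(weekly_index, mean_index)
```

A difference-based index keeps the physical reading simple (colder coastal water relative to offshore water gives a higher value), but it ignores the normalization and cloud-masking steps that a satellite-based index would need.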
Fig. 10 Spatial distribution of surface salinity during 2014
In the northern area (Cape Spartel to Cape Cantin), upwelling activity occurs mostly during spring (May 2014) along the coast, while in autumn (October 2014) resurgences are limited to the south of El Jadida. In the central area, from Cape Cantin to Cape Boujdor, upwelling activity during the summer season (June–July 2014) was significant and more pronounced at Cape Ghir than at Cape Juby, whereas during the fall season the situation was reversed. In addition, the resurgences of Cape Sim extend toward the open sea, marking the filament of Cape Ghir [19–21]. In the area from Cape Boujdor to Cape Blanc, the upwelling activity is well expressed during the summer season at Cape Boujdor. Thus, the hydrological situation during this year shows a slight increase in offshore temperature without influence on the activity of coastal resurgences.
Fig. 11 Spatial distribution of surface salinity (October 2015 and May 2016)
This phenomenon is marked by warm temperatures during September and October 2014 in the northern part of the coast between Tangier and Tarfaya, and south of Cape Juby during October 2014. Exceptional rainfall was noted during November 2014, when strong lows were reported off the northern and central areas of Morocco.
Fig. 12 Surface distribution of dissolved oxygen (mg/L) (Spring—Summer 2014)
Fig. 13 Annual average SST image series of 22 years (CRTS-INRH)
Fig. 14 Interannual evolution of the coastal upwelling index and its trend (in red) over the period 2002–2014 [8]. Panels: Zone 26–29° N, Zone 24–26° N, Zone 21–24° N
Fig. 15 Upwelling index along the Moroccan areas over the period 2002–2014 [8]
5 Conclusion
The Moroccan Atlantic coast is characterized by the presence of four upwelling zones located all along the coastline, especially between latitudes 21° N and 34° N. The upwelling systems depend very closely on the physical environment and on changes in oceanographic processes, and the environment in turn changes with the variability of the upwelling system. Seasonal campaigns and further research should therefore be conducted at sea along the Moroccan Atlantic coast, using the research vessel “Amir Moulay Abdellah”.
* Area 1, between Cape Boujdor and Dakhla, active during both seasons with variable intensity;
* Area 2, between Cape Barbas and Cape Blanc, very rich in mineral matter (phosphates), strongly influenced by the almost permanent upwelling and by the northward propagation of the South Atlantic Central Waters.
Along the Moroccan Atlantic coast, the surface water masses are distinguished by cool temperatures throughout the year owing to the permanent upwelling activity in three zones: Larache, Agadir, and between Cape Boujdor and Cape Blanc. These vertical marine currents enrich the surface water layers of the euphotic zone with nutrients and therefore promote the development of phytoplankton, which represents the first link in the trophic chain for fishery resources. In terms of fisheries, Morocco’s interest lies in its position between Europe and Africa, at the crossroads of water masses of different origins and densities. The interest also lies in the extent of its maritime space, which covers more than one million km² of water. Finally, thanks to their location in upwelling zones, the Moroccan coasts are among the richest in fish in the world. Demersal resources are characterized by the diversification of species. The main fisheries are the cephalopod fishery in the south, the hake/shrimp fishery in the north between Tan-Tan and Tangier, and the Mediterranean fishery. Alongside the coastal and offshore fisheries, there are other coastal activities such as the collection of algae, corals (red coral), or certain species of shellfish. The development of maritime fishing activities has made it possible to reach a ceiling of 1 million tons of catches. This development is, moreover, closely linked to port infrastructure and equipment, in addition to the infrastructure established in the service of fishing.
References
1. Furnestin J (1959) Hydrology of Atlantic Morocco. Scientific Technical Institute Work Review. Marit Fish 23(1):5–77
2. Belvèze H (1984) Biology and dynamics of sardine populations (Sardina pilchardus) inhabiting the Atlantic coasts and proposal for fisheries management. State Thesis, Brest Occidentale University, p 531
3. Binet D (1988) Possible role of an intensification of the trade winds on the change in the distribution of sardines and sardinellas along the West African coast. Aquat Liv Res 1:115–132
4. Mittelstaedt E (1983) The upwelling area off Northwest Africa: a description of phenomena related to coastal upwelling. Prog Oceanogr 12(3):307–331
5. Makaoui A, Bessa I, Agouzouk A, Idrissi M, Belabchir Y, Hilmi K, Ettahiri O (2021) The variability of the Cape Boujdor upwelling and its relationship with the Cape Blanc frontal zone (Morocco). Front Sci Eng 11:1
6. Makaoui A (2008) Study of the coastal upwelling of the Moroccan Atlantic coast and its contribution to the sedimentology of the continental shelf. Thesis Doc. Univ. Ben Msick, Casablanca, Morocco, p 131
7. Benazzouz A, Hilmi K, Orbi A, Demarcq H, Attilah A (2006) Spatiotemporal dynamics of Moroccan coastal upwelling by remote sensing from 1985 to 2005. Geo Observer 15:15–23
8. Benazzouz A, Mordane S, Orbi A, Chagdali M, Hilmi K, Atillah AL, Pelegrí J, Demarcq H (2014) An improved coastal upwelling index from sea surface temperature using satellite-based approach. The case of the Canary Current upwelling system. Cont Shelf Res 81:38–54. https://doi.org/10.1016/j.csr.2014.03.012
9. Moujane A, Chagdali M, Blanke B, Mourdane S (2011) Impact of winds on upwelling in southern Morocco; contribution of the ROMS model forced by the ALADIN and QuikSCAT data. Bulletin of the Scientific Institute, Rabat, Earth Sciences section, vol 33, pp 53–64
10. Nieto K, Demarcq H, McClatchie S (2012) Mesoscale frontal structures in the Canary upwelling system: new front and filament detection algorithms applied to spatial and temporal patterns. Remote Sens Environ 123:339–346
11. Benazzouz A (2014) Coastal upwelling and the effect of mesoscale ocean dynamics on plankton variability and distribution in the Canary Current upwelling system. Hassan II Casablanca-Mohammadia University, p 262
12. Orbi A (1998) Hydrology and hydrodynamics of the Moroccan coasts: Paralic environments and coastal zones. Expo’98-Lisbon, Okad Edition, p 68
13. Orbi A, Nemmaoui M (1992) Fluctuation of winds and variability of upwelling along the Moroccan Atlantic coast. Rev. Works and Documents. Scientific Institute of Maritime Fisheries Morocco, vol 75, p 50
14. Makaoui A, Orbi A, Hilmi K, Zizah S, Larissi J, Talbi M (2005) The upwelling of the Atlantic coast of Morocco between 1994 and 1998. C. R. Geoscience 1518–1524
15. Benazzouz A, Demarcq H, Chagdali M, Mordane S, Orbi A, Hilmi K, Atillah A, Larissi J, Makaoui A, Ettahiri O, Barraho A (2014) Long-term changes and trend in upwelling activity at the Canary current system from satellite imagery. Geo Observer 21:47–60
16. Roy C (1991) Upwellings: the physical framework of West African coastal fisheries. ORSTOM Ed, pp 38–66
17. Meunier T, Bartone D, Barreiro B, Torres R (2012) Upwelling filaments off Cap Blanc: interaction of the NW African upwelling current and the Cape Verde frontal zone eddy field. J Geophys Res: Oceans 117(8)
18. Abdellouahab H, Berraho A, Baibai T, Agouzouk A, Makaoui A, Rrhif A (2016) Autumn larval fish assemblages on NWAA coast. Chin J Oceanol Limnol. https://doi.org/10.1007/s00343-017-5302-7
19. Srairi A (2023) Contribution to the study of the biological cycle of Loliginidae: case of Loligo vulgaris (Lamarck, 1798) in the Moroccan South Atlantic zone from Cap Boujdor to Cap Blanc. National Doctorate, IBN Tofail University
20. Salah S, Ettahiri O, Berraho A, Benazzouz A, Elkalay K, Errhif A (2012) Distribution of copepods in relation to the dynamics of the Cap Ghir filament (Atlantic coast of Morocco). C. R. Biology 335:155–167
21. Makaoui A, Orbi A, Arestigui J, Benazzouz A, Laarissi J, Agouzouk A, Hilmi K (2012) Hydrological seasonality of cape Ghir filament in Morocco. Natural Science 4(1):5–13
The Acceptance of Artificial Intelligence in the Commercial Use of Crypto-Currency and Blockchain Systems
Mkik Marouane, Mkik Salwa, and Ali Hebaz
Abstract According to the literature on technology acceptance models, artificial intelligence is presented as a tool for the technical and commercial use of cryptocurrency, and it rests on several dimensions of attractiveness, hence the notion of blockchain. We focus on an acceptance test along technical axes and orientations in order to understand what changes human thoughts and behaviors. To this end, several theories are combined in a single model: the New Technology Acceptance Model, the Standardized Framework for Integrating New Technologies, and the Theory of Planned Behavior, together with concepts from the theory of flow (TOF) and behavioral reasoning theory. To make our approach concrete, we opt for a quantitative approach based on principal component analysis (PCA) in order to reduce the items and to test the hypotheses posed at the outset. The results show a strongly significant relationship between the dimension “The technology tools of commercial intervention” and the variable to be explained, “Technological efficiency”, which is itself slightly significant; this allows us to infer a deep acceptability of blockchain systems.
Keywords Crypto-currencies · Technology acceptance · Behavioral reasoning theory · Principal component analysis · Supply chain · Theory of reasoned action
M. Marouane (B) University Mohamed V, FSJES Souissi, Souissi, Rabat, Morocco e-mail: [email protected] M. Salwa University Sultan Moulay Slimane, Beni Mellal, Morocco A. Hebaz University Chaouaïb Doukkali, National School of Commerce and Management, El Jadida, Morocco e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_12
1 Introduction
We believe that it is vital to understand the origins of AI before debating its legality and its integration into HRIS. In literature, AI’s genesis may be found in a broad range of works, from the myth of the golem [1] to Isaac Asimov’s three laws of robotics (1942), according to a recent study [2]. Alan Turing made significant contributions to the advancement of this field, especially through the creation of the Turing test, or imitation game, whose initial goal was to ascertain whether a person can tell, based on written exchanges alone, whether his interlocutor is a woman or someone pretending to be a woman. The test was later revised: in this version, the player must determine whether or not the character they are corresponding with is a computer program [3]. The second version of the Turing test thus focuses on a computer’s ability to conceal its true identity while conversing with a human. This makes it an ideal test for chatbots, whose main selling point is that they can pass for a human in conversation [4]. The term “artificial intelligence” was coined in 1955 and first used at Dartmouth [5]. It was hoped that, by bringing together specialists from many domains, this institute would help lay the groundwork for a new academic field dedicated to the study and advancement of AI (computation, neural networks, learning, natural language, creativity, and abstraction). It is therefore essential to model human intelligence in order to appreciate the workings and capacities of this technology. The dictionary Le Robert defines intelligence as the “power of knowing, of understanding; trait of the mind that knows and adapts fast” [6]. Human intelligence is discussed here from two different vantage points: “intelligence as a general factor” and “intelligence as a sequence of intricate methodology applied” [7]. A general factor for intelligence was first suggested by Spearman in 1923 in the form of the G-factor. He proposed this theory after observing that students who score well on one kind of intelligence test also tend to perform well on all other types of intelligence tests. This G-factor, he argues, should be considered a composite indicator of superiority in a broad variety of cognitive abilities. Starting in the 1930s, this idea met with fierce resistance [8]. Subsequently, numerous authors sought to narrow down what intelligence is by separating its component parts [9]. In this context, our research question is as follows: to what extent does the acceptance of artificial intelligence improve the commercial use of crypto-currency and blockchain systems?
2 Literature Review and Hypotheses Development
Human intelligence is discussed here from two different vantage points: “intelligence as a general factor” and “intelligence as a sequence of intricate intellectual processes” [10]. A general factor for intelligence was first suggested by Spearman in 1923 in the form of the G-factor. He proposed this theory after observing that students who score well on one kind of intelligence test also tend to perform well on all other types of intelligence tests. He then proposed that this G-factor can be interpreted as a marker of superiority in a broad variety of cognitive abilities. By the 1930s, this idea was already facing fierce opposition [11].
2.1 Blockchain and the Use of Crypto-currencies in Supply Chain Operations
Blockchain is a technology for storing and transmitting information in a transparent and secure manner, using blocks of data linked together into a chain (hence the name). It guarantees the authenticity and integrity of the recorded data, so that they cannot be falsified. Cryptocurrencies, such as Bitcoin, are virtual currencies that use blockchain technology to guarantee their security and traceability. In supply chain operations, blockchain and cryptocurrencies can be used to ensure the traceability and transparency of transactions between the different actors in the chain. For example, they can be used to track the origin and quality of a product from production to delivery to the end consumer by recording all stages of the chain on the blockchain. New crypto-currency methods that can be used in the banking system, such as Decentralized Finance (DeFi) and environmentally friendly crypto-currencies, are revolutionizing the world of finance. Information and communication technologies (ICTs) play a crucial role in the implementation, guidance, and dissemination of these new methods. Their smart use can help create a more inclusive, transparent, and environmentally friendly financial system.
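The idea of blocks linked into a chain, where any later change to recorded data becomes detectable, can be sketched in a few lines. The example below is a minimal hash chain only, assuming SHA-256 as the hash function; it has no network, consensus, or digital signatures, and the product identifier and stage names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, data, prev_hash):
    """SHA-256 digest over the block's content and its predecessor's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, data):
    """Append a block that stores the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})

def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_HASH
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
    return True

chain = []
for stage in ["produced", "shipped", "delivered"]:  # hypothetical supply chain stages
    append_block(chain, {"product": "lot-001", "stage": stage})

valid_before = is_valid(chain)
chain[1]["data"]["stage"] = "lost"  # tamper with a recorded stage
valid_after = is_valid(chain)
print(valid_before, valid_after)
```

Because each block stores the hash of its predecessor, altering one recorded stage invalidates the stored hash of that block, which is what makes tampering visible to anyone who rechecks the chain.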
2.2 Links Between the Theory of Reasoned Action and the Theory of Planned Behavior
The theory of reasoned action and the theory of planned behavior both explain human behavior. The theory of reasoned action was developed by Martin Fishbein and Icek Ajzen, and the theory of planned behavior by Icek Ajzen, a social psychologist. The theory of reasoned action suggests that human behavior is the result of behavioral intention, which is itself influenced by attitudes toward the behavior and by social norms. The theory of planned behavior is an extension of the theory of reasoned action that adds a further factor, perceived control over the behavior, to account for the ease or difficulty of performing it. In other words, if a person has a positive attitude toward the behavior, believes that important people around him or her approve of it, and perceives that he or she can control its implementation, he or she will be more likely to carry it out. Both theories share the same key concepts. Their most important component is behavioral intention: according to these theories, an individual’s intention is the best predictor of future behavior.
2.3 The Theory of Reasoned Action and Technological Acceptance
The theory of reasoned action (TRA) was developed by Martin Fishbein and Icek Ajzen in 1975. According to this theory, individuals’ behaviors are guided by their intentions, which are influenced by their attitudes toward the behavior and by social norms; the theory of planned behavior later added their perceived ability to perform the behavior. With respect to technology use, these theories suggest that individuals will use a technology if they have a favorable attitude toward its use, if they perceive that significant others think they should use it, and if they believe they are capable of using it successfully. The Technology Acceptance Model (TAM), developed by Fred Davis in 1989, builds on the TRA to explain why and how individuals adopt or reject a technology. According to the TAM, technological acceptance depends on two main factors: the perceived usefulness of the technology and its perceived ease of use. The higher the perceived usefulness and ease of use, the more likely the technology is to be adopted. Since 1989, the TAM has been used by researchers to predict whether or not individuals would start using a new piece of software for personal or professional purposes. Numerous industries outside of IT have adopted it, each with its own adaptation [12]. The literature presents the following models of technology adoption.
2.4 Technology Acceptance Model
Understanding how consumers embrace and use new technologies is a common goal, and the Technology Acceptance Model is a popular framework for doing so [13]. TAM suggests that users’ perceptions of a technology’s usefulness and ease of use are strong indicators of whether broad adoption is likely [14]. Hong and Huang (2017) combined the TAM framework with the innovation diffusion model, the expectation–confirmation theory, and the flow theory to examine which variables influence customers’ decisions to use or acquire a wristwatch. Combining TAM with enjoyment, social cognition, privacy, and social participation, the authors of [15] predict users’ intention to use voice assistants. TAM was also used in [16] to study the adoption of service robots in the context of trust and experience satisfaction [17].
2.5 Acceptance and Use of Technology: Unified Theory
UTAUT found that four moderators (voluntariness, gender, experience, and age) were significant for predicting behavioral intentions and technology usage, in addition to the four core components (performance expectancy, effort expectancy, social influence, and facilitating conditions) [18]. Several prominent articles emphasize how UTAUT complements existing theoretical frameworks. The authors of [19] used a hybrid of UTAUT 2.0, Protection Motivation Theory (PMT), and Privacy Calculus Theory (PCT) to investigate the factors that influence consumers’ intentions to use wearable healthcare technology. The latter two theoretical perspectives were incorporated to account for health-related behaviors and health-related technologies. First, PMT holds that behavior is guided by one’s appraisal of one’s own coping abilities (response efficacy, response cost, and self-efficacy) and of threats (vulnerability and severity). Second, PCT argues that users will opt to participate in an activity when the advantages exceed the risks to their privacy [20]. In [21], UTAUT was enhanced through the integration of realism maximization theory and the anthropomorphism literature, shedding light on how individuals perceive and engage with virtual assistants. Meanwhile, the authors of [22] employed UTAUT to examine the factors influencing consumers’ adoption of ARTS and showed that hedonic motivation was the most significant predictor of behavioral intentions to use ARTS [23].
2.6 Planned Behavior Theory
Planned behavior theory is a psychological theory that attempts to explain how thoughts, plans, and actions are interconnected. According to its core premise, one’s attitude toward the activity, one’s subjective norm, and one’s perceived behavioral control all play key roles in one’s intention to engage in a behavior [24]. The theory has been used to forecast customers’ intentions to use mobile recommendation agents (MRAs), to favor businesses that use this technology, and to buy the items suggested by MRAs [25]. Another example of the theory’s usefulness is a study conducted by Ajzen and others on consumers’ propensity to adopt smart grid technology. In addition to the three antecedents identified by Ajzen, they examined a variable related to resistance to change. In both cases, the theoretical framework explained the motivations behind the observed behavior [26].
2.7 Behavioral Reasoning Theory (BRT)
Behavioral Reasoning Theory (BRT) was developed on the basis of the Theory of Planned Behavior. Because it treats the reasons for and against an action as antecedents of individual attitudes, BRT is able to account for proponents’ and opponents’ opinions in a consistent framework. Sivathanu investigated the role of Internet of Things (IoT)-based wearables in healthcare settings, and [27] extended this line of inquiry to the widespread adoption of IoT products in the agriculture sector, where AI can be used to manage the large volumes of data generated by sensors. The theory has also been extended to consider how people feel about, and what they expect from, their experiences with autonomous cars [28].
2.8 Decision Theories
Decision theory is built on the premise that the mind and the decision-making process are not inherently mysterious but can be explained. Expert system algorithms inspired by decision theory have been developed using artificial intelligence, particularly in the context of heuristics-based human resource management. Economics and game theory rely on expected utility theories to explain human behavior in the face of uncertainty. Case-Based Decision Theory (CBDT) instead relies on prior cases to inform risky decisions, and Matsui demonstrated that CBDT may yield the same findings [29].
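To make the case-based idea more concrete, the following sketch scores each available act by the similarity-weighted sum of the utilities it produced in remembered cases, which is the usual way case-based decision rules are written. The memory of cases, the similarity function, and the utility function are all assumptions introduced for illustration and do not come from the study.

```python
def cbdt_value(act, problem, memory, similarity, utility):
    """Similarity-weighted sum of the utilities of past outcomes of `act`."""
    return sum(similarity(problem, past_problem) * utility(outcome)
               for past_problem, past_act, outcome in memory
               if past_act == act)

def similarity(p, q):
    """Assumed similarity: 1 for identical problems, decreasing with distance."""
    return 1.0 / (1.0 + abs(p - q))

def utility(outcome):
    """Assumed utility: the recorded outcome itself."""
    return float(outcome)

# Hypothetical memory of cases: (problem descriptor, act chosen, outcome)
memory = [
    (1.0, "pay_by_app", 10),
    (2.0, "pay_by_app", 4),
    (1.0, "pay_at_branch", 6),
]

new_problem = 1.0
values = {act: cbdt_value(act, new_problem, memory, similarity, utility)
          for act in ("pay_by_app", "pay_at_branch")}
best_act = max(values, key=values.get)
print(values, best_act)
```

Unlike an expected utility calculation, nothing here requires probabilities over future states; the decision maker only needs remembered cases and a judgment of how similar they are to the current problem.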
2.9 Theory of Flow
When one is deeply engaged in an activity, time seems to fly by; to be in a state of “flow” is to be “psychologically absorbed in and enjoying one’s activity while one is engaged in that action” [30]. As an application of this notion, robots have been built that can adjust the difficulty of a task to both attract and retain a user’s attention. By exploring how users’ impressions of their voice assistant’s personality determine the flow of their spoken interactions, Poushneh expanded the breadth of flow theory and its potential to influence users’ thoughts and behavior. Researchers [31] found that interactions between people and robots had a more positive impact on cognitive absorption (flow) than interactions between humans.
3 How to Use ICT in Guiding?
Information and communication technology (ICT) can support guiding by providing resources and tools that enhance the guiding experience for both the guide and the participants [19]. Here are some ways in which ICT can be used in guiding:
1. Communication: ICT can be used to facilitate communication between the guide and the participants. This can include email, messaging apps, social media, and video conferencing tools.
2. Information sharing: ICT can be used to share information, such as maps, route plans, and information about the location or activity being guided.
3. Online guides and tutorials: Online guides and tutorials can be developed and shared with participants to provide additional information and support.
4. Digital tools: ICT can provide digital tools, such as apps or websites, that can be used for organizing and managing the guiding process. These tools can help in scheduling, tracking, and sharing data with the participants.
5. Multimedia presentations: Multimedia presentations can be created to share information about the location or activity being guided. This can include videos, photos, and other multimedia content [4].
By using ICT in guiding, guides can make their services more accessible and engaging for participants. Participants benefit by being able to access information and communicate with their guide more easily.
4 Establishing Constructive Collaboration Through Technology Channels
Constructive collaboration through technology channels has become essential in the modern business world. By using communication tools such as emails, real-time chats, and video conferences, collaborators can work together to achieve their goals faster and more efficiently. The benefits of this collaboration are numerous, including greater flexibility, better coordination, and increased efficiency. It also enables businesses to connect with clients, suppliers, and business partners all over the world, which can lead to new growth and development opportunities. Ultimately, constructive collaboration through technology channels is a key element in maintaining a competitive business capable of meeting the challenges of the global market [22, 24].
5 The Empirical Model
The methodology used in this article is based on a confirmatory approach using the factorial method (PCA) through the SEM model. To answer our research question, we interviewed a statistically significant subset of BMCE Bank clients who used financial products for professional purposes. The clients addressed are assumed to be “important” since they have obtained a credit of more than 50,000 USD and they respect the payment deadlines. A description of the confirmatory study sample is shown in Table 1. The second criterion is the skills of the interviewees. To make our sample more meaningful, we chose only those clients (users) who have obtained major university degrees in computer science and technology (master’s, doctorate) or international certifications in the technological field. The skills criterion of the sample is shown in Table 2. The factorial method requires a purification of the items in the initial model, which consists of four variables that define several items. They are presented in Table 3. Through Table 3, we were able to elaborate our conceptual model while specifying the first purification of the model according to the SEM method. The first purification of the module is shown in Fig. 1, and the second purification is shown in Fig. 2. This model supports the importance of our literature review and theoretical explanations by showing that there is a strong correlation between the four research aspects. Cronbach’s alpha for the dimension “The technological tools of commercial intervention” is 0.911, and the Rho for the variable to be

Table 1 Description of the confirmatory study sample

| Level of study | Number of users |
|---|---|
| Female | 25 users |
| Male | 25 users |
| Bac + 5 | 20 users |
Table 2 Sample criterion: skills’ criterion

| Users | The level of education | Degree/certificate specialty |
|---|---|---|
| 34 | Bac + 5 | Master’s degree specialized in management sciences |
| 14 | Bac + 8 | Ph.D. in management and management science |
| 2 | Certificates | International certificates |
Table 3 Presentation of the items related to the research variables

| Variables | Symbol of items | Items | Authors’ references |
|---|---|---|---|
| Technological knowledge | MA1 | I am very familiar with technological issues | Lamagna et al. [4] |
| | MA2 | I consider the technology to be manageable | |
| | MA3 | I test technology options before perfect mastery | |
| | MA4 | I rely on the menus describing the technology tools to facilitate understanding | |
| Technological intent | BR1 | I always look for technology products to carry out my financial transactions | |
| | BR2 | I often work with technology tools | |
| | BR3 | I feel comfortable using technology | |
| | BR4 | I make my complaints quickly if there is a bug in the banking application | |
| The technology tools of commercial intervention | BM1 | I always look for the latest technology from the bank | Tokpavi et al. [28] |
| | BM2 | I try to check the latest updated options in the banking application | |
| | BM3 | I check the new credit granting programs set out by the bank | |
| | BM4 | I use the application for its speed of verification and payment | |
| Technological efficiency | TAIA1 | The mobile application reduces transaction costs | Song et al. [6]; Zhang et al. [6] |
| | TAIA2 | The transaction through the bank’s website is very secure for me | |
| | TAIA3 | Account verification with the application is very efficient for me | |
| | TAIA4 | I use the mobile application because it eliminates transaction costs | |
explained, “Technological efficiency,” is 0.121, indicating a high degree of reliability and validity between the two. Even though the “MA4” and “BR4” elements in the dimensions are negligible at 0.478 and 0.691, respectively, all other dimensions are substantial [31]. The data were analyzed using the partial least squares (PLS) method for structural equation modeling (SEM), with SmartPLS 3.3.2 as the statistical software of choice. We followed the three-step method of Anderson and Gerbing (1988). First, we described the data in detail. Second, to guarantee the construct’s validity and reliability, we estimated the measurement model. Third, we made sure that the structural model could be used to test the hypotheses. Details of these three phases are given in Table 4. Composite reliability is a good indicator of a measuring instrument’s dependability; it should be greater than 0.7. Convergent validity is evaluated using the factor loadings and the average variance extracted (AVE). To guarantee that the chosen indicators represent the latent construct being assessed, factor loadings must be at least 0.7 and the AVE must be greater than 0.5. Convergent validity between all items is shown in Table 5.
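The thresholds above (composite reliability above 0.7, loadings of at least 0.7, AVE above 0.5) can be illustrated with a minimal sketch. It uses the textbook formulas for composite reliability and AVE computed from standardized loadings; the loadings themselves are hypothetical values chosen for illustration and are not taken from the study.

```python
def composite_reliability(loadings):
    """Composite reliability from the standardized loadings of one construct."""
    total = sum(loadings)
    # Error variance of each indicator is 1 minus its squared loading.
    error = sum(1.0 - value * value for value in loadings)
    return total * total / (total * total + error)


def average_variance_extracted(loadings):
    """AVE: the mean of the squared standardized loadings."""
    return sum(value * value for value in loadings) / len(loadings)


# Hypothetical loadings for a four-item construct (not taken from the paper).
example_loadings = [0.82, 0.88, 0.79, 0.91]
cr = composite_reliability(example_loadings)
ave = average_variance_extracted(example_loadings)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")
print("thresholds met:", cr > 0.7 and ave > 0.5 and min(example_loadings) >= 0.7)
```

An item whose loading falls below 0.7 would fail the last check, which is the kind of item that is dropped during purification.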
Fig. 1 First purification of the module
Fig. 2 Second purification of the module (structural equation model)

Table 4 Fornell–Larcker criterion

| | Subjective knowledge | Technological efficiency | Technological intent | The technology tools of commercial intervention |
|---|---|---|---|---|
| Subjective knowledge | 0.871 | | | |
| Technological efficiency | −0.035 | 0.862 | | |
| Technological intent | 0.039 | 0.536 | 0.855 | |
| The technology tools of commercial intervention | 0.453 | 0.112 | 0.163 | 0.820 |
Table 5 Convergent validity between all items
Item indicators’ reliability (CR)

| Type of measure | Item | Cronbach alpha | AVE |
|---|---|---|---|
| Subjective knowledge | TAIA1 | 0.78 | 0.8 |
| | TAIA2 | 0.89 | 0.6 |
| | TAIA3 | 0.78 | 0.9 |
| | TAIA4 | 0.87 | 0.8 |
| Technological efficiency | BM1 | 0.89 | 0.9 |
| | BM2 | 0.95 | 0.7 |
| | BM3 | 0.86 | 0.9 |
| | BM4 | 0.78 | 0.9 |
| Technological intent | BR1 | 0.86 | 0.8 |
| | BR2 | 0.95 | 0.7 |
| | BR3 | 0.78 | 0.9 |
| | BR4 | 0.86 | 0.5 |
| The technology tools of commercial intervention | BM1 | 0.89 | 0.6 |
| | BM2 | 0.78 | 0.8 |
| | BM3 | 0.96 | 0.9 |
| | BM4 | 0.85 | 0.7 |
Participate in government programs aimed at flattening the income distribution. This study investigates the potential of RACAP to combat the issue of concealed information by gauging if and how it could promote public engagement. Because they are the ones who have the information that other citizens need to drive knowledge sharing, individuals are largely responsible for the level of verification and exchange of information among their own connections. This means that individuals will have to actively listen and investigate. Discriminant validity is shown in Table 6. According to the Fornell–Larcker criterion, a latent variable has discriminant validity if it shares more variance with its own indicators than with the other latent variables. This condition holds if the AVE of a construct is larger than the squared correlations between it and the other latent variables in the model.

Table 6 Discriminant validity

| | Cronbach’s alpha | rho_A | Composite reliability | Average variance extracted (AVE) |
|---|---|---|---|---|
| Subjective knowledge | 0.841 | 0.857 | 0.904 | 0.759 |
| Technological efficiency | 0.911 | 1.235 | 0.919 | 0.744 |
| Technological intent | 0.812 | 0.881 | 0.889 | 0.732 |
| The technology tools of commercial intervention | 0.841 | 0.877 | 0.891 | 0.672 |
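The Fornell–Larcker check can be sketched with the values reported in Tables 4 and 6. This is a minimal illustration rather than the SmartPLS procedure: it confirms that the diagonal of Table 4 matches the square root of each AVE in Table 6, and that each AVE exceeds every squared correlation involving that construct. The variable and function names are illustrative.

```python
import math

constructs = [
    "Subjective knowledge",
    "Technological efficiency",
    "Technological intent",
    "The technology tools of commercial intervention",
]
ave = [0.759, 0.744, 0.732, 0.672]            # AVE column of Table 6
diagonal = [0.871, 0.862, 0.855, 0.820]       # diagonal of Table 4
correlations = {                              # lower triangle of Table 4
    (1, 0): -0.035,
    (2, 0): 0.039, (2, 1): 0.536,
    (3, 0): 0.453, (3, 1): 0.112, (3, 2): 0.163,
}


def fornell_larcker_holds(ave_values, corr):
    """True if every squared correlation is below the AVE of both constructs."""
    return all(
        r * r < ave_values[i] and r * r < ave_values[j]
        for (i, j), r in corr.items()
    )


for name, value, diag in zip(constructs, ave, diagonal):
    print(f"{name}: sqrt(AVE) = {math.sqrt(value):.3f}, Table 4 diagonal = {diag}")
print("criterion satisfied:", fornell_larcker_holds(ave, correlations))
```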
6 Results and Discussions
Simply put, the spread is the difference between interest income and interest expenses. The ratio of loans to deposits is known as the spread. A company’s ROE can be calculated by dividing its net income by its total equity. GovLend tracks how much of a bank’s assets are invested in government-backed securities. A good indicator of scale is the logarithm of total assets as of the end of the year. Provisions for credit risk as a share of total assets at year’s end are a measure of uncertainty. Return on Assets (ROA) is a simple metric for gauging a company’s profitability in relation to its total assets [32]. The extent to which a technology is seen as integrated into the larger structure of an organization is a major factor in how well it is received. Faster adoption of a new technology can be attributed to better internal communication about it. To begin, the organization can facilitate the adoption of technologies by making it easier for employees to have faith in the availability of implementation aids and to perceive the support of peers or superiors in the organization’s hierarchy. However, some workers may feel that their mobility is being curtailed by the introduction of new technologies. When forced to use a piece of technology, people tend to lose faith in their own agency. Two items were discarded after classification because their significance level was less than 0.7. The idea that the mind and the decision-making process are not inherently mysterious but can be explained is at the heart of decision theory. Verification of the model’s quality findings is shown in Table 7. The multi-group comparison (hypothesis) is shown in Table 8. Our study’s hypotheses state that there is a correlation between subjective knowledge and the technology tools of commercial intervention. Both technological effectiveness (0.000) and technological intent (0.000) influence the commercial intervention technology tools (0.257).
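Table 8 reports each t-value as |O/STDEV| together with a p-value. As a rough illustration of how the two quantities relate, the sketch below converts a t-value into a two-sided p-value under a standard normal reference distribution. The normal approximation is an assumption made here, since the paper does not state which reference distribution was used, and the numeric inputs are hypothetical.

```python
import math


def t_statistic(estimate, std_dev):
    """Absolute ratio of a path estimate (O) to its standard deviation (STDEV)."""
    return abs(estimate / std_dev)


def two_sided_p_value(t_value):
    """Two-sided p-value of a test statistic under a standard normal reference."""
    return math.erfc(abs(t_value) / math.sqrt(2.0))


# Hypothetical path estimate and standard deviation (not taken from Table 8).
t_value = t_statistic(0.30, 0.10)
print(f"t = {t_value:.3f}, p = {two_sided_p_value(t_value):.4f}")
```

Under this approximation, a path is significant at the 5% level when its t-value exceeds roughly 1.96.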
Numerous theories have sought to explain how feelings such as self-efficacy affect our actions in virtual communities. Regardless of one’s actual ability level, individuals

Table 7 Verification of the model’s quality findings

| Index name | Values for the independent model |
|---|---|
| Chi-square | 4785.78 |
| Degrees of freedom (p) | 512 (0.000) |
| Chi-square/df | 7.742 |
| RMR | 0.952 |
| GFI | 0.914 |
| AGFI | 0.924 |
| RMSEA (p) | 0.045 (0.000) |
| NFI | 0.785 |
| CFI | 0.69 |
| CAIC (saturated model) | 5869.142 |
Table 8 Multi-group comparison (hypothesis)

| Path | Standard deviation (STDEV) | t-value (\|O/STDEV\|) | p-value |
|---|---|---|---|
| Subjective knowledge → The technology tools of commercial intervention | 0.124 | 3.593 | 0.000 |
| Technological intent → The technology tools of commercial intervention | 0.128 | 1.136 | 0.000 |
| The technology tools of commercial intervention → Technological efficiency | 0.161 | 0.694 | 0.000 |
have a greater capacity to persist through a difficult endeavor if they feel they can do it. This is indeed the case for the Internet: researchers discovered that those who believed in their own social media prowess were more likely to actually use these platforms [4]. Personality is another factor that can explain why people have different tech habits. A positive outlook on the use of digital technologies is strongly linked to personality traits such as openness, that is, the readiness to change one’s mind and approach in light of new information. Those who spend a lot of time on social media sites tend to be the kind of people who are always up for gaining new knowledge and expanding their horizons [32]. Finally, a study of attitudes toward online advice found that those who were more willing to try new things had a more positive view of the medium overall. Unfortunately, not every salesperson reacts sensibly to the arrival of cutting-edge technology. Individuals’ cognitive and emotional activities, such as biases and emotions, may have an impact on their business decisions and lead them astray from the most profitable and advantageous course of action. The banking sector in Morocco is sizable, but some potential investors have been put off by a lack of familiarity with the market or a dearth of available tools [5]. Market expansion is predicted as conservatives become more receptive to novel ideas (such as e-commerce platforms, marketing websites, and online promotional offers by SMS or MMS). The risk-taking propensity of the investor, however, dampens this association. To succeed in these situations, salespeople may decide to forgo using their technical knowledge in favor of a commercial tools-based sales approach. Despite the banking industry’s presumed lack of expansion, salespeople may be able to acquire consumers via the use of technology.
7 Conclusion
The risk-taking propensity of the investor, however, dampens this association. To succeed in these situations, salespeople may decide to forgo using their technical knowledge in favor of a commercial tools-based sales approach. Despite the banking industry’s presumed lack of expansion, salespeople may be able to acquire consumers via the use of technology. Previous research on new technologies and business development on the Internet suggests that, with the exception of remote relationships with clients, the age variable may interfere with the desire to use ICT for guiding purposes. The more eminent these specialists are, the more vocally they support this form of partition. Most of the young professionals in our sample are psychologists, who are particularly sensitive to confidentiality concerns while meeting with clients for counseling. A few of the more seasoned professionals in the sector take a more educational approach to the job of mentor. This discrepancy in opinion on the value of “remote contacts” in counseling settings probably reflects diverse assumptions about the difficulties inherent in building a constructive collaboration through technology channels. The most knowledgeable specialists and end-users of these technologies seem to be working toward the same aim of using ICT to guide commercial and technical performance. Because of this, the first stage in promoting the general use of ICT is to increase its visibility and educate those who are unfamiliar with it. Our research shows, however, that a strong sense of competence is not always linked to the intent to use ICT in guiding. Professionalism and ongoing training for (commercial) recipients may accomplish both goals of increased familiarity and enhanced competence. Currently available commercial guidance training programs severely lack coverage of technology literacy, counseling skills, and ethical issues. Therefore, digital tools should be seen as a cross-cutting problem rather than just another training subject, and training in counseling skills should take place in both traditional face-to-face modalities and digital counseling venues.
References
1. Davis FD, Bagozzi RP (1992) Extrinsic and intrinsic motivation to use computers in the workplace. J Appl Soc Psychol 22(14):1111–1132
2. Ajzen I (1991) The theory of planned behaviour. Organ Behav Hum Decis Process 50:179–211
3. Harst L, Lantzsch H, Scheibe M (2019) Theories predicting end-user acceptance of telemedicine use: systematic review. J Med Internet Res 21(5), Article e13117
4. Lamagna M, Groppi D, Nezhad VV, Piras G (2021) A comprehensive review on Digital Twins for smart energy management system. Int J Energy Prod Manage 6(4):323–334
5. Cheung R, Vogel D (2013) Predicting user acceptance of collaborative technologies: an extension of the technology acceptance model for e-learning. Comput Educ 63:160–175
6. Ahmad T, Zhang D, Huang C, Zhang H, Dai N, Song Y, Chen H (2021) Artificial intelligence in sustainable energy industry: Status Quo, challenges and opportunities. J Clean Prod 289:125834
7. Chaudhry B, Wang J, Wu S, Maglione M, Mojica W, Roth E (2006) Systematic review: impact of health information technology on quality, efficiency, and costs of medical care. Ann Intern Med 144(10):742–752
8. Chen Y, Zhou Y (2020) Machine learning based decision making for time varying systems: parameter estimation and performance optimization. Knowl-Based Syst 190:105479
9. Choi YH, Lee J, Yang J (2022) Development of a service parts recommendation system using clustering and classification of machine learning. Expert Syst Appl 188:116084
10. Hwang J et al (2019) Investigating motivated consumer innovativeness in the context of drone food delivery services. J Hosp Tour Manag
11. Boza P, Evgeniou T (2021) Artificial intelligence to support the integration of variable renewable energy sources to the power system. Appl Energy 290:116754
12. Lee BC, Yoon JO, Lee I (2009) Learners’ acceptance of e-learning in South Korea: theories and results. Comput Educ 53(4):1320–1329
13. Li W, Gong G, Fan H, Peng P, Chun L (2020) Meta-learning strategy based on user preferences and a machine recommendation system for real-time cooling load and COP forecasting. Appl Energy 270:115144
14. Mai F, Tian S, Lee C, Ma L (2019) Deep learning models for bankruptcy prediction using textual disclosures. Eur J Oper Res 274(2):743–758
15. BMC Fam Pract, 17 (2016) Bandura Social foundations of thought and action: a social cognitive theory. Prentice-Hall, Englewood Cliffs, NJ
16. Hoque M, Bao Y (2016) Cultural influence on adoption and use of e-health: evidence in Bangladesh
17. Milana C, Ashta A (2021) Artificial intelligence techniques in finance and financial markets: a survey of the literature. Strateg Change 30(3):189–209
18. Barboza F, Kimura H, Altman E (2017) Machine learning models and bankruptcy prediction. Expert Syst Appl 83:405–417
19. Pallathadka H, Ramirez-Asis EH, Loli-Poma TP, Kaliyaperumal K, Ventayen RJM, Naved M (2021) Applications of artificial intelligence in business management, e-commerce and finance. Mater Today: Proc
20. Shadangi P, Dash M (2019) A conceptual model for telemedicine adoption: an examination of technology acceptance model. Int J Recent Technol Eng 8:1286–1288. https://doi.org/10.35940/ijrte.B1916.078219
21. Heinsch M, Wyllie J, Carlson J, Wells H, Tickner C, Kay-lambkin F (2021) Theories informing eHealth implementation: systematic review and typology classification. J Med Internet Res 23(5), Article e18500
22. Bundy A, Chuenpagdee R, Boldt JL, de Fatima Borges M, Camara ML, Coll M, …, Shin YJ (2017) Strong fisheries management and governance positively impact ecosystem status. Fish and Fisheries 18(3):412–439
23. Dishaw MT et al (2009) Extending the technology acceptance model with task-technology fit constructs Inf. Manag. In: Puccinelli NM et al. (eds) Customer experience management in retailing: understanding the buying process J. Retail
24. van der Vaart V, Atema AW (2019) Evers guided online self-management interventions in primary care: a survey on use, facilitators, and barriers
25. Thompson RL, Higgins CA, Howell JM (1994) Influence of experience on personal computer utilization: testing a conceptual model. J Manag Inf Syst 11(1):167–187
26. Chen TH, Chang RC (2021) Using machine learning to evaluate the influence of FinTech patents: the case of Taiwan’s financial industry. J Comput Appl Math 390:113215
27. Dumitrescu E, Hue S, Hurlin C, Tokpavi S (2022) Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects. Eur J Oper Res 297(3):1178–1192
28. Huang CL, Tsai CY (2009) A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting. Expert Syst Appl 36(2):1529–1539
29. Rho MJ, Choi I, Lee J (2014) Predictive factors of telemedicine service acceptance and behavioral intention of physicians. Int J Med Inf 83. https://doi.org/10.1016/j.ijmedinf.2014.05.005
30. Sang G, Valcke M, Van Braak J, Tondeur J (2010) Student teachers’ thinking processes and ICT integration: predictors of prospective teaching behaviors with educational technology. Comput Educ 54(1):103–112
31. Sheedy E, Zhang L, Tam KCH (2019) Incentives and culture in risk compliance. J Bank Finance 107:105611
32. Fishbein M, Ajzen I (1975) Belief, attitude, intentions, and behavior: an introduction to theory and research. Addison-Wesley, Reading
Deep Hybrid Model with Trained Weights for Multimodal Sarcasm Detection
Dnyaneshwar Bavkar, Ramgopal Kashyap, and Vaishali Khairnar

Abstract Sarcasm detection is one of the most challenging tasks in natural language processing. Although sentiment semantics are necessary to improve sarcasm detection performance, existing DL-based sarcasm detection models do not fully incorporate them. This research proposes the Hybrid RNN and Optimized LSTM for Multimodal Sarcasm Detection (HROMSD) model. The model operates in four stages: preprocessing, feature extraction, feature level fusion, and classification. In the preprocessing stage, the multimodal input, which comprises text, video, and audio, is preprocessed: the text by tokenization and stemming, the video by face detection, and the audio by a filtering technique. In the feature extraction stage, n-grams, TF-IDF, improved Bag of Visual Words, and emojis are extracted as text features; CLM and improved SLBT based features are extracted from the video; and chroma, MFCC, jitter, and special features are extracted from the audio. The resulting feature sets are passed to the feature level fusion stage, which makes use of an improved multilevel CCA fusion technique. Classification is carried out using the Hybrid RNN and Optimized LSTM, where the Improved BES (IBES) method is used to increase the detection system’s performance. Compared with earlier research, the proposed work is more accurate.

Keywords Sarcasm detection · Ensemble model · Deep learning · Optimization · Tokenization

D. Bavkar (B) · R. Kashyap
Department of Computer Science and Engineering, Amity University, Raipur, Chhattisgarh, India
e-mail: [email protected]
R. Kashyap
e-mail: [email protected]
V. Khairnar
Department of Information Technology, Terna Engineering College, Nerul, Navi Mumbai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_13
Nomenclature
NLP   Natural language processing
DL    Deep learning
ML    Machine learning
RF    Random forest
CCA   Canonical correlation analysis
IAC   Internet argument corpus
LSTM  Long short-term memory
CNN   Convolutional neural network
RNN   Recurrent neural network
CUU   Context understanding unit
AOA   Arithmetic optimization algorithm
BES   Bald eagle search
1 Introduction
Sarcasm is a type of verbal behavior that uses positive or negative terms in sentiment analysis. The contrast between a positive mood and a negative circumstance is typically what makes a sentence ironic [1, 2]. There are two types of negative situations: situations with a direct negative feeling and regular situations that become negative in specific sentences. Because of the polarity between the actual and literal sentiments, sarcasm research can help to advance sentiment analysis in NLP. Sarcasm detection is the task of determining whether a sentence in a particular context contains a specific amount of sarcasm. Sarcasm detection has lately gained the attention of NLP professionals due to its potential use in a variety of NLP applications such as human–machine communication, sentiment analysis, and others. Sarcasm detection methods now in use can be grouped into a number of categories, such as text-based sarcasm identification, multi-modal sarcasm recognition, and others [3]. For the most part, prior sarcasm detection algorithms depended on manually created sentiment features. Many researchers extracted features with sentiment information to detect sarcasm with traditional machine learning models [1, 4]. Because extracting features requires a lot of manual labor, recent studies attempted to employ DL to address the issue. However, the numerous relevant studies conducted over the past few years using data gathered from Twitter share a fundamental oversight: the absence of empirical investigations into the best methods for irony and sarcasm recognition [5]. In addition, studies comparing different data preprocessing and dataset manipulation approaches to improve the outcomes of irony and sarcasm recognition have not been conducted, raising concerns regarding the methods of data collection, implementation, and limits.
Most similar studies make use of single-turn texts from topic-specific sources such as Twitter, Amazon, and IMDB. To illustrate the various excerpt scenarios, sentimental and handcrafted elements are used. These features are suggestive of emotion polarity. After being formed, the data is fed into traditional ML classifiers such as SVM, RF, or multilayer perceptrons, or into DL methods with complex neural architectures [6–8]. Consequently, analytical models that can determine the underlying sentimental content and polarity of passages are produced. The main contribution of this research is as follows:
• It implements the Hybrid RNN and Optimized LSTM for Multimodal Sarcasm Detection (HROMSD) model, in which the Improved BES (IBES) algorithm is used to optimize the weights of the LSTM.
This paper is organized as follows. Section 2 presents the literature survey, Sect. 3 explains the preprocessing, feature extraction, and feature level fusion, Sect. 4 describes the classification, Sect. 5 shows the results, Sect. 6 presents the conclusion, and the last section lists the references.
2 Literature Survey
In 2020, Banerjee et al. [9] proposed a synthetic minority oversampling technique to address the problem of imbalanced classes, which can negatively impact a classifier’s ability to detect sarcasm in social media.
In 2020, Potamias et al. [10] developed an advanced deep learning approach to address the issue of FL form identification. They offered a neural network methodology that builds on a recently reported pretrained transformer-based network architecture, further improved by adding a recurrent CNN, and that substantially advances their earlier work.
In 2020, Ren et al. [11] presented a multi-level memory network based on sentiment semantics to capture sarcasm expression aspects. In this method, the sentiment semantics were represented by a first-level memory network, and the distinction between the sentiment semantics and the context of each sentence was represented by a second-level memory network.
In 2021, Hiremath and Patil [12] used an approach that identifies the fundamental cognitive aspects of human speech by recording it in three different ways: text, voice, and temporal facial features. Because the thoughts and emotions employed to make sarcasm are unstructured, they may have an impact on facial and glottal organ expressions as well as on the sarcastic speech itself. Acquiring the data was as challenging as processing it to obtain features. The visual data are thought to be highly intriguing and can build a solid foundation for future NLP research.
In 2021, Ren et al. [13] proposed a contextual dual-view attention network (CDVaN) for sarcasm detection according to the formation mechanism of sarcasm. A CUU was created with the goal of effectively extracting the difference between the good scenario and the bad sentiment from the standpoint of the sarcasm generation mechanism.
Any issue, including serious ones, is open to sarcastic commentary. Individuals make caustic remarks due to the moronic nature of a particular situation. Yet, the class imbalance of datasets from social networking sites makes it challenging to obtain the requisite accuracy. In this paper, synthetic minority oversampling approaches are used, together with other classifiers, to handle this imbalance, and a deep hybrid classification model is used to detect sarcasm.
3 Preprocessing, Feature Extraction and Feature Level Fusion Process
3.1 Preprocessing
Preprocessing is the initial stage of this technique. Input that combines text, video, and audio is referred to as multimodal data. Tokenization and stemming are used to preprocess the text, while a face identification model and a filtering technique are used to preprocess the video and audio, respectively.
(a) Text preprocessing: Tokenization and stemming techniques are used to preprocess the text.
Tokenization: This process of converting text into tokens takes place prior to vectorization. It makes it simple to filter out unwanted tokens.
Stemming: Stemming is a normalization technique that reduces the number of calculations. It removes the suffix from a word and returns the word to its root form. The preprocessed outcome of the text is shown in Table 1.
(b) Video preprocessing: Here, the video is preprocessed via a face detection (Viola-Jones) model.
Viola-Jones [14]: Face identification is challenging since images can look very different depending on the lighting, position, occlusion, facial expression, and image orientation. This work uses the Viola-Jones face detection algorithm. Viola-Jones was the most significant object detection technique of the 2000s and is acknowledged as the first object detection system that made meaningful real-time object detection possible. Image windows are categorized as faces when they pass every stage of the cascade. Each stage analyzes a set of features; if the outcome falls below the cutoff value, the window is not identified as a face and the stage fails. The three key components of this method are the attention cascade structure, the integral image, and classifier learning with AdaBoost for successful face identification in real-time applications.
(c) Audio preprocessing: Additionally, a Butterworth filtering approach is used to preprocess the audio portion of the input data.
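Of the three Viola-Jones components named above, the integral image is the easiest to illustrate. The sketch below is a plain-Python illustration, not the authors' implementation; the function names are illustrative. It builds a summed-area table and uses it to obtain the sum of any rectangle with four lookups, which is what makes rectangle-based features cheap to evaluate.

```python
def integral_image(image):
    """Summed-area table with an extra leading row and column of zeros."""
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            # Sum of all pixels above and to the left of (y, x), inclusive.
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rectangle_sum(table, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


pixels = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
table = integral_image(pixels)
# A two-rectangle feature compares the sums of adjacent regions.
left_column = rectangle_sum(table, 0, 0, 3, 1)
right_column = rectangle_sum(table, 0, 2, 3, 1)
print(right_column - left_column)
```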
Table 1 Preprocessed outcome of the text

| S. No. | Dialog | Preprocessed text |
|---|---|---|
| 1 | I never would have identified the fingerprints of string theory in the aftermath of the Big Bang. My apologies. What’s your plan? | I never would have identifi the fingerprint of string theori in the aftermath of the big bang My apolog what your plan |
| 2 | Here we go. Pad thai, no peanuts. But does it have peanut oil? I’m not sure. Everyone keep an eye on Howard in case he starts to swell up | Here we go pad thai no peanut but doe it have peanut oil I not sure everyone keep an eye on howard in case he start to swell up |
| 3 | Great Caesar’s ghost, look at this place. So Penny’s a little messy. A little messy? The Mandelbrot set of complex numbers is a little messy. This is chaos. Excuse me. Explain to me an organizational system where a tray of flatware on a couch is valid | Great caesar ghost look at thi place So penni a littl messi A littl messi the mandelbrot set of complex number is a littl messi thi is chao excus me explain to me an organiz system where a tray of flatwar on a couch is valid |
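The tokenization and stemming steps behind Table 1 can be sketched as follows. This is a deliberately crude stand-in: the outputs in Table 1 look like those of a Porter-style stemmer, whereas the suffix list and length rule below are simplifying assumptions chosen only to show the shape of the pipeline.

```python
import re

# Illustrative suffix list; far simpler than a Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize the text, then stem every token."""
    return [crude_stem(token) for token in tokenize(text)]


print(preprocess("Here we go. Pad thai, no peanuts. But does it have peanut oil?"))
```

A production pipeline would normally rely on an established stemmer instead, since rules this simple mis-handle many words.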
Butterworth filtering: A Butterworth filter is a signal processing filter whose frequency response in the pass band is as flat as possible. It is also called a “maximally flat magnitude filter”. For continuous-time Butterworth filters, the poles associated with the squared magnitude of the frequency response lie on a circle in the s-plane that is centered at the origin and whose radius equals the cut-off frequency, and they are equally spaced in angle. Once the cut-off frequency and the filter order are determined, the poles that characterize how the system functions can be easily derived. The differential equation that describes the filter can be easily created once the poles have been determined. The squared magnitude function of an lth-order Butterworth low-pass filter is calculated in Eq. (1).

$$|B(g\omega)|^{2} = B(g\omega) \times B^{*}(g\omega) = \frac{1}{1 + \left(\frac{g\omega}{g\omega_c}\right)^{2l}} \tag{1}$$

Here, the first 2l − 1 derivatives of $|B(g\omega)|^{2}$ at ω = 0 are equal to 0, so the Butterworth response is maximally flat at ω = 0. Moreover, the derivative of the magnitude response is always negative for positive ω, so the magnitude response decreases monotonically with ω. Further, the magnitude response for ω >> ω_c is determined in Eq. (2).

$$|B(g\omega)|^{2} = \frac{1}{\left(\frac{g\omega}{g\omega_c}\right)^{2l}} \tag{2}$$
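The behaviour described by Eqs. (1) and (2) can be checked numerically with a minimal sketch. It evaluates the squared magnitude as a function of the real ratio ω/ω_c, which assumes that the common factor g cancels in the fraction; the function and parameter names are illustrative.

```python
def butterworth_squared_magnitude(omega, omega_c, order):
    """Squared magnitude of an lth-order Butterworth low-pass response."""
    return 1.0 / (1.0 + (omega / omega_c) ** (2 * order))


cutoff = 1.0
for order in (1, 2, 4):
    samples = [butterworth_squared_magnitude(w, cutoff, order)
               for w in (0.0, 0.5, 1.0, 2.0)]
    print(order, [round(value, 4) for value in samples])
```

Whatever the order, the response equals 1 at ω = 0 and 0.5 at the cut-off frequency, and a higher order gives a flatter pass band and a steeper roll-off.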
3.2 Feature Extraction
The preprocessed text, video, and audio are subjected to the feature extraction phase. The features are:
• Text based features
• Video based features
• Audio based features
(a) Text based features: Emojis, TF-IDF, N-grams, improved BoW features, and other features are retrieved from the text.
(i) TF-IDF: Among the three well-known representation strategies, TF-IDF is a prominent format for text representation and has a longer history [15]. Equation (3) determines the TF-IDF, where $TF_{ab}$ indicates the number of times word a appears in document b, $DF_a$ indicates the number of documents in which word a appears at least once, and D indicates the total number of documents. If word a is significant for document b, it should have a large $TF_{ab}$ and a small $DF_a$.

$$TF\text{-}IDF_{ab} = TF_{ab} \times \log\left(\frac{D}{DF_a - 1}\right) \tag{3}$$

The symbol TF-IDF indicates the extracted features.
(ii) N-grams: Any collection of n tokens or words is referred to as an “n-gram.” Furthermore, the n-gram model is defined as “a method of including sequences of words or characters that allows us to maintain richer pattern discovery in text, i.e. it attempts to capture patterns of sequences (words or characters following one another) while being responsive to appropriate relations (words or characters following one another)” [16]. Ngram denotes the extracted features.
(iii) BoW: It is the simplest method for transforming text into features. The BoW divides the words of the review text into different word counts, that is, it counts the number of times a phrase appears in a corpus of text. It considers only the frequency of occurrence of the words in the text, not their order. The semantics of visual words are taken into account in the improved evaluation but not in the current BoW evaluation.
Improved BoW: The histograms of the visual words are employed as feature vectors in the improved BoW. This is determined in Eq. (4), where E and F denote the images and m indicates the visual word number. $K(E_m, F_m)$ is calculated in Eq. (5), where L indicates the number of levels, h indicates the current level, $\frac{1}{2^{L-m}}$ indicates the weight of each level, S indicates the scaling factor, which is calculated using the Tent map with values in (0, 1), and $D_m^h$ denotes the histogram intersection function. It is determined in Eq. (6), where $H_{E_m}^{h}(k)$ indicates the count of the mth visual word in the kth sub-region of image E at level h; we used 5 levels. IBoW indicates the features of the improved BoW.
Deep Hybrid Model with Trained Weights for Multimodal Sarcasm …
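$$K(E, F) = \sum_{m=1}^{L} K(E_m, F_m) \tag{4}$$

$$K(E_m, F_m) = D_m^{L} + \sum_{h=0}^{L-1} \frac{1}{2^{L-m}} \left( D_m^{h} - D_m^{h+1} \right) * S \tag{5}$$

$$D\left(H_{E_m}^{h}, H_{F_m}^{h}\right) = \sum_{k=1}^{4^{h}} \min\left(H_{E_m}^{h}(k), H_{F_m}^{h}(k)\right) \tag{6}$$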
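The histogram intersection of Eq. (6) can be illustrated with a minimal sketch. The function name `histogram_intersection` and the sample counts are hypothetical and are not taken from the paper; the sketch only assumes that each histogram is a flat list of per-sub-region counts of one visual word at one level.

```python
def histogram_intersection(hist_e, hist_f):
    """Eq. (6): sum over sub-regions k of min(H_E(k), H_F(k))."""
    if len(hist_e) != len(hist_f):
        raise ValueError("histograms must cover the same sub-regions")
    return sum(min(e, f) for e, f in zip(hist_e, hist_f))


# Hypothetical counts of one visual word in the 4**1 = 4 sub-regions at level h = 1.
h_e = [3, 0, 2, 5]
h_f = [1, 4, 2, 6]
d = histogram_intersection(h_e, h_f)  # per-cell minima: 1 + 0 + 2 + 5
```

The intersection is largest when the two images place the visual word in the same sub-regions with similar counts, which is why it serves as a similarity score in Eq. (5).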
(iv) Text with emoji features: For text that contains emojis, the emoji vocabulary is taken into account and a Unicode-based sentiment score is retrieved alongside the text features [16–20]. Emojis are ranked according to their computed neutrality and sentiment scores. The sentiment score ranges from −1 to +1. Emojis with negative and positive connotations are located on the left and right sides of the sentiment map, respectively. The majority of negative emojis are sad faces. Positive emojis such as hearts, wrapped gifts, celebration symbols, and trophies are among the most popular, along with smiling faces. The most interesting emojis are the neutral ones. The neutrality scale for emojis ranges from 0 to 1, and all of them have a sentiment score of 0.

TF = TF-IDF + Ngram + IBoW + EMO denotes the extracted text features.
4 Hybrid RNN and Optimized LSTM for Multimodal Sarcasm Detection (HROMSD)

Following feature-level fusion, classification is carried out by a hybrid model that combines an RNN and an optimized LSTM. The classification outcomes of the RNN and the LSTM are then averaged. An improved BES algorithm tunes the weights of the LSTM, which enhances the performance of the detection system. The HROMSD architecture is shown in Fig. 1.
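The averaging of the two classifier outcomes can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not say whether scores or labels are averaged, so the sketch assumes each classifier emits a per-sample score in [0, 1] and that a threshold of 0.5 maps the averaged score to the labels of Fig. 1 ([1] sarcastic, [0] not sarcastic). The function name and the sample scores are hypothetical.

```python
def fuse_by_average(rnn_scores, lstm_scores, threshold=0.5):
    """Average per-sample scores of two classifiers, then threshold them."""
    if len(rnn_scores) != len(lstm_scores):
        raise ValueError("both classifiers must score the same samples")
    averaged = [(r + l) / 2.0 for r, l in zip(rnn_scores, lstm_scores)]
    # Assumed decision rule: 1 = sarcastic, 0 = not sarcastic.
    labels = [1 if score >= threshold else 0 for score in averaged]
    return averaged, labels


# Hypothetical scores for three utterances.
rnn = [1.0, 0.25, 0.5]
lstm = [0.5, 0.25, 0.25]
avg, labels = fuse_by_average(rnn, lstm)
```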
4.1 RNN

RNNs are artificial neural networks in which the connections between units form a directed path along a sequence [21]. To address the issues of the plain RNN, an LSTM is used in this work. At each step of the LSTM, a memory cell (c) is incorporated to retain information. The contents of the memory cell are controlled by a forget gate, an input gate, and an output gate. Together, these gates allow signals to flow uninterrupted from the input gate to the output gate, which resolves the exploding-gradient problem of the RNN.
D. Bavkar et al.
Fig. 1 Proposed architecture of multimodal sarcasm detection. Pipeline stages: pre-processing of text (tokenization, stemming), video (face detection), and audio (filtering); feature extraction from text (TF-IDF, improved BoW, n-grams, emojis), video (improved SLBT, CLM), and audio (MFCC, chroma, special features, jitter); feature-level fusion (CCA analysis); classification by RNN and LSTM, with the LSTM weights tuned by improved BES; classified output: [1] sarcastic, [0] not sarcastic
Equations (7)–(11) describe how the gates operate and learn, where w denotes the weights and b denotes the bias values of the recurrent learning and labeling. Every time new labels are generated, the LSTM updates the memory cell ($c_t$).

$$i_t = \sigma\left(w_{ix} O x_t + w_{ih} h_{t-1} + b_i\right) \tag{7}$$

$$g_t = \phi\left(w_{gx} O x_t + w_{gh} h_{t-1}\right) \tag{8}$$

$$f_t = \sigma\left(w_{fx} O x_t + w_{fh} h_{t-1} + b_f\right) \tag{9}$$

$$o_t = \sigma\left(w_{ox} O x_t + w_{oh} h_{t-1} + b_o\right) \tag{10}$$

$$h_t = o_t \cdot \tanh(c_t) \tag{11}$$
4.2 Optimized LSTM

LSTM is a member of the RNN family that operates on sequential data. For natural language processing tasks, LSTM performs better than CNN [22]. $x_t$ indicates the input vector and h indicates the output of the hidden context. The activation of the LSTM at time t is denoted $h_t$. In this task, the CRF layer converts each output h into a class label. The input sequence in the LSTM is controlled using the memory gate and the forget gate. Since the input data is a word sequence, a single LSTM node is used and all of its gates are applied repeatedly. To form the loop, the LSTM merges the input at time t with the output at time t − 1 and feeds the result back into the input gate. Eqs. (12)–(17) express the mathematical formulation of the LSTM, where δ denotes the sigmoid activation function.

$$i_t = \delta\left(w_1 x_t + w_2 h_{t-1} + b_1\right) \tag{12}$$

$$i'_t = \tanh\left(w_3 x_t + w_4 h_{t-1} + b_2\right) \tag{13}$$

$$f_t = \delta\left(w_5 x_t + w_6 h_{t-1} + b_3\right) \tag{14}$$

$$o_t = \delta\left(w_7 x_t + w_8 h_{t-1} + b_4\right) \tag{15}$$

$$c_t = f_t \cdot c_{t-1} + i_t \cdot i'_t \tag{16}$$

$$h_t = o_t \cdot \tanh(c_t) \tag{17}$$
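A single LSTM step following Eqs. (12)–(17) can be sketched as below. For readability the sketch assumes scalar inputs, states, and weights, whereas a practical model would use vectors and weight matrices. The function names `sigmoid` and `lstm_step` and the chosen values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written as delta in Eqs. (12), (14), (15)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, w, b):
    """One step of Eqs. (12)-(17) for scalar input and state.

    w holds (w1, ..., w8) and b holds (b1, ..., b4).
    """
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    b1, b2, b3, b4 = b
    i_t = sigmoid(w1 * x_t + w2 * h_prev + b1)       # Eq. (12): input gate
    i_cand = math.tanh(w3 * x_t + w4 * h_prev + b2)  # Eq. (13): candidate value
    f_t = sigmoid(w5 * x_t + w6 * h_prev + b3)       # Eq. (14): forget gate
    o_t = sigmoid(w7 * x_t + w8 * h_prev + b4)       # Eq. (15): output gate
    c_t = f_t * c_prev + i_t * i_cand                # Eq. (16): memory cell
    h_t = o_t * math.tanh(c_t)                       # Eq. (17): hidden output
    return h_t, c_t


# With all weights and biases at zero, every gate equals 0.5 and the candidate is 0,
# so the cell simply halves its previous content.
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, w=[0.0] * 8, b=[0.0] * 4)
```

In the paper these weights are the quantities tuned by the improved BES algorithm of Sect. 4.3 rather than fixed constants.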
4.3 Improved Bald Eagle Search Optimization for Training LSTM

The BES algorithm mimics the hunting behavior of bald eagles and models every step of the hunting process. The algorithm can therefore be split into three stages: selecting the search space, searching within the selected space, and swooping [23]. In the improved bald eagle search optimization, the LSTM weights are tuned to increase performance.

Select Stage

At this stage, bald eagles choose the best location within the chosen search space, depending on the availability of food. Eq. (18) expresses the behavior of this stage, where α is a value from 1.5 to 2 that controls the change in position, r is a value between 0 and 1, $A_{best}$ denotes the best location selected so far, and $A_{mean}$ indicates that all the information from the preceding points in the search stage has been used.

$$A_{i,new} = A_{best} + \alpha * r \left(A_{mean} - A_i\right) \tag{18}$$
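The select-stage update of Eq. (18) can be sketched as follows. Two points are assumptions rather than statements from the paper: $A_{mean}$ is taken to be the coordinate-wise mean of the current population, and one random r is drawn per point. The function name `select_stage` and the sample population are hypothetical.

```python
import random


def select_stage(population, best, alpha=2.0, rng=random):
    """Eq. (18): A_new = A_best + alpha * r * (A_mean - A_i) for every point."""
    n, dim = len(population), len(best)
    # Assumption: A_mean is the coordinate-wise mean of the population.
    mean = [sum(p[d] for p in population) / n for d in range(dim)]
    new_population = []
    for point in population:
        r = rng.random()  # r in [0, 1), one draw per point (assumption)
        new_population.append(
            [best[d] + alpha * r * (mean[d] - point[d]) for d in range(dim)]
        )
    return new_population


population = [[0.0, 0.0], [2.0, 4.0]]
candidates = select_stage(population, best=[1.0, 1.0], alpha=1.5,
                          rng=random.Random(0))
```

Each candidate would then replace its point only if it improves the fitness, as in the pseudo code of the IBES.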
Improved Search Stage

In the search stage, bald eagles seek prey within the selected search space, moving quickly along a spiral by changing direction. Here, Brownian motion is used to enhance the performance. Brownian movement is the random zigzagging motion of a particle, typically observed under a high-power ultra-microscope. The term “Brownian movement” refers to the movement of pollen grains in water, which Robert Brown described. Eq. (19) expresses the best location for the swoop using Brownian motion. Eq. (20) expresses the Brownian motion, where $N(a, b; t_1, t_2)$ is a normally distributed random variable with mean a and variance b. The parameters $t_1$ and $t_2$ make explicit the statistical independence of N on different time intervals; that is, if $(t_1, t_2)$ and $(t_3, t_4)$ are disjoint intervals, then $N(a, b; t_1, t_2)$ and $N(a, b; t_3, t_4)$ are independent. Eq. (21) expresses the calculation of z(i), Eq. (22) the calculation of y(i), Eq. (23) the evaluation of zr(i), Eq. (24) the evaluation of yr(i), Eq. (25) the evaluation of θ(i), where a indicates a value between 5 and 10, and Eq. (26) the evaluation of r(i). Here, R takes a value between 0.5 and 2 and determines the number of search cycles.

$$A_{i,new} = A_i + y(i) * \left(A_i - A_{i+1}\right) + z(i) * \left(A_i - A_{mean}\right) + X(t + dt) \tag{19}$$

$$X(t + dt) = X(t) + N\left(0, (delta)^2\, dt;\ t,\ t + dt\right) \tag{20}$$

$$z(i) = \frac{zr(i)}{\max(|zr|)} \tag{21}$$

$$y(i) = \frac{yr(i)}{\max(|yr|)} \tag{22}$$

$$zr(i) = r(i) * \sin(\theta(i)) \tag{23}$$

$$yr(i) = r(i) * \cos(\theta(i)) \tag{24}$$

$$\theta(i) = a * \pi * rand \tag{25}$$

$$r(i) = \theta(i) + R * rand \tag{26}$$
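The search-stage update of Eqs. (19)–(26) can be sketched as below. Several choices are assumptions, not details confirmed by the paper: $A_{mean}$ is the population mean, $A_{i+1}$ wraps around at the end of the population, X(t) is taken as 0 so that only the Brownian increment of Eq. (20) is added, and the increment is drawn independently per coordinate. All function names and default values are hypothetical.

```python
import math
import random


def spiral_coefficients(n, a=10.0, R=1.5, rng=random):
    """Eqs. (21)-(26): normalised spiral coordinates z(i), y(i) for n points."""
    theta = [a * math.pi * rng.random() for _ in range(n)]  # Eq. (25)
    r = [theta[i] + R * rng.random() for i in range(n)]     # Eq. (26)
    zr = [r[i] * math.sin(theta[i]) for i in range(n)]      # Eq. (23)
    yr = [r[i] * math.cos(theta[i]) for i in range(n)]      # Eq. (24)
    z_max = max(abs(v) for v in zr) or 1.0  # avoid dividing by zero
    y_max = max(abs(v) for v in yr) or 1.0
    z = [v / z_max for v in zr]                             # Eq. (21)
    y = [v / y_max for v in yr]                             # Eq. (22)
    return z, y


def search_stage(population, delta=0.1, dt=1.0, rng=random):
    """Eq. (19), with a Brownian increment N(0, delta^2 dt) as the last term."""
    n, dim = len(population), len(population[0])
    mean = [sum(p[d] for p in population) / n for d in range(dim)]
    z, y = spiral_coefficients(n, rng=rng)
    new_population = []
    for i, point in enumerate(population):
        nxt = population[(i + 1) % n]  # A_{i+1}, wrapping around (assumption)
        new_point = []
        for d in range(dim):
            # Standard deviation is delta * sqrt(dt), i.e. variance delta^2 * dt.
            brownian = rng.gauss(0.0, delta * math.sqrt(dt))
            new_point.append(point[d]
                             + y[i] * (point[d] - nxt[d])
                             + z[i] * (point[d] - mean[d])
                             + brownian)
        new_population.append(new_point)
    return new_population


moved = search_stage([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]], rng=random.Random(2))
```

Setting `delta=0.0` removes the Brownian term and leaves the plain spiral move.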
Swooping Stage

During the swooping stage, bald eagles swoop from the best position in the search space toward their target prey. Eq. (27) expresses the swooping stage mathematically, where c1, c2 ∈ [1, 2] and the rand value is calculated using a chaotic map, as expressed in Eq. (28). z1(i) is calculated in Eq. (29), y1(i) in Eq. (30), zr(i) in Eq. (31), yr(i) in Eq. (32), θ(i) in Eq. (33), where a indicates a value between 5 and 10, and r(i) in Eq. (34).

$$A_{i,new} = rand * A_{best} + z1(i) * \left(A_i - c1 * A_{mean}\right) + y1(i) * \left(A_i - c2 * A_{best}\right) \tag{27}$$

$$rand = \cos\left(0.5 \cos^{-1} C_k\right) \tag{28}$$

$$z1(i) = \frac{zr(i)}{\max(|zr|)} \tag{29}$$

$$y1(i) = \frac{yr(i)}{\max(|yr|)} \tag{30}$$

$$zr(i) = r(i) * \sinh[\theta(i)] \tag{31}$$

$$yr(i) = r(i) * \cosh[\theta(i)] \tag{32}$$

$$\theta(i) = a * \pi * rand \tag{33}$$

$$r(i) = \theta(i) \tag{34}$$
Finally, the solution is obtained using arithmetic crossover, which can be used when encoding real-valued data. In an arithmetic crossover, two chromosomes are randomly chosen and then combined linearly to produce two offspring [24]. This linear combination is computed as in Eqs. (35) and (36), where $P_1$ and $P_2$ denote the parents, $C_1$ and $C_2$ the offspring, and a the weighting factor.

$$C_1 = a \cdot P_1 + (1 - a) \cdot P_2 \tag{35}$$

$$C_2 = a \cdot P_2 + (1 - a) \cdot P_1 \tag{36}$$
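The arithmetic crossover of Eqs. (35) and (36) can be sketched in a few lines. The function name `arithmetic_crossover` and the sample parents are hypothetical; the paper does not state how a is chosen, so it is passed in as a parameter.

```python
def arithmetic_crossover(p1, p2, a):
    """Eqs. (35)-(36): two offspring as linear combinations of two parents."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    c1 = [a * x + (1 - a) * y for x, y in zip(p1, p2)]  # Eq. (35)
    c2 = [a * y + (1 - a) * x for x, y in zip(p1, p2)]  # Eq. (36)
    return c1, c2


# Hypothetical real-valued parents, e.g. two candidate weight vectors.
c1, c2 = arithmetic_crossover([1.0, 2.0], [3.0, 6.0], a=0.25)
```

Because the two offspring use complementary weights, their coordinate-wise sum always equals that of the parents.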
Pseudo code of Improved BES (IBES)

Define a random initial population
Determine the fitness value of each initial point
WHILE (termination criteria are not met)
    Select space
    FOR (each point i in the population)
        Compute A_new using Eq. (18)
        IF f(A_new) < f(A_i)
            A_i = A_new
            IF f(A_new) < f(A_best)
                A_best = A_new
            END IF
        END IF
    END FOR
    Search in space
    FOR (each point i in the population)
        Compute A_new using Eq. (19) with Brownian motion estimation
        IF f(A_new) < f(A_i)
            A_i = A_new
            IF f(A_new) < f(A_best)
                A_best = A_new
            END IF
        END IF
    END FOR
    Swoop stage
    FOR (each point i in the population)
        Compute A_new using Eq. (27) with the chaotic map
        IF f(A_new) < f(A_i)
            A_i = A_new
            IF f(A_new) < f(A_best)
                A_best = A_new
            END IF
        END IF
    END FOR
    Set k := k + 1
    Obtain the solution using arithmetic crossover
END WHILE
Fig. 2 Sample audio and preprocessed audio
5 Final Outcome and Discussions of HROMSD

5.1 Simulation Procedure

The Hybrid RNN and Optimized LSTM for Multimodal Sarcasm Detection (HROMSD) approach was implemented in Python. The dataset was taken from [25]. HROMSD was compared with established approaches, namely COOT, Moth Flame Optimization (MFO), Jaya Algorithm (JA), Cat Swarm Optimization (CSO), Grasshopper Optimization Algorithm (GOA), AOA, and BES. It was evaluated with respect to Accuracy, NPV, FPR, and other measures. A multi-modal objective function is considered. The sample audio and the preprocessed audio are displayed in Fig. 2.
5.2 Dataset Description “We release the MUStARD dataset which is a multimodal video corpus for research in automated sarcasm discovery. The dataset is compiled from popular TV shows including Friends, The Golden Girls, The Big Bang Theory, and Sarcasmaholics Anonymous. MUStARD consists of audiovisual utterances annotated with sarcasm labels. Each utterance is accompanied by its context, which provides additional information on the scenario where the utterance occurs.”
5.3 Validating the Performance of HROMSD Regarding Positive Measures

The Sensitivity, Precision, Accuracy, and Specificity of HROMSD were assessed and compared with those of other approaches, namely COOT, MFO, JA, CSO, GOA, AOA, and BES. The comparative findings are presented in Fig. 3. Positive measures should be high to indicate improved results. At a learning percentage of 90%, HROMSD achieved a specificity of 93.84%, whereas COOT, MFO, JA, CSO, GOA, AOA, and BES yielded specificities of 80.66%, 84.35%, 65.04%, 78.85%, 71.79%, 80.66%, and 71.79%, respectively. This indicates that HROMSD outperforms the other standard approaches to sarcasm detection on this measure.
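For reference, the measures named above follow from the confusion matrix by their standard definitions, sketched below. The function name `classification_metrics` and the counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix measures for a binary classifier."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "npv": tn / (tn + fn),           # negative predictive value
        "fpr": fp / (fp + tn),           # false positive rate
    }


# Hypothetical counts for 100 test utterances.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Sensitivity, specificity, precision, accuracy, and NPV are better when higher, whereas FPR is better when lower.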
Fig. 3 Comparison of HROMSD with other conventional approaches: (a) sensitivity, (b) accuracy, (c) precision, (d) specificity
6 Conclusion

This research work has developed a new multimodal sarcasm detection technique comprising four stages: preprocessing, feature extraction, feature-level fusion, and classification. Text, video, and audio are given as input to the preprocessing stage, and the preprocessed outcome is given as input to the feature extraction phase, where features are extracted using enhanced techniques. The extracted features are then fed to the feature-level fusion process, where the multilevel CCA fusion technique is applied. Finally, the fused features are classified using RNN and LSTM classifiers, with the Improved BES (IBES) approach applied to the LSTM for weight tuning. At a learning percentage of 90%, the proposed work achieved a specificity of 93.84%, higher than that of the compared approaches. In future work, recent developments in representation learning could be explored to further improve our results. We are also interested in investigating multilingual embeddings for effectively handling code-mixed text.
References

1. Pandey R, Kumar A, Singh JP, Tripathi S (2021) Hybrid attention-based Long Short-Term Memory network for sarcasm identification. Appl Soft Comput 106:107348
2. Kumar A, Narapareddy VT, Srikanth VA, Malapati A, Neti LBM (2020) Sarcasm detection using multi-head attention based bidirectional LSTM. IEEE Access 8:6388–6397
3. Zhang Y, Liu Y, Li Q, Tiwari P, Wang B, Li Y, Pandey HM, Zhang P, Song D (2021) CFN: a complex-valued fuzzy network for sarcasm detection in conversations. IEEE Trans Fuzzy Syst 29(12):3696–3710
4. Razali MS, Halin AA, Ye L, Doraisamy S, Norowi NM (2021) Sarcasm detection using deep learning with contextual features. IEEE Access 9:68609–68618
5. Eke CI, Norman AA, Shuib L (2021) Context-based feature technique for sarcasm identification in benchmark datasets using deep learning and BERT model. IEEE Access 9:48501–48518
6. Jain D, Kumar A, Garg G (2020) Sarcasm detection in mash-up language using soft-attention based bi-directional LSTM and feature-rich CNN. Appl Soft Comput 91:106198
7. Chia ZL, Ptaszynski M, Masui F, Leliwa G, Wroczynski M (2021) Machine Learning and feature engineering-based study into sarcasm and irony classification with application to cyberbullying detection. Inf Process Manage 58(4):102600
8. Zhu N, Wang Z (2020) The paradox of sarcasm: theory of mind and sarcasm use in adults. Personality Individ Differ 163:110035
9. Banerjee A, Bhattacharjee M, Ghosh K, Chatterjee S (2020) Synthetic minority oversampling in addressing imbalanced sarcasm detection in social media. Multimedia Tools Appl 79(47):35995–36031
10. Potamias RA, Siolas G, Stafylopatis AG (2020) A transformer-based approach to irony and sarcasm detection. Neural Comput Appl 32(23):17309–17320
11. Ren L, Xu B, Lin H, Liu X, Yang L (2020) Sarcasm detection with sentiment semantics enhanced multi-level memory network. Neurocomputing 401:320–326
12. Hiremath BN, Patil MM (2021) Sarcasm detection using cognitive features of visual data by learning model. Expert Syst Appl 184:115476
13. Ren L, Lin H, Xu B, Yang L, Zhang D (2021) Learning to capture contrast in sarcasm with contextual dual-view attention network. Int J Mach Learn Cybern 12(9):2607–2615
14. Nawaf Hazim B, Al-Dabbagh SS, Esam Matti WM, Naser AS (2016) Face detection and recognition using viola-jones with PCA-LDA and square euclidean distance. (IJACSA) Int J Adv Comput Sci Appl 7(5)
15. Kim D, Seo D, Cho S, Kang P (2018) Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf Sci
16. Chunping C, Yan Zhu L (2021) A semi-supervised deep learning image caption model based on Pseudo Label and N-gram. Int J Approximate Reasoning 131:93–107
17. Lakshmiprabha NS, Majumder S (2012) Face recognition system invariant to plastic surgery. In: 2012 12th international conference on intelligent systems design and applications (ISDA), pp 258–263
18. Chia Ai O, Hariharan M, Sin Chee L (2012) Classification of speech dysfluencies with MFCC and LPCC features. Expert Syst Appl 39(2):2157–2165
19. Ted Kronvall M, Juhlin A, Jakobsson A, Sparse modeling of chroma features. Signal Process 130:105–117
20. Kavitha M, Gayathri R, Alenezi F (2022) Performance evaluation of deep e-CNN with integrated spatial-spectral features in hyperspectral image classification. Measurement 191
21. Gill HS, Khehra BS (2022) An integrated approach using CNN-RNN-LSTM for classification of fruit images. Mater Today: Proc 51:591–595
22. Ji J, Chen B, Jiang H (2020) Fully-connected LSTM–CRF on medical concept extraction. Int J Mach Learn Cybern 11(9):1971–1979
23. Alsattar HA, Zaidan AA, Zaidan BB (2020) Novel meta-heuristic bald eagle search optimisation algorithm. Artif Intell Rev 53(3):2237–2264
24. Furqan M, Hartono H, Ongko E, Ikhsan M (2017) Performance of arithmetic crossover and heuristic crossover in genetic algorithm based on alpha parameter. IOSR J Comput Eng (IOSRJCE) 19(1):31–36
25. https://github.com/soujanyaporia/MUStARD
InsuraChain: A Blockchain-Based Parametric Health Insurance Platform Soham Panini, Ashish Anand, Rithika Pai, V. Parimala, and Shruti Jadon
Abstract The healthcare sector has evolved considerably in the last few decades. From making processes fast and digital to delivering services with good quality, the industry is growing day by day. But one issue remains unresolved: the time-consuming process of insurance claims. This research study attempts to develop a solution that reduces the time taken to claim insurance in healthcare systems. The study begins by exploring the evolution of the healthcare industry, then describes the architecture of the InsuraChain platform, and finally presents the results achieved by implementing the blockchain-based parametric health insurance platform. The purpose of InsuraChain is to streamline the claim process and reduce the required retrieval time by utilizing smart contracts. Moreover, the platform lets users access and share their Electronic Medical Records (EMRs) or Electronic Health Records (EHRs) with any doctor across the world, supporting a more accurate diagnosis. The use of blockchain technology ensures the security and privacy of medical records.

Keywords EMR/EHR · IPFS · Smart contracts · Insurtech
Soham Panini, Ashish Anand—These authors contributed equally to this work. S. Panini (B) · A. Anand · R. Pai · V. Parimala Computer Science Engineering, PES University, Banashankari Stage III, Bangalore, Karnataka 560085, India e-mail: [email protected] S. Jadon PES University, Banashankari Stage III, Bangalore, Karnataka 560085, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_14
S. Panini et al.
1 Introduction

The healthcare and health insurance industries are closely related, since improvements in healthcare strongly affect health insurance. Healthcare has evolved from healthcare 1.0, which was doctor-centric, to healthcare 4.0, which is more patient-centric [1–4]. This has increased life expectancy and thereby benefited the health insurance industry. Although the insurance industry is steadily improving, it has not been able to adapt to all the developments in healthcare [5]. Currently, the ground reality in India is that people still rely on a generation-old insurance reimbursement model. It is prevalent mostly in rural or less developed areas, and also in urban areas, where the idea of cashless insurance has only begun to settle. The reimbursement model requires the person to have enough capital on hand whenever required, while cashless insurance lacks transparency and is available only within a selective network of hospitals; for any hospital outside that network, even cashless insurance falls back to the reimbursement model.

Traditional health insurance has always followed a long and cumbersome claim process, the significant drawback being the need for large sums of capital for any major treatment. These insurances depend on submitted medical bills, and reimbursement is then processed according to their policies. Users therefore face the challenge of first procuring a large amount of capital on the spot to begin treatment and then, at the end of all medical procedures, carefully submitting all the bills so that they can be reviewed by an auditor, who has the final say regarding the amount to be reimbursed. The auditor’s job is to reduce the claim amount as much as possible. Combined with long and ambiguous insurance policies containing intentional loopholes designed to deny claims as often as possible, this defeats the entire purpose of an insurance policy, which is to give its holders assurance and monetary stability in their time of need.

Cashless health insurance is quite new and is usually available in the urban parts of India. While it addresses some problems of traditional insurance, such as the availability of capital, it has problems of its own. Cashless insurance often caps the maximum amount that can be spent on a particular procedure or facility; for example, the maximum room charges allowed might be capped at a low amount. As a result, only a portion of the charges is covered by the policy and the rest is billed to the user.

This is where InsuraChain comes into the picture. It modernizes the insurance platform by using technologies such as blockchain, cryptocurrencies, and smart contracts to ensure transparency. The main objective of this platform is to solve the issues of partial claim approval and capital procurement while streamlining the entire insurance claiming process.
1.1 Research Contribution

The following are the paper’s main research contributions:

• Proposed a blockchain-based parametric health insurance platform that provides an effective and viable alternative to traditional health insurance.
• Many smart contracts have been developed and implemented for this platform.
• IPFS has been integrated with the platform for digitizing prescriptions into EMR/EHRs.
2 Related Works Blockchain offers transparency and auditability enabling distributed trust among participating peers [6]. The use of smart contracts in blockchain also reduces operation and maintenance cost and thus improves processing time [7]. Bodkhe et al. [8] have presented a framework “BloHosT” where the authors use the decentrality of the blockchain to allow tourists to talk with various stakeholders such as banks, travel agencies, airports, railways, hotels, and local taxis. This happens through a single wallet identifier linked with a cryptocurrency server to initiate payments. Yu et al. [9] proposed a multi-role healthcare data-sharing system based on blockchain, where the authors tried to reduce the storage cost by using the collaborative storage of blockchain. The authors also tried to design a smart contract on insurance to help patients achieve automatic insurance claims, but the authors have not implemented the idea, so it is not clear how effective is the architecture. Vora et al. [10] have suggested a practical method for maintaining the identity and safeguarding the privacy of clinical data by utilizing a strong encryption algorithm. The authors have also covered an authorization framework that makes use of access with different levels of access. Xiaomin Du et al. [11] work on a hierarchical theoretical framework that reveals the impact of smart health care on blockchain and finally develop an application system based on stakeholder theory. The prominent cryptocurrency Bitcoin was first created using blockchain by an unidentified person named Satoshi Nakamoto. With more investigation and advancements, blockchain application cases have expanded to a wide range of other industries, with health care being one of them. Both electronic medical records (EMRs) and electronic health records (EHRs) [12, 13] can now be embedded with blockchain. Purshottam Purswani talks about the idea of blockchain-based parametric health insurance [14]. 
In parametric insurance, a certain sum is paid based on a certain threshold of the disease being crossed. The sum is calculated based on the predicted loss predicted by the model. The claim procedure will be improved with parametric insurance, which is currently hampered by disagreements and prolonged settlement times. Rajesh Gupta et al. [15] showcase the idea of VAHAK: A Blockchain-based Outdoor delivery Scheme where they use IPFS for storing all the data such as medical supplies, user info. Parimal Mehta et al. [16] talk about the security issues present in
5G-enabled UAV networks and tried integrating it with blockchain. Randhir Kumar et al. [17] provided an IPFS-based blockchain storage model to address the issues of transaction storage and block-specific transaction access. Under this proposed storage model, miners store transactions on the IPFS distribute file system and then insert the returned IPFS hash of the transaction into a blockchain block. Hossein et al. [18] propose a Blockchain-Based privacy-preserving Healthcare Architecture where the authors store the hash of patients’ healthcare data into the blockchain. Rajesh Gupta et al. [19] propose a framework named HaBiTs (blockchain-based secure and flawless interoperable telesurgery) where security can be achieved with interoperability by smart contracts. Jaideep Gera et al. [20] discuss fake claims and assertions that are illegally altered by the competent authority. The authors use blockchain technology to create an insurance app as a solution to this issue. Every transaction is cryptographically authenticated and recorded in the blockchain as a set of blocks. This strategy protects claim transactions and thwarts any efforts at fraud. Thenmozhi [21] presents a system for processing health insurance claims that integrate entities like patients and hospitals. The amount is transferred using direct communication between the smart contracts and the Metamask account. Kabra et al. [22] suggested a framework called “MudraChain” for automatic check clearing, where clearing procedures are managed by the blockchain network. In order to make the blockchain-based system secure and impregnable among participating financial players, the authors also discuss a multi-level authentication scheme. Also, this scheme makes use of blockchain to transmit money from payer to payee. Vora et al. [23] have proposed a blockchain-based architecture for effective EHR storage and maintenance. 
As a result, patients’ private information is protected while allowing for secure and effective access to medical data by patients, providers, and third parties. The above review of related works strengthens the argument that blockchain can be beneficial for processing parametric health insurance claims. In addition, the review clearly shows that there is a dearth of works that address blockchain, insurance, and health care together. The novelty of this paper lies in combining all these sectors into one platform, InsuraChain.
3 Systems in Place

3.1 Traditional Insurance

Traditionally, insurance follows the reimbursement model, in which the insured person is expected to have enough capital on hand to pay for any required services, and this capital is later reimbursed by the insurance company. Depending on the policy, the reimbursement may cover the entire amount spent or only part of it. This presents a dual problem: first procuring the capital, and then obtaining a reimbursement that may not even be the full amount. The reimbursement model also attracts fraudulent activities, such as rejecting valid claims or only partially accepting claims for the benefit of the insurance company. This insurance model is not limited to health insurance; it also extends to other insurance such as car insurance [24]. The model has existed for a long time and has often proven more beneficial to insurance companies than to the insured, since companies can design their policies to reject as many claims as possible when the time for reimbursement comes. This defeats the main purpose of insurance, which is to give the insured person a sense of security in his/her time of need.
3.2 Cashless Insurance The way cashless insurances work is that they give the insured person a card to use at the hospital for availing of any kind of services at the hospital. The bill is settled between the hospital and the insurance company directly and the insured person only has to pay for anything beyond the coverage of the insurance. While this looks good on the surface, the real problem is again partial claims, which many insurance companies provide. These companies have a set amount that the user can spend from their coverage on a particular service, for example, a bed in a hospital might only be covered for 3000 per day, and if the bed taken by the user exceeds that amount, the claim for this service may either be rejected or partially fulfilled. Also, with these insurances, often the card only works for a network of hospitals, and for any other hospital, the insurance again falls back to the reimbursement model. This once again defeats the main purpose of health insurance coverage which is to provide its users with a sense of security that they will have enough capital for their medical expenses in their time of need.
4 Proposed System

InsuraChain solves the issue of partial claim approval by adopting the concept of parametric health insurance. In parametric health insurance, the insured person is guaranteed a sum of money for a particular disease based on its severity. Using this concept, InsuraChain adopts a “payment slab” approach in which diseases that incur similar costs are grouped together in a single payment slab, and for any disease in a slab the insured person receives the fixed slab amount as the insurance claim.

InsuraChain addresses unfair claim rejections through the transparent nature of smart contracts and by giving proper access to information about claim acceptance and rejection. Once a smart contract is deployed, it can be viewed by anyone and cannot be altered. Before accepting the terms and conditions of any insurance plan offered by InsuraChain, users can therefore clearly view the conditions under which their claim will be accepted or rejected, so the root cause of any acceptance or rejection is always known to the user. This is not the case with traditional insurance, whose policies often use ambiguous language to hide intentional loopholes that can be used to reject claims unfairly.

Since all transactions on the InsuraChain platform are made with “InsuraCoin”, a custom cryptocurrency, every transaction is tracked and recorded on the blockchain, where anyone can view it and where it cannot be altered or hidden. The patient can use Metamask to view the entire history of transactions made on the InsuraChain platform. On the developer side, transactions can be tracked using Ganache. In contrast, traditional and cashless insurance face a major challenge here, since fraudulent transactions may be hidden.
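The payment slab idea described in this section can be illustrated with a minimal off-chain sketch. It is written in Python for illustration only and is not the platform's smart contract code; the slab names, disease identifiers, payout amounts, and the function `claim_amount` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentSlab:
    name: str
    payout: int          # fixed claim amount in InsuraCoin units (hypothetical)
    diseases: frozenset  # diseases that incur similar costs


# Hypothetical slabs; real groupings would come from the insurance plan.
SLABS = [
    PaymentSlab("slab-1", 100, frozenset({"disease-a", "disease-b"})),
    PaymentSlab("slab-2", 500, frozenset({"disease-c"})),
]


def claim_amount(disease, slabs=SLABS):
    """Return the fixed payout of the slab containing the disease, else None."""
    for slab in slabs:
        if disease in slab.diseases:
            return slab.payout
    return None  # disease is not covered by any slab


payout = claim_amount("disease-b")
```

Because the payout depends only on slab membership, the claim decision is deterministic and can be inspected in advance, which mirrors the transparency argument made above.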
4.1 Comparison with Systems in Place

• InsuraChain allows all transactions to be tracked by simply entering the wallet ID on the Ethereum main net, whereas traditional insurance offers no way to track transactions and even cashless insurance provides only minimal tracking.
• Claims are instantaneous and hassle-free with InsuraChain and with the cashless model, whereas the claim process of traditional insurance is long and tedious.
• Traditional insurance follows the reimbursement model, which is very capital-intensive for the user. The same holds for the cashless model whenever a hospital outside its network is concerned. This is not an issue for InsuraChain, as the user can claim against the prescription instantaneously and receive the sum in their Metamask wallet, which can then be used to settle the bill.
• InsuraChain provides its users with EMR/EHRs, i.e., electronic medical/health records. These support proper diagnosis by showing the user's entire medical history to any doctor they choose to visit. This feature is not available in the other models.

Table 1 Comparison of InsuraChain with systems in place

                           Insurance methods
Parameters                 Traditional   Cashless    InsuraChain
Tracked transactions       No            No          Yes
Fast and easy claim        No            Yes         Yes
Reimbursement required     Yes           Sometimes   No
EMR/EHR feature            No            No          Yes
InsuraChain: A Blockchain-Based Parametric Health Insurance Platform
5 Architecture
The architecture shown in Fig. 1 represents the components of the proposed mechanism, “InsuraChain”. It is divided into two parts: the web user interface and the blockchain. The web user interface is further subdivided into the patient side and the hospital side. The patient-side interface allows patients to register on the platform, view all their prescriptions, and claim insurance on any prescription. It also has an integrated shop from which patients can buy InsuraCoins, so that all transactions happen in a single currency. The hospital-side interface allows clinical assistants to fill in patients’ details and upload prescriptions to the portal. The second component, the blockchain, is further subdivided into the smart contracts and IPFS. The smart contracts used and their functions are listed in Table 2. IPFS is responsible for storing and fetching prescriptions. It does this using a peer-to-peer (P2P) network model that shares a prescription across nodes in a distributed manner. Files are divided into smaller pieces, each identified by its hash value, and stored over a network of computers. The original file is recreated by putting the pieces back together based on their hash values.
Fig. 1 System architecture
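The chunk-and-hash idea described for IPFS in Sect. 5 can be illustrated with a minimal sketch. It is not the IPFS implementation: the tiny chunk size, the plain SHA-256 digests used as keys, and the function names `split_and_store` and `reassemble` are assumptions made only for illustration, and a dictionary stands in for the network of nodes.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example stays readable (assumption)


def split_and_store(data: bytes, store: dict) -> list:
    """Split data into chunks, store each chunk under its SHA-256 digest,
    and return the ordered list of digests needed to rebuild the file."""
    digests = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store[digest] = chunk  # in a real network, chunks live on different peers
        digests.append(digest)
    return digests


def reassemble(digests: list, store: dict) -> bytes:
    """Fetch each chunk by digest, verify it, and join the chunks in order."""
    parts = []
    for digest in digests:
        chunk = store[digest]
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("chunk does not match its hash")
        parts.append(chunk)
    return b"".join(parts)
```

Because every piece is looked up and checked by its hash, a tampered piece is detected at reassembly time, which is the property the architecture relies on for prescriptions.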
S. Panini et al.
Table 2 Smart contracts and their functions
Smart contract | Functions
Facilitator | It stores all the custom-defined data structures
Handler | It interacts with the blockchain
Shop | Allows users to trade InsuraCoin
InsuraCoin | Responsible for minting the InsuraCoin
Add | Adds data to blockchain using handler
Fetch | Fetches data from blockchain using handler
Payment | Handles all the insurance policies and their payments
Settlement | Claim verification and claim settlement
6 Implementation
InsuraChain was built by dividing the platform into three functional subsections: IPFS, smart contracts, and the web user interface. This division is also visible in the architecture diagram in Fig. 1, where IPFS and smart contracts are part of the blockchain component and the web interface is a component of its own. The division simplifies implementation and enables parallel development of the components.
6.1 IPFS
The main task of the IPFS section is to store the prescription as an electronic medical/health record (EMR/EHR). IPFS is used for storing EMRs because it provides decentralization along with enhanced security. Storing the EMRs in a decentralized manner reduces the possibility of a data leak and at the same time ensures the privacy of the medical history [25] for all users. There are usually two routes for implementing IPFS: working directly with Kubo, the original IPFS implementation, or using an IPFS service provider such as Infura. InsuraChain uses Infura’s IPFS service in order to achieve better load times for the EMRs. IPFS generates a unique hash for every file that is uploaded, known as the CID or content identifier. InsuraChain maps these CIDs to the wallet address of each patient: the wallet-id is the key and the CIDs of all the patient’s prescriptions are the values, i.e., the map is of the form “wallet-id: array-of-CIDs [index]”. To create this map, InsuraChain has a separate interface where hospitals enter each customer’s wallet-id and upload prescriptions one at a time. Each prescription is uploaded to the node server, which uses the “ipfs-http-client” API to connect to Infura’s IPFS infrastructure and upload the prescription there, returning the unique CID. Once this CID is obtained, the node server deletes its local copy of the prescription and uses the wallet-id as the key to push the CID into the map. The map is maintained by a smart contract, so it is stored in the contract space of the blockchain and can be fetched simply by supplying the wallet-id of the patient.
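The “wallet-id: array-of-CIDs [index]” map described above can be sketched off-chain as follows. This is only an illustrative Python model, not the authors' smart contract; the class name `PrescriptionIndex`, its methods, and the placeholder wallet and CID strings are hypothetical.

```python
from collections import defaultdict


class PrescriptionIndex:
    """Toy model of the wallet-id -> list-of-CIDs map (hypothetical names)."""

    def __init__(self):
        self._cids = defaultdict(list)

    def add(self, wallet_id: str, cid: str) -> int:
        """Append a CID for the wallet and return its index in the array."""
        self._cids[wallet_id].append(cid)
        return len(self._cids[wallet_id]) - 1

    def fetch(self, wallet_id: str) -> list:
        """Return a copy of all CIDs stored for the wallet (empty if none)."""
        return list(self._cids.get(wallet_id, []))
```

A hospital-side upload would call `add` once per prescription after receiving the CID from IPFS, and the patient-side page would call `fetch` with the connected wallet address to list the prescriptions.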
6.2 Smart Contracts
Simply put, smart contracts are blockchain-based programs that execute when certain criteria are met. They are often used to automate the implementation of an agreement so that all parties can be certain of the outcome right away, without the need for an intermediary or additional delay. They can also automate a workflow so that, when the conditions are met, a pre-defined and agreed action is executed. The power of smart contracts has been leveraged for the tasks listed in Table 2. Smart contracts cannot be altered once deployed, and every condition in them is clearly stated; for example, each payment slab in the “Web Interface” corresponds to a condition enforced in the smart contract. InsuraChain therefore gives users transparency about the claiming conditions and the assurance that all legitimate claims will go through, since the entire process is automated. The handler is a smart contract deployed on the blockchain. It contains the mapping of wallet address, PlanID, and subscriptions, which keeps track of all ongoing subscriptions linked to a patient’s wallet address. The add contract uses an interface to access the handler contract’s mappings and add data to those mapping variables, which are then stored in the contract space of the blockchain.
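The relationship between the handler and add contracts can be modelled in a few lines. The sketch below is a plain Python analogy under stated assumptions, not the deployed contracts: `Subscription`, `HandlerInterface`, and the method names are hypothetical, and the fields kept per subscription are a guess at the minimum needed.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


@dataclass
class Subscription:
    plan_id: int            # stands in for the PlanID in the text
    premiums_paid: int = 0  # hypothetical bookkeeping field


class HandlerInterface(Protocol):
    """The narrow interface through which other contracts reach the handler."""

    def set_subscription(self, wallet: str, sub: Subscription) -> None: ...
    def get_subscription(self, wallet: str) -> Subscription: ...


@dataclass
class Handler:
    """Owns the wallet-address -> subscription mapping."""

    subscriptions: Dict[str, Subscription] = field(default_factory=dict)

    def set_subscription(self, wallet: str, sub: Subscription) -> None:
        self.subscriptions[wallet] = sub

    def get_subscription(self, wallet: str) -> Subscription:
        return self.subscriptions[wallet]


class Add:
    """Writes data only through the handler interface, never directly."""

    def __init__(self, handler: HandlerInterface):
        self._handler = handler

    def subscribe(self, wallet: str, plan_id: int) -> None:
        self._handler.set_subscription(wallet, Subscription(plan_id))
```

Keeping the state in one handler and routing writes through an interface means the add and fetch logic can change without moving the stored mappings.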
6.3 Web Interface
The authors developed a web interface for InsuraChain that integrates the various parts of the project, from the hospital-side IPFS upload to the client-side website. Different pages were created based on the user’s role, and various functionalities were made available for interacting with the platform. The user just has to press a button, and all the computation-intensive work is carried out in the background within seconds, without the user needing to be aware of it.
6.4 Flow of Events
InsuraChain is simple to use, and its user-friendliness shows in its simple flow of events, depicted in Fig. 2. The flow of events is as follows:
• When the user first comes to the platform, a sign-up using MetaMask is required. If the user does not have MetaMask, the platform throws an error saying “please install metamask”.
• If the user already exists, they can simply sign in using their own MetaMask account.
• The user is then redirected to the home screen, which shows the list of prescriptions and the plan the user is currently subscribed to.
• The user is then required to buy “InsuraCoins” from the shop, as all transactions on the platform are settled in the platform’s custom cryptocurrency, InsuraCoin [26].
Fig. 2 Flow of events
• They see a list of plans, from which they can decide which plan best suits their needs and complete its payment in “InsuraCoins”, which they can trade for at any time.
• Based on the subscription period of the plan, the user makes all their payments accordingly. When the time comes to claim the insurance, they select the prescription from their list of prescriptions and start the claiming process against it.
• If the claim is verified, the fixed slab amount mentioned in the plan list is transferred to the user in its entirety.
• If the premiums have not been paid, a claim has already been made on the same prescription, or no plan has been selected, the claim is rejected.
• Users can also view their own prescriptions, which the hospitals upload before handing over the physical copy. These EMRs can then be shown to any doctor the user chooses to visit in the future for a better-informed diagnosis.
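The acceptance and rejection rules in the flow above can be summarised as a short decision function. This is a sketch under stated assumptions rather than the Settlement contract itself: the `Account` fields, the order in which the checks run, and the returned strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Account:
    plan_slab: Optional[int] = None  # fixed payout of the selected plan; None if no plan
    premiums_paid: bool = False      # True when all due premiums are settled
    balance: int = 0                 # InsuraCoin balance of the wallet
    claimed: Set[str] = field(default_factory=set)  # prescriptions already claimed


def settle_claim(account: Account, prescription_cid: str) -> str:
    """Apply the three rejection rules, then pay the full slab amount."""
    if account.plan_slab is None:
        return "rejected: no plan selected"
    if not account.premiums_paid:
        return "rejected: premiums not paid"
    if prescription_cid in account.claimed:
        return "rejected: prescription already claimed"
    account.claimed.add(prescription_cid)
    account.balance += account.plan_slab  # parametric payout: the whole slab
    return "accepted"
```

Because the payout depends only on these explicit conditions, the user can tell in advance whether a claim will be accepted, which is the transparency argument made in Sect. 6.2.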
7 Results and Discussion
Health care is a crucial aspect of society and is always changing to meet the demands of both patients and healthcare practitioners. Blockchain technology and its potential to transform health care have received a lot of attention in recent years [27]. This paper proposes an efficient blockchain-based healthcare insurance system. Blockchain facilitates the verification and traceability of multi-step transactions. It can speed up data-standardized processes and reduce regulatory burdens. It also provides secure and tamperproof storage of data, making it more difficult for sensitive information to be hacked or leaked. Eliminating intermediaries and manual processes can reduce the costs associated with healthcare insurance. With this in mind, the authors designed the InsuraChain platform to meet these requirements. Claims made in this application are settled instantly by executing the smart contracts written for the purpose. By automating many manual processes, the time taken to process claims can be reduced [28]. The user links their wallet while signing up or logging in. After choosing a suitable insurance plan, the user pays through “InsuraCoin”. Every transaction on this platform is made using this coin. The user can mint and trade as many coins as they wish. Insurance can be claimed by selecting a particular prescription and starting the process. Once the claim is verified, the amount is transferred to the user. The platform also lets users view their medical records. ReactJs was used to develop the interface of the platform, and IPFS was used to store the prescriptions. By obtaining data from several peers at once, storing and sharing data with IPFS reduces bandwidth usage. IPFS retrieves the data for a requested CID from several nodes at once, which is a fast and effective way to deliver content to a user. The authors have written several smart contracts to provide features such as storing data, interacting with the blockchain, allowing users to mint and trade InsuraCoin, fetching and adding data to the blockchain, handling insurance policies, and verifying and settling claims. The user only needs to worry about selecting the right plan and submitting it. Healthcare insurance is a highly regulated industry, and it may take time for regulators to fully understand and embrace blockchain technology. Building and maintaining a blockchain-based system requires specialized technical knowledge, which may be difficult for some healthcare insurance companies to find or afford. In conclusion, the use of blockchain technology in healthcare insurance has the potential to revolutionize the industry by increasing transparency, security [29], and efficiency, but there are also challenges that must be overcome before it can be widely adopted.
8 Future Works
Technology is ever-evolving, and new innovations arrive every day. By leveraging these advancements, InsuraChain can grow into a better version of itself. Below, the authors discuss a few potential areas for future work that might enhance the functionality of InsuraChain.
8.1 Digital Currency
Central Bank Digital Currencies (CBDCs) are like cryptocurrencies, with the major difference that they are issued by the central bank of a nation and therefore have their value backed by the government of the country. This reduces their volatility and makes them easier to use in day-to-day life. Incorporating a CBDC into InsuraChain could increase its acceptance by hospitals, as a CBDC has the same value as the fiat money of the nation.
8.2 Image Recognition
Image recognition technologies have matured over time and can now be combined with other AI/ML models, which InsuraChain could use for better claim verification directly from the image of a prescription. This would further enhance the functionality of InsuraChain and make it a more powerful platform.
9 Conclusion
InsuraChain brings the healthcare, insurance, and blockchain industries together in an application that can help people get better medical diagnoses and instant health insurance claims while maintaining the transparency, security, and privacy of all its users. It tries to break the paradigm of traditional insurance and addresses issues such as partial claim settlements and unlawful claim rejections. In doing so, InsuraChain provides a viable alternative to current health insurance and tries to fulfill the main purpose of insurance, which is to give its users peace of mind in their time of need. However, there are a few practical limitations: the initial setup and onboarding of hospitals are difficult; it is assumed that doctors do not engage in fraudulent practices by uploading wrong or outdated prescriptions; and there are currently no standard protocols for communicating with other systems, which could make it difficult to share patient data between different healthcare providers or insurance companies. Even with these limitations, InsuraChain brings a new approach to health insurance by treating it in a parametric fashion. This guarantees its users a fixed amount according to the payment slabs of their selected plan, so they can be certain of receiving that amount in their wallets whenever their claim is valid. Also, because policies are implemented as smart contracts, the terms and conditions for every approval and rejection are clearly visible at any time and are unalterable once deployed. This helps build trust in the platform.
References
1. Hathaliya J, Tanwar S, Tyagi S, Kumar N (2019) Securing electronics healthcare records in healthcare 4.0: a biometric-based approach. Comput Elect Eng 76. https://doi.org/10.1016/j.compeleceng.2019.04.017
2. Hathaliya J, Sharma P, Tanwar S, Gupta R (2019) Blockchain-based remote patient monitoring in healthcare 4.0. In: 2019 IEEE 9th international conference on advanced computing (IACC), pp 87–91. https://doi.org/10.1109/IACC48062.2019.8971593
3. Hathaliya J, Tanwar S (2020) An exhaustive survey on security and privacy issues in healthcare 4.0. Comput Commun 153. https://doi.org/10.1016/j.comcom.2020.02.018
4. Kumari A, Tanwar S, Tyagi S, Kumar N (2018) Fog computing for healthcare 4.0 environment: opportunities and challenges. Comput Electr Eng 72:1–13. https://doi.org/10.1016/j.compeleceng.2018.08.015
5. Gatteschi V, Lamberti F, Demartini C, Pranteda C, Santamaría V (2018) Blockchain and smart contracts for insurance: is the technology mature enough? Future Internet 10(2). https://doi.org/10.3390/fi10020020
6. Gatteschi V, Lamberti F, Demartini C, Pranteda C, Santamaría V (2018) To blockchain or not to blockchain: that is the question. IT Professional 20(2):62–74. https://doi.org/10.1109/MITP.2018.021921652
7. Zhang P, Walker MA, White J, Schmidt DC, Lenz G (2017) Metrics for assessing blockchain-based healthcare decentralized apps. In: 2017 IEEE 19th international conference on e-health networking, applications and services (Healthcom), pp 1–4. https://doi.org/10.1109/HealthCom.2017.8210842
8. Bodkhe U, Bhattacharya P, Tanwar S, Tyagi S, Kumar N, Obaidat MS (2019) BloHosT: blockchain enabled smart tourism and hospitality management. In: 2019 international conference on computer, information and telecommunication systems (CITS), pp 1–5. https://doi.org/10.1109/CITS.2019.8862001
9. Yu Y, Li Q, Zhang Q, Hu W, Liu S (2021) Blockchain-based multi-role healthcare data sharing system. In: 2020 IEEE international conference on e-health networking, application services (HEALTHCOM), pp 1–6. https://doi.org/10.1109/HEALTHCOM49281.2021.9399028
10. Vora J, Italiya P, Tanwar S, Tyagi S, Kumar N, Obaidat MS, Hsiao KF (2018) Ensuring privacy and security in e-health records. In: 2018 international conference on computer, information and telecommunication systems (CITS), pp 1–5. https://doi.org/10.1109/CITS.2018.8440164
11. Du X, Chen B, Ma M, Zhang Y (2021) Research on the application of blockchain in smart healthcare: constructing a hierarchical framework. J Healthc Eng 1–13. https://doi.org/10.1155/2021/6698122
12. Mishra AS (2021) Study on blockchain-based healthcare insurance claim system. In: 2021 Asian conference on innovation in technology (ASIANCON), pp 1–4. https://doi.org/10.1109/ASIANCON51346.2021.9544892
13. Azaria A, Ekblaw A, Vieira T, Lippman A (2016) Medrec: using blockchain for medical data access and permission management. In: 2016 2nd international conference on open and big data (OBD), pp 25–30. https://doi.org/10.1109/OBD.2016.11
14. Purswani P (2021) Blockchain-based parametric health insurance. In: 2021 IEEE symposium on industrial electronics applications (ISIEA), pp 1–5. https://doi.org/10.1109/ISIEA51897.2021.9510001
15. Gupta R, Shukla A, Mehta P, Bhattacharya P, Tanwar S, Tyagi S, Kumar N (2020) Vahak: a blockchain-based outdoor delivery scheme using UAV for healthcare 4.0 services. In: IEEE INFOCOM 2020 - IEEE conference on computer communications workshops (INFOCOM WKSHPS), pp 255–260. https://doi.org/10.1109/INFOCOMWKSHPS50562.2020.9162738
16. Mehta P, Gupta R, Tanwar S (2020) Blockchain envisioned UAV networks: challenges, solutions, and comparisons. Comput Commun 151. https://doi.org/10.1016/j.comcom.2020.01.023
17. Kumar R, Tripathi R (2019) Implementation of distributed file storage and access framework using IPFS and blockchain. In: 2019 fifth international conference on image information processing (ICIIP), pp 246–251. https://doi.org/10.1109/ICIIP47207.2019.8985677
18. Hossein KM, Esmaeili ME, Dargahi T, Khonsari A (2019) Blockchain based privacy-preserving healthcare architecture. In: 2019 IEEE Canadian conference of electrical and computer engineering (CCECE), pp 1–4. https://doi.org/10.1109/CCECE.2019.8861857
19. Gupta R, Tanwar S, Tyagi S, Kumar N, Obaidat MS, Sadoun B (2019) Habits: blockchain-based telesurgery framework for healthcare 4.0. In: 2019 international conference on computer, information and telecommunication systems (CITS), pp 1–5. https://doi.org/10.1109/CITS.2019.8862127
20. Gera J, Palakayala AR, Rejeti VKK, Anusha T (2020) Blockchain technology for fraudulent practices in insurance claim process. In: 2020 5th international conference on communication and electronics systems (ICCES), pp 1068–1075. https://doi.org/10.1109/ICCES48766.2020.9138012
21. Thenmozhi M, Ranganayakulu D, Geetha S, Valli R (2021) Implementing blockchain technologies for health insurance claim processing in hospitals. Mater Today: Proc. https://doi.org/10.1016/j.matpr.2021.02.776
22. Kabra N, Bhattacharya P, Tanwar S, Tyagi S (2019) Mudrachain: blockchain-based framework for automated cheque clearance in financial institutions. Futur Gener Comput Syst. https://doi.org/10.1016/j.future.2019.08.035
23. Vora J, Nayyar A, Tanwar S, Tyagi S, Kumar N, Obaidat MS, Rodrigues JJPC (2018) Bheem: a blockchain-based framework for securing electronic health records. In: 2018 IEEE Globecom Workshops (GC Wkshps), pp 1–6. https://doi.org/10.1109/GLOCOMW.2018.8644088
24. Khanji S, Iqbal F, Maamar Z, Hacid H (2019) Boosting IoT efficiency and security through blockchain: blockchain-based car insurance process, a case study. In: 2019 4th international conference on system reliability and safety (ICSRS), pp 86–93. https://doi.org/10.1109/ICSRS48664.2019.8987641
25. Vora J, DevMurari P, Tanwar S, Tyagi S, Kumar N, Obaidat MS (2018) Blind signatures based secured e-healthcare system. In: 2018 international conference on computer, information and telecommunication systems (CITS), pp 1–5. https://doi.org/10.1109/CITS.2018.8440186
26. Chang Z (2020) Research of medical insurance based on the combination of blockchain and credit technology. In: 2020 Asia-Pacific conference on image processing, electronics and computers (IPEC), pp 428–430. https://doi.org/10.1109/IPEC49694.2020.9115161
27. https://www.investopedia.com/terms/b/blockchain.asp
28. Bhamidipati NR, Vakkavanthula V, Stafford G, Dahir M, Neupane R, Bonnah E, Wang S, Murthy JVR, Hoque KA, Calyam P (2021) Claimchain: secure blockchain platform for handling insurance claims processing. In: 2021 IEEE international conference on blockchain (Blockchain), pp 55–64. https://doi.org/10.1109/Blockchain53845.2021.00019
29. Tanwar S, Bhatia Q, Patel P, Kumari A, Singh PK, Hong W-C (2020) Machine learning adoption in blockchain-based smart applications: the challenges, and a way forward. IEEE Access 8:474–488. https://doi.org/10.1109/ACCESS.2019.2961372
Automating Audio Attack Vectors
Chaitanya Singh, Mallika Sirdeshpande, Mohammed Haris, and Animesh Giri
Abstract Over the years, speech recognition voice-assistant systems have improved in performance and are now commonplace in the daily lives of millions of people. But are they secure? Voice-assistant systems usually have high system privilege and no authentication system in place, giving them access to a huge amount of personal data. Studies conducted in recent years use different technologies and methods, such as laser light and machine learning, to exploit the vulnerabilities of these systems, but these attacks have been carried out in laboratory environments with bulky equipment. In this paper, we build a portable and easy-to-use proof-of-concept tool to automate two audio-based attack vectors. We then test the tool under varying metrics to determine its efficacy under real-life-like conditions. Finally, we conclude by analysing our results and highlighting design changes to mitigate these types of attacks.
Keywords Cybersecurity · Internet of Things · Proof of concept · Laser attack · Psychoacoustic attack
1 Introduction
With the popularity of smartphones and smart home devices in the last decade, voice-assistant systems have become a technology that millions of people interact with daily. But as this technology becomes more common, new vulnerabilities and exploits are also being discovered.

C. Singh (B) · M. Sirdeshpande · Mohammed Haris · A. Giri
PES University, Bengaluru, India
e-mail: [email protected]
M. Sirdeshpande
e-mail: [email protected]
Mohammed Haris
e-mail: [email protected]
A. Giri
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_15
Table 1 Attack vectors implemented in the PoC tool
Attack | Description | Attack surface
Laser attack | Transmitting audio commands using laser light as the medium to the voice-assistant system without user's knowledge | Microphone hardware
Psychoacoustic hiding | Hiding audio commands in everyday sounds such that they are not detectable by human ears but the voice-assistant system successfully detects and runs the commands. This is a white-box model attack | Voice-assistant application
Voice-assistant systems have high system privilege: they can perform high-authority actions and can access massive amounts of users' personal data. This privilege is needed so that actions can be carried out by voice command, but it also means that, if the system is compromised, the adversary gains control over all those actions and personal data. The objective of this paper is to test the efficacy of these attacks in real-life scenarios and to propose mitigations or defences against them. To conduct the analysis, we first needed a way to carry out the attacks successfully and reliably. No prior device consolidated audio-based attack vectors into a portable and easy-to-use form. This paper therefore automates two audio attacks (Sect. 3) as a proof of concept (PoC) for a consolidated tool that automates audio attack vectors. Table 1 gives a brief overview of the attacks that were chosen and implemented after extensive research. The attacks were then carried out under varying real-life-like test metrics to determine their efficacy and limits (Sect. 4). The results and their analysis also give insight into the defences that can be implemented to mitigate these attacks.
2 Related Work
Voice-assistant smart devices are commonplace in most households today. These devices do not have sufficient authentication procedures in place and can allow voice commands to be executed by unauthorised users. All the attack vectors in the literature exploit this vulnerability. After an extensive study of the literature, multiple attack vectors that fall under different approaches were tabulated; they are briefly discussed as follows. The two attack surfaces were the hardware, namely the MEMS microphone architecture that is widely used in voice-assistant systems, and the software of the voice-assistant systems.
Under attacks targeting the hardware, “Light Commands” [1] attempts to achieve a remote command injection attack on voice-assistants using a modulated laser as the medium. MEMS microphones react in the same manner to a light signal and an audio signal. They have a membrane which acts as a capacitor. When hit with an audio signal, the variable sound pressure vibrates the membrane to generate a variation in the capacitance. Application-specific integrated circuits (ASICs) on the microphone board convert the change in output current generated by this variation to an analogue or a digital signal. Incidentally, the same effect is achieved by aiming any changing electromagnetic field, such as a modulated laser, at the microphone’s aperture.
Another similar approach was the use of ultrasound as the medium. In the “Dolphin Attack” [2], a voice control signal is amplitude modulated on top of a carrier frequency just out of the hearing range of humans and played near a voice-assistant. Two types of implementations were taken into consideration. The first utilised a signal generator to generate the carrier wave, an audio player as the signal source, and an ultrasonic speaker to transmit the modulated signal. The second, and more portable, set-up used a Samsung Galaxy S6 Edge for both modulation and signal generation, and an ultrasonic transducer connected to the audio jack through an amplifier for transmission. Similarly, “Surfing Attack” [3] used a solid as the medium. It used a piezoelectric transducer to generate ultrasonic guided waves by creating minute vibrations of the solid propagation medium, called Lamb waves, which differ in characteristics from ultrasonic waves in air. The attack was tested with aluminium, glass, and medium density fibreboard, the farthest test distance being 30 ft using aluminium. Another exploit used electromagnetic interference [4], taking advantage of the nonlinearity of the MEMS microphone, which works as an AM demodulator for signals with carrier frequencies ranging from 22 to 40 kHz. The attack set-up includes a signal generator, a mixer, and a vector network analyser for testing. The generated signal was then transmitted over a unidirectional antenna, and its effects on different devices were tested.
Under attacks targeting the software of the voice-assistant system, the “Psychoacoustic hiding attack” [5] exploits Deep Neural Network (DNN)-based ASR systems to demonstrate that DNNs are susceptible to alterations which allow adversarial attacks. The authors target the open-source Automatic Speech Recognition (ASR) system “Kaldi” [6]. This is achieved with a forced alignment technique to get a desired probability matrix, which is then used to augment the DNN using backpropagation. The generated audio is then psychoacoustically hidden under a carrier audio using a gradient descent algorithm. The results are audio snippets with malicious commands not detectable by human ears. Another approach used the concept of homophones to get similar results, exploiting the phonetic inconsistencies in English. The authors of “Cocaine Noodles” [7] created a threat model where an audio is detected differently by a human and a machine. The authors selected a set of audio commands, used the MFCC parameters in MATLAB [8] to extract the acoustic features, and then used the inverse MFCC module to mangle those acoustic features. The results are audio snippets containing malicious commands that sounded entirely different to a human evaluator. The authors conducted a survey using “Amazon Mechanical Turk” [9], a commercial service for having remote tasks performed by humans, to test their results.
Fig. 1 Circuit diagram of the PoC tool
Other software-based attacks were device specific. One is a spyware app [10] disguised as a popular gaming app in order to obtain microphone privileges. The spyware can then stealthily record incoming and outgoing calls on the phone and, using NLP techniques whose architecture was not described, synthesise the activation keywords. Choosing the right time using data from the accelerometer, it can launch the attack unbeknownst to the user. The attack targets Google Assistant on the user’s mobile phone. An analysis of attacks targeting specific smart devices using “Alexa Skills” [11] was also considered. The authors analysed approximately 150,000 Alexa skills available across seven countries at the time of publication for specific vulnerabilities. They found that attackers can publish skills under well-known company names to appear credible. Attackers can also make source code changes after approval from Amazon and breach the user’s privacy by requesting sensitive information directly from the user, bypassing the API permission model and without disclosing what data they are collecting. Two attacks were selected to maintain diversity among attack surfaces and on the basis of their efficacy and feasibility: Light Commands [1] and the Psychoacoustic Hiding attack [5].
3 Methodology
3.1 Circuit Design
The components for the PoC tool, as shown in Fig. 1:
1. Raspberry Pi 4B 8 GB
2. 20,000 mAh Power Bank
3. Amplifier
4. 8 Ω Speaker
5. 9 V Battery
6. ILI9225 TFT LCD screen module for Pi
7. Generic 5 mW laser module
8. Joystick
9. ADC
10. MEMS MIC module for Pi
3.2 PoC Tool
The PoC tool utilises the Robot Operating System (ROS) [14] to bridge between the various components. Listed below are the important nodes, the different topics they publish or subscribe to, and the various services they offer. The pseudocode is as follows. Figure 2 shows the architecture of the PoC tool. The components marked in yellow represent the various ROS nodes running on the Raspberry Pi, the ones marked in blue represent the physical devices connected to the Raspberry Pi, and the ones in orange are the victim devices. The components and their workings are discussed in detail in further sections. Figure 3 depicts the flow of control in the PoC tool: how the user can select, configure, and execute various attacks. The user navigates between menus and choices using the two-axis joystick shown in Fig. 1, which comes with a push button to select a choice.
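The publish/subscribe pattern and the joystick-driven menu described above can be illustrated with a small in-process sketch. It deliberately does not use the ROS API: the `Bus` class is a toy stand-in for ROS topics, and the topic names "joystick" and "attack/selected", the event strings, and the `Menu` class are hypothetical.

```python
from collections import defaultdict


class Bus:
    """Toy in-process stand-in for ROS topics (not the rospy API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every callback registered on the topic.
        for callback in self._subscribers[topic]:
            callback(message)


class Menu:
    """Menu node: listens to joystick events and announces the selection."""

    def __init__(self, bus, options):
        self.options = options
        self.index = 0
        self._bus = bus
        bus.subscribe("joystick", self.on_joystick)

    def on_joystick(self, event):
        if event == "down":
            self.index = (self.index + 1) % len(self.options)
        elif event == "up":
            self.index = (self.index - 1) % len(self.options)
        elif event == "press":
            # The push button confirms the highlighted choice.
            self._bus.publish("attack/selected", self.options[self.index])
```

In this arrangement the joystick node only publishes events and never needs to know which nodes react to them, which is the decoupling that a topic-based bridge between components provides.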
3.3 Laser Attack
The laser attack, as implemented in “Light Commands: Laser-Based Audio Injection Attacks on Voice-Controllable Systems” [1], utilises a “Thorlabs LDC205C laser driver to drive a blue Osram PLT5 450 B 450 nm laser” [1]. Our implementation uses a 5 mW laser module with three pins: 5V, GND, and signal. The set-up utilises a Raspberry Pi with a PiFi DAC v2 to support a sampling frequency of 384 kHz, better than the Pi’s native 44.1 kHz sampling frequency. The laser module connections are given in Table 2. To check the performance of the attack, the attacker set-up was tested on a victim Raspberry Pi connected to a MEMS microphone. The transmission was recorded using the Advanced Linux Sound Architecture (ALSA) [15] and then plotted in Audacity [16]. The results of this experiment are provided in Sect. 4 of this paper.
Fig. 2 Architecture of the PoC tool
Fig. 3 Process flow of the PoC tool
Table 2 Laser connections
Laser module | Raspberry Pi 4B
5V | 5V (Pin 2)
GND | GND (Pin 6)
Signal | TRS, left from Audio Jack
3.4 Psychoacoustic Attack
The psychoacoustic attack implementation is based on “Adversarial attacks against automatic speech recognition systems via psychoacoustic hiding” [5].
Dependencies:
• WSJ0 data set [13]: a corpus consisting of text from Wall Street Journal news articles, intended for training continuous speech recognition systems. It consists of text from articles that fall under 5000 words. The creation of this corpus was funded by the DARPA Spoken Language Program.
• Installing Kaldi [6].
• Gathering audio samples: In the same manner as the original paper, different audio samples were gathered and chunked into 10 s audio snippets using Audacity [16]. These included music playbacks, ambient noises, TV broadcasts, etc. All the audio samples were stored in the kaldi/egs/wsj/s5/music folder.
• Creating a text file with target utterances in the kaldi/egs/wsj/s5/targets folder.
The training procedure runs in the following stages [5]:
• Stage 0 to Stage 7: The default Kaldi stages for its nnet2 recipe run.
• Stage 8: This stage extracts time features.
• Stage 9: This stage trains the Deep Neural Network for the above time features.
• Stage 10: This stage decodes the data set which is used to create the first targets. It also maps the music snippets to the desired target text output.
• Stage 11: This stage calculates the Hearing Threshold.
• Stage 12: This stage executes forced alignment and back propagation for 100 steps. The model stops if the malicious audio is successful.
• Stage 13: This stage stores the audio files and tests them against the Kaldi ASR.
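The audio-gathering step listed under the dependencies was done with Audacity. As a rough illustration only, the same 10 s chunking could be scripted with Python's standard `wave` module. The function name `split_wav` and the output naming scheme are hypothetical and are not part of the original tool; the sketch assumes uncompressed PCM WAV input.

```python
import wave


def split_wav(src_path, out_prefix, chunk_seconds=10):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Hypothetical helper: the paper used Audacity for this step.
    Returns the list of chunk file paths that were written.
    """
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width, and rate from the source;
                # the frame count in the header is corrected when dst closes.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

The last chunk is simply shorter when the recording length is not a multiple of 10 s; whether such a remainder should be kept or discarded is a choice the source does not specify.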
4 Results and Discussion
4.1 PoC Tool
The PoC tool successfully implements the aforementioned three attacks and can be used to carry out these attacks in real time with different audio commands. This is helpful in learning their attack patterns in order to come up with mitigations.
Fig. 4 Difference between audio transmitted on laser and the audio received at the MEMS microphone
4.2 Laser Attack
The attack was first executed by playing audio of a voice saying “Hey Siri” over the laser module held 30 cm from the victim device, which consisted of a MEMS mic connected to a Raspberry Pi 4B, recording using ALSA [15]. Figure 4 shows the similarity between the audio wave transmitted via laser (a recording of “Hey Siri” in a loop) and the audio recorded by the microphone. The waveform in black is the sound transmitted over the laser, and the waveform in blue is the audio recorded at the microphone. To determine the effects of environmental factors on the laser transmission, and thus on the transmitted audio, the laser light was tested with varying luminosity, ambient decibel levels, and distance. A sine wave was transmitted via the laser light, and a Tektronix TBS 107B_EDU oscilloscope was used to register and save the observations. A sine wave was chosen because of its simplicity: any disturbance or variation in amplitude is easier to detect. For the first test set-up, the effect of variable ambient luminosity and ambient noise was observed. The test was conducted in a closed room with multiple light sources, as well as in a sunny outdoor environment. The distance was set at 0.3 m for all the tests. Luminosity values ranged between 3 and 512 lux for tests conducted within a controlled lab environment, and between 52,000 and 53,400 lux for tests conducted outdoors on a sunny day. Ambient decibel levels ranged between 41.7 and 61 dB. Figure 5 shows the trend of amplitude values as luminosity varies from 3 to 512 lux. Figure 6 shows the trend as luminosity varies from 52,000 to 53,400 lux. Both figures show that the variation in luminosity has minimal effect on the consistency of amplitude values for the lower lux range, while it has a small effect for the higher lux range. Figure 7 shows the oscilloscope reading of laser transmission at 0.3 m with 53,400 lux and 40 dB.
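To make the reasoning behind the sine-wave test concrete, the following minimal sketch generates a pure tone and compares peak-to-peak amplitudes, which is the quantity read off the oscilloscope in these tests. The 1 kHz frequency, the 0.8 amplitude, and the 0.5 attenuation factor are illustrative assumptions; the source does not state the tone frequency that was used.

```python
import math


def sine_samples(freq_hz, duration_s, rate_hz=44100, amplitude=0.8):
    """Return float samples of a pure tone in [-amplitude, amplitude]."""
    n = int(duration_s * rate_hz)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
            for i in range(n)]


def peak_to_peak(samples):
    """Peak-to-peak amplitude: the spread between the extreme samples."""
    return max(samples) - min(samples)


# Assumed test tone (1 kHz, 0.1 s); not a value taken from the paper.
sent = sine_samples(1000, 0.1)

# Stand-in for a received signal: a uniformly attenuated copy of the tone.
received = [0.5 * s for s in sent]

# For a pure tone, any loss in the channel shows up directly in this ratio.
ratio = peak_to_peak(received) / peak_to_peak(sent)
```

Because a sine wave has a single constant amplitude, one ratio per measurement is enough to compare conditions such as lux level or distance.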
Fig. 5 Variation in amplitude with lux at 0.3 m
Fig. 6 Variation in amplitude with lux at high range at 0.3 m
For the next test set-up, the effect of variable distance and luminosity on the laser transmission was observed. The ambient decibel level ranged between 52.9 and 56.7 dB. The distance values ranged between 0.3 and 46 m. The luminosity values ranged between 165.3 and 492.8 lux. Figure 8 shows the trend of amplitude values as distance varies from 0.3 to 46 m. A dip in amplitude values near 15 m was observed, after which the values return to the previous range. This behaviour can be attributed to inaccuracy in aiming the laser light at the microphone module. There is minimal change in amplitude values up to 35 m, after which there is a small decline. Figure 9 shows the trend of amplitude values with variation in ambient luminosity. The dip in amplitude at the beginning corresponds to the readings taken at 15 m which, as inferred previously, could be attributed to inaccuracy in aiming the laser light at the microphone module. There is a small variation in amplitude values, but overall there is no pattern of decline and the values are consistent. Figure 10 shows the oscilloscope reading of laser transmission at 46.42 m with 511 lux and 53.7 dB.
Fig. 7 Oscilloscope reading at 0.3 m with 53,400 lux
Fig. 8 Variation in amplitude with distance
Fig. 9 Variation in amplitude with Lux for different distances
From the test set-ups, it was observed that ambient decibel levels have a negligible effect on the amplitude of the transmitted laser light. Additionally, for distances below 35 m and luminosity readings below 512 lux, there is minimal effect on the laser light, whereas for distances above 35 m and luminosity readings near 52,000 lux, a small decline in the amplitude value was observed. Inaccurate aim could contribute to this variation, but the same trend is seen in separate sets of tests, which leads us to conclude that very high luminosity and distance may affect the amplitude values.
Sources of Error
There is a considerable dip in amplitude value for readings corresponding to 15 m distance and 162 lux. This can be attributed to inaccuracy in aiming the laser light at the microphone module.
4.3 Psychoacoustic Attack
The psychoacoustically modified audio was tested against the compromised Kaldi ASR system. The success rate varied depending on the audio snippets that were trained, but the tool was able to replicate the implementation of the attack vector. The test variable was ambient noise. White noise was used to vary the ambient noise levels because of its consistent waveform. The baseline decibel level of the testing environment, without any added noise, was 45 dB. Figure 11 shows the results of the modification. The blue waveform is the original untampered audio file, and the green waveform is the altered audio file, which psychoacoustically hides the command “ACTIVATE EMERGENCY BREAK AND LOCK ALL DOORS”. The red waveform is the difference between the two audio files, i.e. the modification made to the original file.
Fig. 10 Oscilloscope reading at 46 m with 511 lux
Fig. 11 Audio pattern of unmodified, modification, and modified file for music snippet
The modified audio file music.wav was tested with different volumes of ambient noise. Table 3 shows the results of the test. It was observed that the attack fails for background audio at roughly 60 dB and above. This was an expected result, as the modifications made to the original audio were softer in order to hide psychoacoustically in the louder audio file. Hence, it is expected behaviour that the attack fails when the ambient noise overpowers the modified audio. From the tests conducted, it was observed that the trained audio works on the compromised Kaldi ASR system under suitable conditions. Furthermore, it was found that Kaldi can be trained with different audio commands to obtain modified audio files which can carry out the psychoacoustic attack on the system. It is also observed that the success of these attacks depends on the pattern of the carrier audio file. Future work can be done with ambient noise of variable waveforms. The psychoacoustic test results also highlight that the attack can be carried out by anyone with relevant physical or cloud resources to train the model. They also highlight one limitation of open source security modules: access to the source code of the security module. The psychoacoustic attack was successful because the Kaldi ASR system has its source code accessible to everyone.
Table 3 Testing modified audio command against compromised Kaldi ASR
Audio (dB) | BG (dB) | Result
55 | 0 | Success
65 | 0 | Success
65 | 55 | Success
65 | 60 | Failure
70 | 0 | Success
70 | 55 | Success
70 | 60 | Failure
75 | 0 | Success
75 | 55 | Success
75 | 60 | Failure
80 | 0 | Success
80 | 55 | Success
80 | 60 | Failure
85 | 0 | Success
85 | 55 | Success
85 | 60 | Success
85 | 65 | Failure
90 | 0 | Failure
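The pattern in Table 3 can be summarised by tallying successes per background level. The sketch below is only a reading aid: the rows are transcribed from Table 3, and the helper name `success_by_background` is hypothetical.

```python
from collections import defaultdict

# (audio dB, background dB, attack succeeded), transcribed from Table 3.
ROWS = [
    (55, 0, True), (65, 0, True), (65, 55, True), (65, 60, False),
    (70, 0, True), (70, 55, True), (70, 60, False), (75, 0, True),
    (75, 55, True), (75, 60, False), (80, 0, True), (80, 55, True),
    (80, 60, False), (85, 0, True), (85, 55, True), (85, 60, True),
    (85, 65, False), (90, 0, False),
]


def success_by_background(rows):
    """Map each background level to (successes, trials)."""
    tally = defaultdict(lambda: [0, 0])
    for _audio_db, bg_db, succeeded in rows:
        tally[bg_db][0] += int(succeeded)
        tally[bg_db][1] += 1
    return {bg: tuple(counts) for bg, counts in sorted(tally.items())}


summary = success_by_background(ROWS)
```

Read this way, the table shows the attack succeeding in every trial with a 55 dB background, in one of five trials at 60 dB (the 85 dB playback), and in the single trial at 65 dB failing, with the 90 dB playback failing even without added noise.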
5 Conclusion
5.1 PoC Tool
The PoC tool is a portable device which successfully automates three different audio attack vectors. Such a device would be valuable for conducting comparative and analytical studies to understand the attack patterns of different audio attacks in order to come up with defences.
5.2 Laser Attack
• The primary challenge in implementing the laser attack is acquiring the right equipment, as interfacing the modules is difficult in its absence.
• The laser attack has a long range, as no significant drop in amplitude was found with varying distance under constant ambient luminosity and constant ambient noise levels.
• The laser attack is sensitive to ambient luminosity at very high values, which results in a small decline in amplitude.
• One mitigation against laser attacks is to block the light from reaching the microphone module. This can be done either with near-opaque dust covers or by ensuring there is no direct line of sight to the microphone from any external direction.
• Given that the responsible disclosure for the attack was made back in 2019, some appropriate defences have been put in place, and future hardware updates might render the attack obsolete. However, the attack remains dangerous with existing technology until the hardware currently in use is replaced.
5.3 Psychoacoustic Attack
• The primary challenge in implementing the psychoacoustic attack was acquiring the data set and the resources to train the model. The availability of cloud resources, however, provides an easier path to training the model without the need for a laboratory environment or physical resources.
• The psychoacoustic attack is versatile, as different carrier audio can be trained to psychoacoustically hide malicious audio commands. This makes its detection very difficult. However, the efficacy also varies depending on the carrier audio file.
• The attack is successful if the average audio levels of the modified file are higher than the average background audio levels of the environment. The attack fails at audio levels above 90 dB as the audio is too distorted to register anything. The attack works with a uniform ambient noise of 65 dB, but fails at 75 dB.
• The attack depends on the compromised Kaldi ASR system. Hence, one mitigation against the attack would be to have checks in place to ensure that the updates are being pulled from the official Kaldi repository.
• Another mitigation, though inconvenient, would be to have the Kaldi ASR system inactive when not being used explicitly.
• The attacks can be mitigated by implementing a checksum on the Kaldi project. However, the psychoacoustic attack gives an interesting insight into how malicious commands can be hidden in a large variety of seemingly non-malicious audio. The analysis would aid further research on similar attacks on black-box systems.
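The checksum mitigation mentioned above could, for example, compare the SHA-256 digest of a downloaded model or archive against a digest published through a trusted channel. This is a minimal sketch under that assumption; the function names are hypothetical, and the source does not specify which files would be checked or where the reference digest would come from.

```python
import hashlib
import hmac


def sha256_of_file(path, block_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_known_digest(path, expected_hex):
    """Return True only if the file's digest equals the trusted digest.

    expected_hex is assumed to be obtained out of band (hypothetical);
    compare_digest avoids leaking the mismatch position through timing.
    """
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

A check like this only detects tampering with the files it covers; it does not help if the reference digest itself is served from the compromised source.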
6 Future Work
• Implementing a wider variety of audio attack vectors in the PoC tool.
• Laser Attack
– Implementing the attack with a more powerful laser to test the upper range of distance.
– Formulating the relationship between the focus of the laser and the maximum range for which the attack works.
– Formulating the relationship between the amount of ambient light and the minimum power of the laser required to generate the minimum amount of current needed at the voice-assistant microphone for the attack to work.
• Psychoacoustic Attack
– To test and analyse Kaldi with varying types of carrier audio data in conjunction with varying lengths of audio commands in order to gain more insight.
– To test and analyse using different data sets.
– To implement a psychoacoustic attack on a black-box voice-assistant system (such as Google Assistant) by reverse engineering its behaviour.
Acknowledgements We thank Prof. Animesh Giri for supervising this paper as a guide throughout, and the CSE Dept, PES University, for the opportunity and resources to work on the paper. We also thank Assistant Prof. Dhanashree G. Bhate and Assistant Prof. Dr. Kaustav Bhowmick for helping with their domain-specific knowledge.
References
1. Sugawara T, Cyr B, Rampazzi S, Genkin D, Fu K (2020) Light commands: laser-based audio injection attacks on voice-controllable systems. In: 29th USENIX security symposium
2. Zhang G, Yan C, Ji X, Zhang T, Zhang T, Xu W (2017) DolphinAttack: inaudible voice commands. In: 2017 ACM SIGSAC conference on computer and communications security
3. Yan Q, Liu K, Zhou Q, Guo H, Zhang N (2020) Surfing attack: interactive hidden attack on voice assistants using ultrasonic guided waves. In: Network and distributed systems security (NDSS) symposium
4. Xu Z, Hua R, Juang J, Xia S, Fan J, Hwang C (2021) Inaudible attack on smart speakers with intentional electromagnetic interference. IEEE Trans Microwave Theory Tech
5. Schönherr L, Kohls K, Zeiler S, Holz T, Kolossa D (2018) Adversarial attacks against automatic speech recognition systems via psychoacoustic hiding. arXiv preprint arXiv:1808.05665
6. The Kaldi Project GitHub repository. https://github.com/kaldi-asr/kaldi. Last Accessed 3 Mar 2023
7. Vaidya T, Zhang Y, Sherr M, Shields C (2015) Cocaine noodles: exploiting the gap between human and machine speech recognition. In: 9th USENIX workshop on offensive technologies
8. MATLAB website. https://www.mathworks.com/products/matlab.html. Last Accessed 3 Mar 2023
9. Amazon Mechanical Turk Service. https://www.mturk.com/. Last Accessed 3 Mar 2023
10. Zhang R, Chen X, Lu J, Wen S, Nepal S, Xiang Y (2020) Using AI to hack IA: a new stealthy spyware against voice assistance functions in smart phones. arXiv preprint arXiv:1805.06187
11. Lentzsch C, Shah SJ, Andow B, Degeling M, Das A, Enck W (2021) Hey Alexa, is this skill safe? Taking a closer look at the Alexa skill ecosystem. In: 28th Annual network and distributed system security symposium, NDSS
12. Nopnop2002, ILI9225 SPI TFT Library for RaspberryPi/OrangePi GitHub repository. https://github.com/nopnop2002/Raspberry-ili9225spi. Last Accessed 3 Mar 2023
13. CSR-I (WSJ0) Dataset. https://catalog.ldc.upenn.edu/LDC93s6a. Last Accessed 3 Mar 2023
14. Robot Operating System (ROS) Website. https://www.ros.org/. Last Accessed 3 Mar 2023
15. Advanced Linux Sound Architecture (ALSA) Project GitHub repository. https://github.com/alsa-project/alsa-lib. Last Accessed 3 Mar 2023
16. The Audacity Project GitHub repository. https://github.com/audacity/audacity. Last Accessed 3 Mar 2023
Child Detection by Utilizing Touchscreen Behavioral Biometrics on Mobile Sunil Mane, Kartik Mandhan, Mudit Bapna, and Atharv Terwadkar
Abstract Studies show that young children are exposed to smart devices. Early stages of education, children’s internet safety, and children’s interactions with computers are all significantly impacted by mobile usage. With the significant increase in average daily smartphone use, the health of preschoolers, school-going children, and teenagers is harmed both mentally and physically. This research presents an extensive point of reference for investigating how artificial neural networks and several machine learning algorithms perform on the problem of classifying the age of a smartphone user based on their touch. Hence, our work can become a reliable component for predicting the age range and effectively help parents control the time their children spend on smartphones. Out of a variety of models that were examined, we propose the top five for modelling the behavioral differences between children and adults when using mobile devices. The first approach is based on using artificial neural networks for classification. Next, we used gradient boosting algorithms, namely the XgBoost classifier and the LightGBM classifier. Finally, we also analyzed support vector machines and the K-nearest neighbors model. Keywords Age detection · Touch gestures · Human–computer interaction · User behavior · Children · Adults · Age group
S. Mane · K. Mandhan · M. Bapna (B) · A. Terwadkar Department of Computer Engineering, College of Engineering, Pune, India e-mail: [email protected] S. Mane e-mail: [email protected] K. Mandhan e-mail: [email protected] A. Terwadkar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_16
1 Introduction
Today’s youth live surrounded by touchscreen gadgets like smartphones and tablets. With simple touches, swipes, and pinches, these gadgets enable very young children to interact intuitively for long periods, leading to adverse effects on their health. Statistics show that 37.15% of children have experienced reduced levels of concentration due to smartphone usage.1 People of all ages use mobile devices for various purposes, such as communication, entertainment, and information access. Because of this, mobile devices have developed into an abundant data source for researching human traits and behavior. One of the characteristics that can be inferred from mobile device usage is the age of the user. There is a dearth of studies on touch-based age-detection classification. Our work tries to fill these research gaps by incorporating new classification techniques and analysing them thoroughly. We build on the existing research on touch-based identification of the age of the mobile phone user. Our aim through this contribution is to help progress the research in this area, which can be used in a variety of ways:
• Medical research has shown that many parents are frustrated by their child’s prolonged mobile phone usage. An earlier report by Common Sense Media found that 59 percent of parents felt that their child is addicted to their smartphone.2 Hence, this research can be used to improve parental control applications: if touchscreen interfaces were able to distinguish instantly whether the user is a child or an adult, they could block access to certain applications on the mobile phone.
• This research could help advertisement companies identify whether the user is a child and hence (a) replace age-restricted advertisements with advertisements that are appropriate for children, and (b) target marketing of toys or other products which a child is likely to purchase.
In this study, we improve the existing methods for quickly determining whether a child is using a mobile device. This technology, if enabled, can pave the way for other systems. For instance, a system could switch to a more child-friendly user interface or activate parental control when it notices a child using the device. When comparing the proposed work on touch interaction-based child detection with existing parental control apps and locks, the following major features may differentiate them:
• Touch Interaction: The proposed work focuses on using touch interaction patterns on mobile devices as a means of detecting child users. This approach is unique compared to existing parental control apps and locks, which typically rely on password or biometric authentication.
1 https://economictimes.indiatimes.com/news/india/23-8-of-children-use-smartphones-while-in-bed-37-15-losing-concentration-mos-it/articleshow/90403164.cms.
2 https://www.commonsensemedia.org/press-releases/new-report-finds-teens-feel-addicted-to-their-phones-causing-tension-at-home.
• Dynamic Identification: Touch interaction-based child detection can dynamically identify the child user based on their touch behavior, which may be useful in situations where multiple users share the same device. Existing parental control apps and locks typically rely on user authentication at the time of login or setup.
• Ease of Use: Touch interaction-based child detection may be easier for children to use, as they do not have to remember passwords or undergo biometric authentication. Existing parental control apps and locks may be more difficult for children to use and may require parental assistance.
• Privacy: The proposed work may raise fewer privacy concerns than existing parental control apps and locks, as it does not rely on collecting biometric or personal information from the child user. Existing parental control apps and locks may collect and store sensitive information about the child user.
The remaining sections of the paper are organized as follows. In the next section, we review previous research before outlining the purpose of our investigation. The following sections provide details of data preprocessing, the models used, and their results. The findings from the evaluation of the proposed approaches are then presented. The final section of our paper includes a summary of our results, a discussion of their broader implications, and potential directions for future research.
2 Literature Survey
2.1 Background
From birth through adulthood, children’s cognitive and motor skills develop over time. As a result, how they engage with touch displays changes as well, necessitating the development of children’s applications that are appropriate for each age group.
• Newborns: Parents have complete control over how their newborns use mobile devices throughout the first few months of life, and babies only receive stimulation from the device without any involvement from them.
• 6 Months old: Babies begin engaging with the device by tapping on it because they can now control their hand movement.
• 18 Months old: Children begin to engage with displays by learning how to tap, navigate inside apps, and pay more attention to application content (Morante et al. 2015).
• 2–4 Years old: Children can make some touch movements, like tap-and-drag gestures.
• 4–6 Years old: Children’s fine motor skills develop, allowing them to do more sophisticated multi-touch actions and more exact line tracing, including letter writing.
• Above 6 Years old: They can do complex multi-touch gestures, but their fine motor talents are still developing. According to research by Vatavu, Cramariuc, and Schipor (2015), children aged 7–10 miss targets measuring 7 mm in size 30% of the time.
2.2 Related Work Implications
Vatavu et al. [1] developed a method for determining a user’s age group (child or adult) using touch coordinates obtained from touchscreen devices. They conducted 89 experiments with kids between the ages of 3 and 6 and obtained an accuracy of 86.5%. Buriro et al. [2] developed a method for estimating age and gender using data from keystrokes on smart devices, where users entered passwords or secret PINs. Using timing-based tap characteristics, they achieved a highest age estimation accuracy of 87.7%. Shaw and Anthony [3] studied the gestures of children between the ages of 5 and 10, searching for overall differences between age groups using a set of 20 predetermined touchscreen movements. For age group detection, Hernandez-Ortega et al. [4] employed the user’s neuromotor features. They made use of the touchscreen dataset of [1]. Davarci et al. [5] determined the age group of a child using accelerometer sensor data. Because only young children are included in these studies’ datasets, older children’s touch behavior is not examined. It was found in [13] that the most often used gestures in kid-friendly apps were tap, drag/slide, free rotation, drag & drop, pinch, spread, and flick. These gestures were found in 100 children’s apps from the Apple store. The results of [12] supported the notion that youngsters between the ages of 2 and 4 can employ gestures. The research suggests that application developers should incorporate drag and drop, tap, drag/slide, and flick gestures into their children’s applications starting at age 2, drag and drop gestures for children starting at age 3, and all touch gestures for children starting at age 4 and up. The research of Vatavu et al. [14] revealed that young children struggle to precisely acquire targets and drag objects on touchscreen surfaces, but as they become older and their cognitive and motor skills advance, they become faster and more accurate. Shafaeat Hossain et al. [15] collected touch data for users aged 5 to 61 in 262 user sessions and, using regression, obtained an accuracy of 73.63%. Li et al. [6] confirmed that a user’s touch behavior can determine their age by including adults in their study. Thirty-one participants were studied, including 17 primary school students (aged 3–11) and 14 adults (22–60 years old). They tried out tapping and swiping and obtained an accuracy of 84%. Even though they widened the age range to 3–11, they missed the age range 12–21, which might share some similarities with the age ranges 3–11 and 22–60. As a result, their tests are still limited.
Table 1 Summary of dataset
Age group (years) | Group size
0 and $x_i$, $x_j$ = feature vectors
Results
Sklearn was used to implement and run our evaluations. The radial basis function (RBF) kernel was the best fit and gave the highest accuracy for the dataset. Table 8 shows the final accuracies for SVM for different touch activities.
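For reference, the standard form of the RBF kernel is $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$ with $\gamma > 0$. The sketch below evaluates that standard form directly; it is an illustration under the assumption that the paper uses this usual definition, and the feature values in the usage note are made up.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)
```

Identical feature vectors give a kernel value of 1, and the value decays towards 0 as the vectors move apart, e.g. `rbf_kernel([0, 0], [1, 1], 0.5)` equals `exp(-1)`.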
4.5 K-Nearest Neighbors (KNN)
The concept of nearest neighbor analysis has been used in several anomaly detection techniques. The k-nearest neighbor method, a supervised learning algorithm in which the outcome of a new instance query is classified based on the majority of k-nearest neighbor categories, is one of the finest classifier algorithms. Three primary aspects affect the KNN algorithm’s performance:
• the measurement of distance used to identify the closest neighbors (Minkowski, Euclidean, Manhattan),
• the number of neighbors that were utilized to classify the new sample,
• the distance rule that was applied to derive a classification from the k-nearest neighbors.
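The three aspects listed above map onto three parameters of a nearest-neighbor vote. The following minimal pure-Python sketch (not the sklearn implementation used in the paper) fixes the metric to Euclidean and exposes the neighbor count `k` and the voting rule `weights`; the one-dimensional feature values and labels are invented for illustration.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=3, weights="uniform"):
    """Classify query by a vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    weights: "uniform" counts each neighbor once; "distance" weights each
    neighbor by the inverse of its distance to the query.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        if weights == "uniform":
            votes[label] += 1.0
        else:
            # Small constant avoids division by zero for exact matches.
            votes[label] += 1.0 / (math.dist(features, query) + 1e-12)
    return max(votes, key=votes.get)


# Hypothetical 1-D touch feature with made-up values.
train = [((0.1,), "adult"), ((1.0,), "child"), ((1.1,), "child"), ((5.0,), "adult")]
uniform_vote = knn_predict(train, (0.0,), k=3, weights="uniform")
distance_vote = knn_predict(train, (0.0,), k=3, weights="distance")
```

In this toy data the two voting rules disagree: the uniform vote follows the two farther "child" samples, while the distance-weighted vote follows the single very close "adult" sample, which is why the weighting rule is tuned alongside `k`.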
Table 9 Hyperparameters of KNN
Event | Data | Metric | n_neighbors | weights
ST | Raw | Manhattan | 9 | Uniform
ST | Upsampled | Minkowski | 9 | Distance
DT | Raw | Minkowski | 5 | Uniform
DT | Upsampled | Minkowski | 15 | Distance
SDD | Raw | Minkowski | 11 | Uniform
SDD | Upsampled | Minkowski | 5 | Uniform
MDD | Raw | Minkowski | 13 | Uniform
MDD | Upsampled | Minkowski | 13 | Uniform
Without making any a priori assumptions about the distributions from which the training examples are drawn, the k-nearest neighbor rule consistently performs well. Techniques for detecting age group based on k-nearest neighbors require a distance or similarity measure between two data instances. In the KNN procedure, we categorize any incoming sample by finding the closest points to the new sample. If the closest neighbor is a child, the sample will be flagged as a child. To break ties, K is chosen as a small odd number (typically 1, 3, or 5). Larger K values help minimize the impact of noisy data. Different methods of calculating the distance between two data instances can be used in this technique. In most of our configurations (Table 9), the Minkowski distance gave the highest accuracy compared with the Euclidean and Manhattan distances. The Minkowski distance of order p (where p is an integer) between two points X and Y is defined as
$$D(X, Y) = \left( \sum_{i=1}^{n} |X_i - Y_i|^{p} \right)^{1/p} \qquad (4)$$
where p = order and X, Y = points.
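Equation (4) translates directly into code. The sketch below is a plain-Python illustration with made-up points; it also shows that the Manhattan and Euclidean distances mentioned in the text are the special cases p = 1 and p = 2.

```python
def minkowski_distance(x, y, p=2):
    """Minkowski distance of order p between equal-length points x and y."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    if p < 1:
        raise ValueError("order p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Illustrative points (not from the dataset).
manhattan = minkowski_distance((0, 0), (3, 4), p=1)  # |3| + |4|
euclidean = minkowski_distance((0, 0), (3, 4), p=2)  # sqrt(3^2 + 4^2)
```

For the points (0, 0) and (3, 4) the two orders give 7 and 5 respectively, so the choice of p changes which neighbors count as closest.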
Results
Sklearn was used to implement and run our evaluations. GridSearchCV from sklearn.model_selection was used for hyperparameter tuning. After evaluating the KNN model with default parameters and with GridSearchCV, the results showed that the GridSearchCV parameters were the best fit and gave the highest accuracy. The parameters obtained are listed in Table 9. Table 10 shows the final accuracies for KNN for different touch activities.
Table 10 Classification report of KNN (A = accuracy, P = precision, R = recall, F1 = F1 score)
Data | Raw A | Raw P | Raw R | Raw F1 | Upsample A | Upsample P | Upsample R | Upsample F1
Single tap (ST) | 0.90 | 0.85 | 0.88 | 0.86 | 0.88 | 0.83 | 0.90 | 0.85
Double tap (DT) | 0.96 | 0.97 | 0.94 | 0.95 | 0.96 | 0.97 | 0.94 | 0.95
Single drag drop (SDD) | 0.83 | 0.90 | 0.75 | 0.78 | 0.88 | 0.87 | 0.84 | 0.85
Multi-drag drop (MDD) | 0.96 | 0.97 | 0.94 | 0.95 | 0.96 | 0.97 | 0.94 | 0.95
5 Evaluation Metrics Referring to previous works, we chose accuracy and area under the receiver operating characteristic (ROC) curve as our final evaluation metrics for making conclusions.
5.1 Final Accuracies
See Table 11.
5.2 ROC Curves
A receiver operating characteristic (ROC) curve is a graph that depicts a classification model’s performance across all classification thresholds. AUC is a metric that aggregates performance across all classification thresholds. The ROC curve plots sensitivity (true positive rate) on the Y-axis against 1 − specificity (false positive rate) on the X-axis. The best-performing models were identified, and their ROC curves are given below. The area under the curve for the best model, i.e. the one with the highest accuracy for each event on both raw and upsampled data, is given in the caption of each figure (Figs. 1–8).
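As a concrete illustration of how an ROC curve and its AUC are obtained from classifier scores, the sketch below sweeps the threshold over the sorted scores and integrates with the trapezoidal rule. It is a minimal pure-Python stand-in rather than the library routine used for the figures, and the labels and scores are invented.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low.

    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at one score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores: one negative sample outranks one positive sample.
example_auc = auc(roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]))
```

In this example eight of the nine positive/negative pairs are ranked correctly, so the area is 8/9; a classifier that ranks every positive above every negative would reach an area of 1.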
Table 11 Final accuracies (A) for raw (R) and upsampled (U) data of various models
Data | LGBM R | LGBM U | XGB R | XGB U | SVM R | SVM U | KNN R | KNN U | ANN R | ANN U
ST | 0.88 | 0.83 | 0.91 | 0.86 | 0.92 | 0.81 | 0.9 | 0.88 | 0.93 | 0.84
DT | 0.92 | 0.92 | 0.83 | 0.92 | 0.96 | 0.88 | 0.96 | 0.96 | 0.92 | 0.96
SDD | 0.88 | 0.88 | 0.83 | 0.88 | 0.88 | 0.88 | 0.83 | 0.88 | 0.88 | 0.88
MDD | 0.88 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96 | 0.96
Fig. 1 For ST event, AUC value for raw data, ANN model is 0.9686
Fig. 2 For ST event, AUC value for upsampled data, KNN model is 0.8970
Fig. 3 For DT event, AUC value for raw data, KNN model is 0.9375
S. Mane et al.
Fig. 4 For DT event, AUC value for upsampled data, KNN model is 0.9375
Fig. 5 For SDD event, AUC value for raw data, ANN model is 0.9609
Fig. 6 For SDD event, AUC value for upsampled data, ANN model is 0.9687
Fig. 7 For MDD event, AUC value for raw data, ANN model is 0.9921
Fig. 8 For MDD event, AUC value for upsampled data, ANN model is 0.9922
6 Conclusions The study demonstrates that mobile devices can correctly and automatically identify the presence of children based on their touch characteristics. From the final accuracies in Table 11:
• The single tap event has a maximum accuracy of 93% on raw data, achieved by ANN. After upsampling the data, the KNN model gave the maximum accuracy of 88%.
• The double tap event has a maximum accuracy of 96% on raw data, achieved by both KNN and SVM. After upsampling the data, the KNN and ANN models gave the maximum accuracy of 96%.
• The single touch drag and drop event has a maximum accuracy of 88% on raw data, achieved by LightGBM, SVM, and ANN. After upsampling the data, all models gave the maximum accuracy of 88%.
• The multi-touch drag and drop event has a maximum accuracy of 96% on raw data, achieved by XGBoost, SVM, KNN, and ANN. After upsampling the data, all models gave the maximum accuracy of 96%.
In conclusion, this study has shown that touch interaction on mobile devices can be used as a reliable method for age classification.
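The per-event maxima listed above can be re-derived mechanically from Table 11. The short sketch below copies the table's accuracies into a dictionary and reports, for each event and data variant, the top accuracy and every model that reaches it; the names `TABLE_11` and `best_models` are illustrative.

```python
# Accuracies copied from Table 11 as (raw, upsampled) pairs.
TABLE_11 = {
    "ST":  {"LGBM": (0.88, 0.83), "XGB": (0.91, 0.86), "SVM": (0.92, 0.81),
            "KNN": (0.90, 0.88), "ANN": (0.93, 0.84)},
    "DT":  {"LGBM": (0.92, 0.92), "XGB": (0.83, 0.92), "SVM": (0.96, 0.88),
            "KNN": (0.96, 0.96), "ANN": (0.92, 0.96)},
    "SDD": {"LGBM": (0.88, 0.88), "XGB": (0.83, 0.88), "SVM": (0.88, 0.88),
            "KNN": (0.83, 0.88), "ANN": (0.88, 0.88)},
    "MDD": {"LGBM": (0.88, 0.96), "XGB": (0.96, 0.96), "SVM": (0.96, 0.96),
            "KNN": (0.96, 0.96), "ANN": (0.96, 0.96)},
}
RAW, UPSAMPLED = 0, 1


def best_models(event, column):
    """Return (top accuracy, models achieving it) for one event and data variant."""
    scores = {model: pair[column] for model, pair in TABLE_11[event].items()}
    top = max(scores.values())
    return top, [model for model, score in scores.items() if score == top]


for event in TABLE_11:
    for name, column in (("raw", RAW), ("upsampled", UPSAMPLED)):
        top, models = best_models(event, column)
        print(f"{event} {name}: {top:.2f} by {', '.join(models)}")
```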
6.1 Limitations Although the proposed technique has demonstrated strong performance, which highlights the inherent strength of the behavior-based model, a number of limitations remain:
• The database used for this experiment poses a significant constraint because it contains no data from people aged between 6 and 25. To the best of our knowledge, the research community does not have access to a database with such information on touchscreen interactions.
• We also found that the more tightly the gestures were controlled, the less variation they showed, since precise and direct tasks leave little room for variation.
6.2 Future Scope
• In the future, we would like to gather touch gestures in a setting closer to the real world, where users can choose from a variety of environments and gestures. This would improve the system’s ability to identify users reliably.
• Currently, data is gathered from a single mobile device; however, differing screen types and sizes may affect touch sensitivity. Gathering such data and using it to evaluate our models might lead to a better understanding of touch interaction with screens.
• Another interesting avenue is to combine touch-based biometrics with other biometric modalities, such as facial recognition, voice recognition, and gait analysis, to create a multi-modal biometric system for age classification.
• Future studies will also involve discovering and assessing highly personal features that could be used for user recognition, as well as features that are typical of an entire age group.
In addition, the proposed method can be integrated with other applications, such as personalization, security, and marketing. Overall, the proposed method has considerable potential for future research and development in the field of mobile biometrics and age classification.
Hand Gesture-Controlled Wheelchair Minal Patil, Abhishek Madankar, Roshan Umate, Sumiti Gunjalwar, Nandini Kukde, and Vaibhav Jain
Abstract The purpose of this article is to design a hand gesture-controlled wheelchair for physically challenged people who have trouble getting from place to place on a daily basis. Physically disabled and elderly people often depend on others for mobility. According to studies, there are 6 million people worldwide who are paralysed and rely on wheelchairs for transportation. Previously, another person had to push the wheelchair and provide external support. Conventional manual or motorised wheelchairs meet the requirements of many persons with impairments, but for certain disabled people, using such a wheelchair is challenging or impossible. Research on computer-controlled chairs has made extensive use of sensors and intelligent control algorithms to minimise the need for human intervention. This paper describes a smart wheelchair for physically challenged people. Our goal is to develop a system that enables successful wheelchair user interaction at multiple control and sensing levels. The system recognises the impaired person’s hand gestures and drives the wheelchair motors accordingly. Using motors and hand motions, the wheelchair can be propelled with the possibility of avoiding obstacles. It may be controlled to approach the user from a distance and can be used with both hands. The system is implemented on an Arduino-based device, the Arduino Nano, which is programmed using the Arduino IDE. To make people independent, we have created a gesture-controlled wheelchair. All that is required is for the user to wear a gesture device with an accelerometer sensor. When the sensor picks up the hand’s movement in a certain direction, the wheelchair travels in the corresponding direction.
Keywords Gyroscope · Arduino Nano · RF module
M. Patil (B) · A. Madankar Department of ET Engineering, Y. C. College of Engineering, Nagpur, India e-mail: [email protected]
R. Umate Datta Meghe Institute of Higher Education and Research, Sawangi, Wardha, India
S. Gunjalwar · N. Kukde · V. Jain Y. C. College of Engineering, Nagpur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_17
M. Patil et al.
1 Introduction Wheelchairs have changed from being only chairs with wheels to fully automated personal mobility vehicles that assist those who are physically disabled. They are useful mobility devices not only for physically challenged people but also for patients moving in and out of hospitals, elderly people living alone, disabled athletes participating in sports, and emergency situations such as road accident rescues, fire rescues, air rescues, and maternity emergencies. Self-moving wheelchairs with a certain level of intelligence, such as the ability to measure and control speed and to sense the presence of obstacles, are gradually coming to dominate the literature on intuitive wheelchairs. Because of the lack of self-mobility devices, some groups of people cannot easily visit basic skill acquisition centres and so have not been able to participate fully in society. This has affected a sizable number of paraplegics, who are often reduced to begging on the streets of many poor countries, which in turn has reduced the economic output of the affected countries. Both children and adults place a high value on independent movement. People who are unable to move about independently are at a disadvantage, which makes it more difficult for them to attain their academic and professional objectives. The goal of this project is to automate the use of wheelchairs for forward and backward movements. The main objective of this research is to help severely disabled people use electric wheelchairs on their own and so restore their independence. To help users who are less able to move around independently, a smart wheelchair is outfitted with an Arduino Nano, motors, motor drivers, an RF module, an LCD display, and a computer. Using only hand motions, the wheelchair can be manoeuvred in four distinct directions.
The receiver part can assume some of the burden of controlling the wheelchair and avoiding obstacles when the rider is not yet capable of doing so on their own. How much of the work the rider undertakes is decided by both the rider and the carer.
Problem Statement Independent movement is essential for physically handicapped people to develop their physical, cognitive, linguistic, and social skills. Not everyone can afford the high price of electric wheelchairs. Thus, the objective of this project is to develop a more sophisticated control system for a smart wheelchair, since current wheelchairs must be moved manually and have only a few functions. The user must turn the wheels by hand and may become exhausted when doing so for an extended amount of time. The modern wheelchair also has other shortcomings. Because its shape and position cannot be adjusted to the person’s body, it is neither healthy enough nor really comfortable. The ergonomic redesign of an existing wheelchair is a crucial part of human factor engineering. An evaluation of the current wheelchair’s daily usage is the first requirement for this project. The wheelchair user delivers instructions through hand gestures and movement in a certain direction. Questionnaire analysis and sufficient object design research are essential to ensuring that the new wheelchair design satisfies all desirable design requirements. Wheelchairs must be created with the user’s safety and comfort in mind. The design should also avert significant problems that could result in an accident. To create an ergonomically pleasing wheelchair design that offers suitable adjustable features and other characteristics that may be altered by the user, a number of designs must be created and evaluated using human factor engineering and material selection.
2 Literature Survey The suggested approach is novel, entirely automatic, and successful in addressing the problem for the impaired person [15]. The wheelchair provides an umbrella, foot mat, head mat, and obstacle detection, although these are not required for using it. A humidity sensor keeps track of the weather (rainy, hot, or cold), which enables the head mat to operate automatically. Moreover, an ultrasonic sensor detects obstructions, and GPS provides location tracking. A prototype implementation demonstrates that the proposed day/season-based framework works effectively under live weather conditions [1]. The results of the NSSO’s 2011 Survey of Disabled Persons show that disability is a serious public health problem in India. The wheelchair is the most popular assistive device for improving the personal mobility of impaired people. Power wheelchairs can help people who are unable to push a manual wheelchair, but using one is challenging for people with disabilities who lack strength, motor skills, or visual acuity. To solve this problem, a number of researchers have created intelligent wheelchairs employing mobile robot technologies [7–9]. A smart wheelchair consists of a computer, a regular power wheelchair, and several sensors. As a result of recent developments in research areas including computer science, robotics, artificial intelligence, and sensor technology, the variety of functions available in smart wheelchairs has expanded [14]. This study presents the designs of different smart wheelchair prototypes that researchers have suggested, acknowledges the difficulties that remain in this area, and indicates directions for further study [2]. One of the biggest problems affecting the independence of the aged and disabled is declining mobility. Mobility assistive technologies are now being developed to enhance people’s quality of life.
Yet, it is necessary to enhance the current mobility aids [8]. The design and construction of a smart wheelchair with different control interfaces are the focus of this research. A variant of the smart wheelchair was created using electrical and mechanical components similar to those of a standard wheelchair already available on the market. To boost user interaction, the technology offers speech and gesture-control interfaces in addition to a mobile application for operating the wheelchair. The smart wheelchair’s electrical and control system, as well as its mechanical design, is described in greater detail. Tests were carried out to check the usability of the smart wheelchair that was created. In addition, a user study evaluated user preferences for the various control interfaces and presented the findings [3]. This research project extends the literature and introduces a robotic wheelchair system that has been designed, built, and upgraded with joystick control, obstacle recognition, and independent stops. The DC motor that drove the wheelchair in a linear and directional manner was controlled by an Arduino Uno microcontroller unit, which was used to coordinate the complete setup. The system would greatly assist people who have lost some of their independent movement, enhancing their self-esteem and enabling them to pursue their educational and professional objectives. The Adaptive Neuro-Fuzzy Inference System (ANFIS) was then employed to assess the performance of the built system. The system’s overall intelligence was rated by the sensitivity rule viewer at 63.8% in the first trial, 75% in the second, and almost 80% in the third, which is taken as evidence of the system’s functionality and effectiveness [4]. The biggest obstacle [9] is that people with physical limitations must rely on others to get around. We developed a gesture-controlled wheelchair to enable them to become more autonomous [12]. The user simply needs to wear a gesture-capable device with an integrated accelerometer [13]. When the sensor notices the hand moving in a specific direction, the wheelchair travels in the corresponding direction. An ultrasonic sensor is also used to detect obstructions in the path. We describe a carefully designed method for real-time, hand-based direction identification and control via gestures. A joystick-controlled wheelchair can be purchased nowadays for somewhere between Rs 70,000 and Rs 140,000. We are creating a Hand Gesture-Controlled Wheelchair for Rs 35,000. The wheelchair is constructed in a cost-effective manner while also ensuring the users’ mobility, adaptability, and safety [5].
This article outlines a low-cost smart wheelchair based on an Arduino Nano microcontroller and Internet of Things technology, with a number of features that can help disabled people, especially those who are poor and cannot afford an expensive smart wheelchair, get the help they need to carry out daily tasks on their own [16–18]. In conclusion, this initiative will make the smart wheelchair inexpensive and accessible to a variety of impaired people [10, 11]. It uses the Arduino Nano, an ESP-12e module for Wi-Fi connectivity, an MPU6050 to detect a fall and send a voice message through the IFTTT platform, an obstacle detection system with a buzzer and LED that serve as danger warnings, a voice recognition system, and joysticks to control the wheelchair [6, 19–21].
3 Methodology As a result of recent developments in scientific areas including computer science, robotics, artificial intelligence, and sensor technologies, there are more potential ways of helping impaired individuals. Smart wheelchairs improve on the capabilities of a normal power wheelchair by combining control and navigational intelligence. A hand gesture-controlled wheelchair normally consists of a computer, motors, drivers, and a standard powered wheelchair base. Less physical and mental work is needed to manoeuvre this wheelchair. This intelligent wheelchair is controlled by hand gestures and is based on techniques used in mobile robotics research. It has two components: the transmitter section and the receiver section.
The transmitter section is attached to the person sitting in the wheelchair. When a direction is indicated to the wheelchair by hand, it moves in that direction. The receiver section installed in the wheelchair is what makes it move: this section controls and monitors the movement of the wheelchair on which the person is sitting. Smart wheelchairs are available for a variety of users. To prevent people with cognitive disabilities from unintentionally selecting a drive command that results in an accident, some smart wheelchairs are built with collision-avoidance features. Some sophisticated wheelchairs are designed for those with significant motor impairments. In this paradigm, the smart wheelchair’s function is to translate hand gestures into high-level commands and execute them. These intelligent wheelchair designs often use algorithms derived from artificial intelligence. Collisions with obstacles can be avoided by fitting sensors that detect an obstacle and steer the wheelchair in another direction. In gesture-control mode, the wheelchair is propelled by the electric motors. When the arm is in the neutral position, the motors are inactive and the wheelchair does not move. The user controls the direction of the wheelchair’s movement by moving his or her hand away from the neutral position.
4 Working All the components are connected as shown in the block diagrams below. The system is divided into two sections: Fig. 1 shows the transmitter section and Fig. 2 shows the receiver section. In Fig. 1a, the gyroscope and the RF module, which is connected to the antenna, are attached to the Arduino Nano. This unit is attached to the person’s hand and, according to the movement of the hand, signals the receiver section to move. Right, left, forward, and backward movements are all determined by this section.
4.1 Circuit Diagram of Transmitter The circuit diagram in Fig. 1b represents the transmitter, which consists of the Arduino Nano, RF module, gyroscope, and a battery. The battery is connected to a voltage regulator, which is connected to the Arduino Nano. The gyroscope and RF module are also connected to the Arduino Nano. The gyroscope senses the hand movement and passes the signal to the Arduino Nano, which passes it to the RF module; the RF module encodes the message and sends it to the receiver circuit. In Fig. 2, when the receiver section receives the signal from the transmitter section, the wheelchair moves in the direction in which the user wants to move. The Arduino Uno is connected to the other components: RF module, motors, motor drivers, antenna, and LCD display. When the receiver section receives a signal to move in a specified direction, the motor drivers turn the motors so that the wheelchair moves in that direction, and the direction in which the wheelchair is moving is shown on the LCD display.
Fig. 1 a Transmitter block diagram of the proposed system, b circuit diagram of transmitter
Fig. 2 a Receiver block diagram of the proposed system, b circuit diagram of receiver
Table 1 Functional table: the angle in which the chair is moved
S. No.  Palm movement  Function  Action                          Angle in which chair moves
1       Forward        front()   Chair moves forward             90 degree
2       Horizontal     stop()    Chair stops                     90 degree
3       Left           left()    Chair moves in left direction   45 degree
4       Right          right()   Chair moves in right direction  45 degree
5       Backward       back()    Chair moves backward            90 degree
4.2 Circuit Diagram of Receiver The circuit diagram in Fig. 2b is of the receiver circuit, which contains an Arduino Nano, battery, motors, motor driver, LCD display, and RF module. In the receiver circuit, the encoded message received from the transmitter circuit is sent to the Arduino Nano, which passes it to the RF module; the RF module decodes the message and returns it to the Arduino, which then passes it to the motor driver that drives the motors. There are two motors, one each for the left and right wheels.
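With one motor per wheel, each function name from Table 1 reduces to a pair of wheel directions. The sketch below shows one plausible mapping, written in Python for illustration rather than as Arduino firmware. The specific wheel directions (for example, holding the left wheel to turn left) and the names `WHEEL_DIRECTIONS` and `motor_outputs` are assumptions, not details given in the paper.

```python
# Wheel directions as (left, right): +1 forward, -1 reverse, 0 stopped.
# The values are illustrative assumptions, not taken from the paper's firmware.
WHEEL_DIRECTIONS = {
    "front": (+1, +1),
    "back": (-1, -1),
    "left": (0, +1),    # hold the left wheel so the chair pivots left
    "right": (+1, 0),   # hold the right wheel so the chair pivots right
    "stop": (0, 0),
}


def motor_outputs(function_name):
    """Return (left, right) wheel directions; unknown names stop the chair."""
    return WHEEL_DIRECTIONS.get(function_name, (0, 0))


for name in ("front", "left", "right", "back", "stop"):
    print(name, motor_outputs(name))
```

Defaulting to a stop for unrecognised names is a deliberately conservative choice for a mobility aid.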
4.3 Functional Table The wheelchair can move forward and backward, turn left and right, and stop. Table 1 shows how hand motion operates the wheelchair and what results from each operation. When the front function is used, the chair moves straight ahead (90°). When the left function is used, the chair turns 45° and moves to the left. When the right function is used, the chair turns 45° and moves to the right. The flowchart for the transmitter is shown in Fig. 3, and the flowchart for the receiver is shown in Fig. 4.
4.4 Merits and Demerits Table Merits and demerits are shown in Table 2.
Fig. 3 Flowchart for transmitter
4.5 Flowcharts Two flowcharts, one for each circuit, show how the system works. In the first flowchart, the transmitter reads the hand movement from the gyroscope and sends the string “LEFT” if the hand moves to the left. Similarly, it sends the string “RIGHT” if the hand moves to the right, “FORWARD” if it moves forward, and “BACKWARD” if it moves backward. In the second flowchart, the receiver circuit reads the string sent by the transmitter. The wheelchair travels to the left if the string “LEFT” is received, to the right if “RIGHT” is received, forward if “FORWARD” is received, and backward if “BACKWARD” is received.
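The two flowcharts can be sketched as a pair of small functions, one per circuit, shown here as a Python simulation rather than the Arduino code itself. The tilt threshold, the sign conventions for pitch and roll, the extra "STOP" string for a level palm, and the names `classify_gesture` and `handle_message` are assumptions made for illustration; only the four direction strings come from the text.

```python
TILT_THRESHOLD_DEG = 20.0  # assumed dead zone around the neutral (level) palm

# Receiver side: command string -> function name from Table 1.
ACTIONS = {
    "FORWARD": "front",
    "BACKWARD": "back",
    "LEFT": "left",
    "RIGHT": "right",
    "STOP": "stop",  # assumed string for the horizontal palm
}


def classify_gesture(pitch_deg, roll_deg):
    """Transmitter side: turn hand tilt angles into a command string."""
    if abs(pitch_deg) < TILT_THRESHOLD_DEG and abs(roll_deg) < TILT_THRESHOLD_DEG:
        return "STOP"
    if abs(pitch_deg) >= abs(roll_deg):
        # Assumed convention: positive pitch means the hand tilts forward.
        return "FORWARD" if pitch_deg > 0 else "BACKWARD"
    # Assumed convention: positive roll means the hand tilts to the right.
    return "RIGHT" if roll_deg > 0 else "LEFT"


def handle_message(message):
    """Receiver side: map a received string to an action; stop if unrecognised."""
    return ACTIONS.get(message.strip().upper(), "stop")


for pitch, roll in [(30, 5), (-30, 5), (0, -25), (0, 40), (5, 5)]:
    command = classify_gesture(pitch, roll)
    print((pitch, roll), command, handle_message(command))
```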
Fig. 4 Flowchart for receiver
Table 2 Merits and demerits
Merits               Demerits
Reduced complexity   Low accuracy
Low cost             If power supply fails, chair would not work
Great reliability    Failure of components may occur
Easy controlling     Fatal accidents can happen due to improper use of gyroscope
4.6 Need of Project The goal of this work is to create a wheelchair control that helps physically disabled people by recognising their hand gestures or hand movements. Using hand gestures to control the chair’s movement, a physically challenged person can move to the target destination in the wheelchair. This paper attempts to offer a workable solution to people with disabilities who are unable to move a wheelchair on their own, such as those who suffer from severe paralysis. Automated wheelchair control systems have proven to be effective solutions for a wide range of HCI issues. In essence, they let people, including those who are disabled, use computers and other systems more easily. The wheelchair mounted with DC motors is shown in Fig. 5. The implementation of the proposed system is shown in Fig. 6.
Fig. 5 Wheelchair mounted with DC motors
Fig. 6 Implementation of the proposed system
5 Summary and Conclusion Although there has been significant research into gesture-controlled wheelchairs up to this point, user-centric wheelchair design has received little attention. Existing prototypes were designed without taking into account caregiver and user expectations. Furthermore, although smart wheelchairs incorporate sensors and computational units, the normal usability of the wheelchair and user comfort were compromised in several projects. Most of the time, these projects have rigid software and hardware designs that are too expensive for most potential users. Therefore, the goal of this research is to design a hand gesture-controlled wheelchair that is user-centred and takes into account a variety of disabilities without compromising the user’s comfort. For many years to come, smart wheelchair design will be a productive area of research. The development of smart wheelchairs presents an opportunity to investigate novel approaches to user input, human–robot communication, adaptive control, and shared control. Smart wheelchair models can also serve as testing grounds for robot control architectures. Before smart wheelchairs can be widely adopted by users, a number of issues with sensor technology must be resolved. With current sensors, a compromise between cost and accuracy is unavoidable. Research on low-cost, sophisticated, and light-weight sensors would help raise the degree of acceptance of smart wheelchairs. Smart wheelchairs can also serve as a testing ground for bio-medical sensors. The system consists of a transmitter section and a receiver section. Because the transmitter is attached to the hand of the person using the wheelchair and the receiver is attached to the wheelchair, the wheelchair moves in accordance with the signals sent by the transmitter. The wheelchair can move in line with the user’s gestures and is capable of supporting loads up to 150 kg. The left, right, forward, and backward movements of the gesture-based wheelchair are controlled by two Arduino processors. The wheelchair could be improved further to make it more accessible to people with total body paralysis, for example by adding control through eye movement or a brain-signal reader.
References
1. Bourhis G, Moumen K, Pino P, Rohmer S, Pruski A (1993) Assisted navigation for a powered wheelchair. In: Systems engineering in the service of humans: proceedings of the IEEE international conference on systems, man and cybernetics. IEEE, Le Touquet, France, Piscataway (NJ), pp 553–58, 17–20 Oct 1993
2. Boy ES, Teo CL, Burdet E (2002) Collaborative wheelchair assistant. In: Rebsamen B, Teo CL (eds) Proceedings of the 2002 IEEE/RSJ international conference on intelligent robots and systems (IROS), vol 30. IEEE, Lausanne, Switzerland. Piscataway (NJ), pp 1511–16, 30 Sep–5 Oct 2002
3. Zeng MH, Burdet E, Guan C, Zhang H, Laugier C (2007) Controlling a wheelchair indoors using thought. IEEE Intell Syst 22(2):18–24
4. Keating D, Warwick K (1993) Robotic trainer for powered wheelchair users. In: Proceedings of the IEEE international conference on systems, man and cybernetics. IEEE, Le Touquet, France, Piscataway (NJ), pp 489–93, 17–20 Oct 1993
5. Nishimori M, Saitoh T, Konishi R (2007) Voice controlled intelligent wheelchair. In: SICE annual conference 2007, international conference on instrumentation, control and information technology, pp 336–340
6. Moon L, Lee M, Chu J, Mun M (2005) Wearable EMG-based HCI for electric-powered wheelchair users with motor disabilities. In: Proceedings of the 2005 IEEE international conference on robotics and automation, pp 2649–2654
7. Simpson R, Poirot D, Baxter MF (1999) Evaluation of the Hephaestus smart wheelchair system. Int Conf Rehabil Robot
8. Roumeliotis SI, Sukhatme GS, Bekey GA (1998) Fault detection and identification in a mobile robot using multiple-model estimation. In: Proceedings of 1998 IEEE international conference on robotics and automation (ICRA), pp 2223–2228
9. Braga RA, Petry M, Moreira AP, Reis LP (2009) Concept and design of the intellwheels platform for developing intelligent wheelchairs. In: Informatics in control, automation and robotics, pp 191–203. https://bit.ly/3v9QzRV
10. Uchiyama H, Deligiannidis L, Potter WD, Wimpey WJ et al (2005) A semi-autonomous wheelchair with Help Star. In: International conference on industrial, engineering and other applications of applied intelligent systems, pp 809–818. https://bit.ly/3oEWufA
11. Ding D, Cooper RA (2006) Electric powered wheelchairs. IEEE Control Syst Mag 22–34
12. Yanco HA (1998) Wheelesley, a robotic wheelchair system: indoor navigation and user interface. Lecture notes in artificial intelligence: assistive technology and artificial intelligence: applications in robotics, user interfaces and natural language processing. Springer, Heidelberg, pp 256–68
13. Mazo M (2001) An integral system for assisted mobility. IEEE Robot Autom Mag 8(1):46–56
14. Lee PK, Lai LL (2009) A practical approach of smart metering in remote monitoring of renewable energy applications. In: Power & energy society general meeting. IEEE PES ‘09
15. Tan HGR, Lee CH, Mok VH (2007) Automatic power meter reading system using GSM network. IEEE RPS
16. Treytl A, Sauter T, Bumiller G. Real-time energy management over power-lines and internet. In: Proceedings of the 8th international symposium on power line communications and its applications
17. Zhu J, Pecen R (2008) A novel automatic utility data collection system using IEEE 802.15.4-compliant wireless mesh networks. In: Proceedings of the 2008 IAJC-IJME international conference
18. Lei Y, Yi Z, Chong-chong Y, Zhen-gang D (2009) Design and research on data analysis platform of the renewable energy monitoring system. In: IE&EM ‘09. 16th international conference on industrial engineering and engineering management. Beijing, China, pp 722–725, 21–23 Oct 2009
19. Lubritto C, Petraglia A, Vetromile C, Caterina F, D’Onofrio A, Logorelli M, Marsico G, Curcuruto S (2008) Telecommunication power systems: energy saving, renewable sources and environmental monitoring. IEEE 30th international telecommunications energy conference. San Diego, USA, pp 1–4, 14–18 Sept 2008
20. SICE Annual Conference (2007) International conference on instrumentation, control and information technology, pp 336–340
21. Bourhis G, Moumen K, Pino P, Rohmer S, Pruski A (1993) Assisted navigation for a powered wheelchair. In: Systems engineering in the service of humans: proceedings of the IEEE international conference on systems, man and cybernetics. IEEE, NJ, pp 553–558, 17–20 Oct 1993
Decentralized Evidence Storage System Using Blockchain and IPFS Neeraj Salunke, Swapnil Sonawane, and Dilip Motwani
Abstract During legal proceedings, producing untampered evidence is crucial. Evidence may be viewed by several parties involved in an investigation, each of whom temporarily assumes ownership of it between the time it is collected and the time it is used in court. Digital evidence is often stored on pen drives, hard disks, and similar devices. Due to recent developments in computer technology, the integrity of such evidence can be compromised by hackers. Throughout the process, from evidence collection to use in court, authorities must ensure that evidence is neither tampered with nor lost. In this paper, we propose a blockchain-based evidence storage system that decentralizes the entire process of evidence handling. The system is built on the Ethereum blockchain and IPFS, which together keep evidence immutable, secure, accessible, and auditable. Blockchain is a digital ledger that keeps track of transactions across a number of computers without a central authority. It is made up of a chain of blocks, each containing a record of multiple transactions, and once a block is added to the chain, its information cannot be altered. Additionally, blockchain technology reduces the cost of storing and securing evidence by eliminating intermediaries and centralized authorities. The Interplanetary File System (IPFS) is a protocol that enables users to share files in a decentralized way. Instead of storing evidence in one central location, IPFS breaks large files into smaller chunks and spreads those chunks across different computers, or nodes, in a network.
Keywords Blockchain · IPFS · Dapp · Ethereum · Legal blockchain
N. Salunke · S. Sonawane (B) · D. Motwani Vidyalankar Institute of Technology, Mumbai, India e-mail: [email protected] N. Salunke e-mail: [email protected] D. Motwani e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_18
N. Salunke et al.
1 Introduction Blockchain technology is a decentralized ledger system that enables secure and transparent transactions without the need for a central authority. Each block in the chain contains a cryptographic hash of the previous block, ensuring the integrity of the data and making the system resistant to tampering. Blockchain has been widely applied to various domains, such as finance, supply chain management, and identity verification. The Interplanetary File System (IPFS) is a distributed file system that provides peer-to-peer storage and retrieval of files. IPFS uses a content-addressed system, in which files are identified by their content rather than by their location or name. This allows for efficient and decentralized storage and retrieval, as multiple copies of the same file can be stored across multiple nodes in the network. Evidence is something that is lawfully presented before a court or other decision-making body to establish the validity of a claim. To successfully prosecute the guilty, the appropriate evidence must be properly acquired, evaluated, and presented. At present, evidence is either recorded on paper and kept in physical form or, in the case of digital evidence, stored on devices. Digital evidence can range from logs to video footage, photographs, archives, temporary files, replicant data, residual data, metadata, active data, and even information kept in a device’s RAM (also known as volatile data). Logs can include phone logs, email logs, IP logs, server logs, network logs, database logs, or fingerprints. These logs can provide vital evidence; for example, email logs are stored as CSV files, which are kept on devices as evidence. Malicious actors such as hackers can gain access to these devices and change logs that are going to be used as evidence in court. This can lead to innocent people being blamed and guilty people going free. Footage includes CCTV and mobile videos or voice recordings.
Photographs, text documents, files, source code, and other types of material are all included in archives. These are stored digitally on devices or in centralized databases. The widespread use of digital technology has made data storage and sharing an integral part of our daily lives. However, centralized storage systems, such as cloud services, have raised concerns about data privacy and security. Decentralized storage systems offer an alternative, providing a secure and transparent way of storing and sharing data without relying on a central authority. In the past, there have been cases where evidence was tampered with, either intentionally or through negligence, and current practice offers no way to make sure that tampering will not happen in the future. If evidence is tampered with, it will not be considered valid proof in court. The idea of this project is to handle evidence from the moment it is gathered until it is presented in court. Apart from digital evidence, some physical evidence cannot be fully converted into digital format, or may lose important information in the process. This includes biological evidence such as blood or DNA samples. These types of evidence require special handling to maintain their integrity, but as time passes, they might deteriorate or be tampered with. In such cases, it is better to convert this physical evidence to some form of digital files and create a replica to preserve
them on a trusted decentralized platform where they remain tamper-proof. Thus, we propose a decentralized evidence storage system that utilizes blockchain and Interplanetary File System (IPFS) technologies to manage evidence. This system provides a tamper-proof and immutable way of storing evidence, such as legal documents, medical records, or financial transactions, that can be accessed by authorized parties in a secure and decentralized manner. The proposed system addresses the limitations of existing centralized storage systems, such as data loss, hacking, and unauthorized access. The blockchain component of the system ensures the authenticity and integrity of the stored evidence by providing a distributed ledger that records every transaction and maintains the immutability of the data. The IPFS component provides a decentralized and fault-tolerant storage layer that facilitates the efficient and secure retrieval of the stored evidence.
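The tamper resistance described in the Introduction, where each block contains a cryptographic hash of the previous block, can be illustrated with a minimal sketch. This is not the Ethereum block format: the field names, the JSON serialization, and the all-zero placeholder hash for the first block are assumptions made only for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, payload: str) -> str:
    """Hash a block's fields, including the hash of the previous block."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Build a list of blocks in which each block stores its predecessor's hash."""
    chain = []
    prev_hash = GENESIS_HASH
    for index, payload in enumerate(payloads):
        digest = block_hash(index, prev_hash, payload)
        chain.append(
            {"index": index, "prev_hash": prev_hash, "payload": payload, "hash": digest}
        )
        prev_hash = digest
    return chain


def is_chain_valid(chain) -> bool:
    """Recompute every block hash and check every back-link."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["payload"])
        if recomputed != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True


chain = build_chain(["evidence record A", "evidence record B", "evidence record C"])

# Copy the chain and alter one payload without updating any hashes.
tampered = [dict(block) for block in chain]
tampered[1]["payload"] = "altered record"

print(is_chain_valid(chain))
print(is_chain_valid(tampered))
```

Changing the payload of any block changes its recomputed hash, so the stored hash (and every later back-link) no longer matches, which is what makes the alteration detectable.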
2 Problems in Current System Numerous decentralized storage systems have been proposed in the literature. One of the most prominent is the Filecoin network, which incentivizes users to contribute storage and bandwidth to the network through a blockchain-based incentive system. The Storj network is another system that distributes data across multiple nodes in a peer-to-peer network and encrypts data for security. Other systems, such as Sia and MaidSafe, also use a decentralized architecture for storage and retrieval of data. Sia utilizes a blockchain-based payment system to encourage hosts to contribute storage space, while MaidSafe employs a consensus-based approach to ensure the availability and security of data. Despite the benefits of decentralized storage systems, several challenges still need to be addressed, including performance, scalability, and usability. This paper proposes a decentralized evidence storage system that addresses these challenges and provides a secure and efficient way of storing and sharing evidence in a decentralized environment. It has become increasingly difficult for the public to trust the criminal justice system. In India, there are many innocent people serving wrongful prison sentences. These people may have been wrongly convicted because of tampered or damaged evidence collected by investigators from the crime scene. Currently, much evidence and many court records are noted down on paper, stay in physical format, and can easily be tampered with or damaged. For example, physical copies of photographs taken by investigating agencies at crime sites can be intentionally tampered with or stolen. The court system is very slow, and a case can run for many years. Physical evidence such as photographs and documents may deteriorate with time and become useless. Apart from physical evidence, digital evidence is stored on hard disks, pen drives, or other electronic devices.
Malicious actors such as hackers or cybercriminals may attempt to gain unauthorized access to these devices to damage or alter evidence. Natural disasters or technical failures, such as power failure, fire, flood, or malware attack, can also cause data loss or damage evidence. When key evidence
is damaged or lost, there can be several negative consequences: the legal process may be delayed, it may become impossible to prosecute the case, and the suspect may be acquitted for lack of evidence. In the worst case, an innocent person could be wrongfully convicted. Decentralized evidence storage systems utilizing blockchain and IPFS have gained traction in recent years due to the limitations of conventional centralized systems concerning security and privacy. The present literature on this topic presents a comprehensive overview of the advantages, difficulties, and uses of decentralized evidence storage systems. In a study by Zheng et al. (2020), a blockchain-based decentralized evidence storage system was proposed that utilizes IPFS to store evidence files. The system leverages smart contracts to manage file access permissions and ensure data confidentiality. The research indicated that the proposed system outperformed traditional centralized systems in terms of security, dependability, and privacy [1]. Another study by Zhang et al. (2020) introduced a decentralized digital evidence storage system that relies on blockchain and IPFS. The system uses a distributed file storage mechanism to enhance the dependability and availability of evidence data. The study concluded that the proposed system provides a more secure and efficient solution for digital evidence storage compared to traditional centralized systems [1]. Nonetheless, despite the encouraging findings of these studies, there are still some gaps in the literature. For example, there is a dearth of research on the scalability of decentralized evidence storage systems that employ blockchain and IPFS. Moreover, further research is needed on the integration of decentralized evidence storage systems with other emerging technologies such as artificial intelligence and machine learning.
To sum up, the existing literature on decentralized evidence storage systems using blockchain and IPFS offers valuable insights into the benefits and challenges of this technology. In many parts of the world, proper collection, storage, and handling protocols are not in place to prevent evidence tampering, nor is there any way to ensure the integrity of evidence. If evidence is lost or damaged, the reputation of investigating agencies or prosecutors suffers. To protect digital evidence, some countries follow protocols such as regular backups, disaster recovery plans, and regular audits, but these protocols are costly and do not guarantee that evidence will not be tampered with. There is an increasing need for a system in which almost all evidence can be digitized and stored on a platform that provides data integrity, accessibility, transparency, security, and authenticity.
3 Solution Using Blockchain and IPFS The proposed decentralized evidence storage system consists of four main components: evidence submission, evidence storage, evidence retrieval, and evidence verification. The evidence submission component allows users to submit evidence to the system by uploading the evidence file, which is encrypted using AES-256 encryption
and stored on the IPFS network. The system stores the content hash of the encrypted file on the Ethereum blockchain, along with metadata such as the evidence type, timestamp, and user ID. The evidence storage component stores the evidence files on the IPFS network using IPFS’s content-addressed system, which ensures the redundancy and availability of the evidence files. The evidence retrieval component enables users to retrieve evidence files from the IPFS network using their content hash. The system uses IPFS’s distributed network to retrieve the files from multiple nodes, ensuring their availability and reliability. The evidence verification component enables users to verify the authenticity and integrity of the evidence files. The system uses the content hash stored on the Ethereum blockchain to ensure that the evidence has not been tampered with or modified. Blockchain is a digital ledger that keeps track of transactions across a number of computers without a central authority. It uses cryptography to control the generation of new units and to secure and validate transactions. It is a permanent and unchangeable record of all transactions on the network, making it difficult for a single user to alter the ledger without the consensus of the network [2]. Blockchain is primarily known for recording and verifying financial transactions [3], but it can also be used to store other forms of data, including files. This is done through a process called ‘hashing’, in which a file is converted into a fixed-length digital fingerprint, or hash. This hash can be stored on the blockchain as part of a transaction, while the file itself is stored on the IPFS network. IPFS, the Interplanetary File System, stores documents in a decentralized manner. There are several reasons why we did not store files directly on the blockchain. Firstly, storing files on the blockchain is impractical because of the high cost of storage and current scalability limitations.
Also, access to files stored on the blockchain would not be as fast as with traditional centralized storage systems. That is why we use IPFS to store files. All the participants involved in court matters are present on the same network and have access to evidence. Investigators can upload evidence, and these files can be accessed by lawyers, courts, and ordinary citizens [4]. Use of blockchain can address the evidence-handling problems faced by the current Indian judicial system. We built a platform on the Ethereum blockchain on which only investigators or police have permission to upload files related to evidence. Legal information can be stored on a distributed ledger rather than being emailed back and forth, which increases the integrity of the data. Such a system brings accessibility, data integrity, and transparency to the courts. The proposed decentralized evidence storage system offers several advantages over traditional centralized systems. The system provides high levels of security and transparency by using blockchain technology to store the evidence content hash and metadata. It also offers a high level of reliability and availability by using IPFS technology to store and retrieve evidence files in a distributed and redundant manner. The system protects the privacy and confidentiality of the evidence files by encrypting them before storing them on the network. Finally, the system is designed to scale to many evidence submissions and retrievals. While several other decentralized storage systems, such as Filecoin and Storj, use blockchain and IPFS technologies, the proposed system differs from these systems in several ways. The proposed
system is designed specifically for evidence storage and retrieval, while Filecoin and Storj are general-purpose storage systems. The proposed system also uses blockchain technology to ensure the authenticity and integrity of the evidence, while Filecoin and Storj do not. Finally, the proposed system offers a simple and user-friendly interface for evidence submission and retrieval, while Filecoin and Storj require more technical expertise to use.
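The submission and verification components described in this section can be sketched as follows. This is a simplified model under stated assumptions: a plain Python list stands in for the Ethereum ledger, a SHA-256 hex digest stands in for the IPFS content hash, and the AES-256 encryption step is omitted because it would need a third-party library. The names `EvidenceRecord`, `submit_evidence`, and `verify_evidence` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """Metadata kept on the ledger; the file itself is stored off-chain."""
    content_hash: str
    evidence_type: str
    timestamp: int
    user_id: str


def sha256_hex(data: bytes) -> str:
    """Stand-in for the content hash of the stored file."""
    return hashlib.sha256(data).hexdigest()


def submit_evidence(ledger, data: bytes, evidence_type: str,
                    timestamp: int, user_id: str) -> EvidenceRecord:
    """Record the content hash and metadata of a submitted file."""
    record = EvidenceRecord(sha256_hex(data), evidence_type, timestamp, user_id)
    ledger.append(record)
    return record


def verify_evidence(record: EvidenceRecord, retrieved: bytes) -> bool:
    """Check that retrieved bytes still match the hash recorded at submission."""
    return sha256_hex(retrieved) == record.content_hash


ledger = []
original = b"crime scene photograph bytes"
record = submit_evidence(ledger, original, "photograph", 1700000000, "investigator-01")

print(verify_evidence(record, original))            # unchanged file
print(verify_evidence(record, original + b"edit"))  # modified file
```

Because only the hash and metadata are recorded, verification needs nothing more than the retrieved bytes and the ledger entry.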
4 Working of the System The approach utilized for designing and implementing a decentralized evidence storage system includes the integration of both blockchain and IPFS technologies. The blockchain technology establishes a secure and transparent ledger to manage access permissions, whereas IPFS offers a decentralized and distributed file storage system. The technical aspects of this integration, such as the system architecture design and its implementation, are discussed below. The architecture of a decentralized evidence storage system primarily comprises four key components, namely the user interface, blockchain network, IPFS network, and smart contract. The user interface allows users to interact with the system, while the blockchain network provides a secure and transparent ledger to handle access permissions. On the other hand, the IPFS network is utilized to store and retrieve evidence files, and the smart contract is deployed to manage file access permissions [5]. To implement the decentralized evidence storage system, the process starts with creating a blockchain network. This includes defining the blockchain protocol, creating smart contracts, and establishing nodes to host the blockchain network. Subsequently, IPFS is integrated into the system, which involves configuring IPFS nodes and building a distributed file storage system that enables the system to store and retrieve evidence files. After that, smart contracts are developed to handle file access permissions. This involves specifying the access control rules, designing functions to interact with the blockchain network, and deploying the smart contract to the blockchain network. Finally, the user interface is developed to enable users to interact with the system. This involves creating a web-based interface that enables users to upload, retrieve, and manage evidence files. 
The interface also features a dashboard that provides users with the ability to monitor the status of their files, manage access permissions, and track system activity. The methodology used for designing and developing a decentralized evidence storage system involves integrating blockchain and IPFS technologies to create a secure and transparent system for managing evidence files. The system architecture typically includes the user interface, blockchain network, IPFS network, and smart contract. The implementation involves creating a blockchain network, integrating IPFS, creating smart contracts, and developing a user interface.
Regarding the security analysis, our evaluation revealed that the system is well protected against potential attacks, including Sybil attacks, 51% attacks, and double-spending attacks. This is thanks to the tamper-proof and decentralized data storage provided by the blockchain consensus mechanism and the IPFS distributed storage architecture. We also evaluated the system’s scalability and found it able to handle a large number of transactions without loss of performance. However, we did note that the cost of storage could increase significantly with a higher number of transactions. Overall, the proposed decentralized evidence storage system provides a dependable and tamper-proof solution for evidence storage, which is suitable for use in a variety of industries such as legal, financial, and health care [6]. As shown in Fig. 1, we created the smart contract in the Solidity language. Smart contracts are programs that run when certain conditions are met. They must be compiled before they can run on the network. Step 1.1 shows compilation of our smart contract. We use Hardhat to compile and deploy contracts on the blockchain network. Step 1.2 shows migration of the smart contract. Finally, the smart contract is deployed on the Goerli test network in step 1.3. Our decentralized application (Dapp) is built using the React framework. We use the web3.js library to call the smart contract from the frontend. The Dapp is connected to the blockchain via web3, which is shown in step 2.1. Users interact with the Dapp through a browser, and web3.js executes the corresponding functions defined in the smart contract. MetaMask is a cryptocurrency wallet used to interact with the Ethereum blockchain. Users can use this wallet to interact with our Dapp. Every user must use a crypto wallet, as it holds the user’s cryptocurrency and tokens. Any type of user can access our Dapp and the stored files, but not everyone can add, update, or delete them.
Some addresses have permission to upload files to our system. Such addresses will be used by the investigation agency to upload files. These files
Fig. 1 Flow diagram of evidence storage system
include all the materials related to evidence collected by the agency during the investigation. Investigators may take photographs of scenes or collect forensic evidence; crime scene investigators collect evidence such as fingerprints, footprints, and tire tracks. The file upload process takes place in two steps. Authorized users can upload files of any format, such as JPG or PDF, which is shown in step 3.2. In the next step, 3.3, the application uploads the file to IPFS, the Interplanetary File System, which stores files in a decentralized manner. On uploading a file, step 4.1 shows that IPFS returns a file hash, or content identifier (CID), which is a label used to point to content stored in IPFS. The next step, shown in step 5.1, is to add this hash value to the blockchain. To use the blockchain network or interact with the smart contract, we need to pay some cryptocurrency, which is ether in this case. This amount is called the gas fee, which we pay using the MetaMask crypto wallet, as shown in step 6.1. Once the hash of the uploaded file is stored on the blockchain, the file is reflected in the front end of our website. Any user, such as police, courts, lawyers, or individuals, can access these files.
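The upload flow of steps 3.2 to 5.1, together with the rule that only certain addresses may upload, can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: `MockIpfs` and `MockLedger` are hypothetical classes, the SHA-256 hex digest is not a real CID, the addresses are made up, and gas payment (step 6.1) is not modelled.

```python
import hashlib


class MockIpfs:
    """In-memory stand-in for IPFS: stores bytes under a content-derived key."""

    def __init__(self) -> None:
        self._blobs = {}

    def add(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # stand-in for a CID
        self._blobs[key] = data
        return key

    def cat(self, key: str) -> bytes:
        return self._blobs[key]


class MockLedger:
    """Stand-in for the smart contract: keeps records and an uploader allow-list."""

    def __init__(self, authorized_uploaders) -> None:
        self._authorized = set(authorized_uploaders)
        self.records = []

    def is_authorized(self, sender: str) -> bool:
        return sender in self._authorized

    def add_record(self, sender: str, cid: str, file_name: str) -> None:
        if not self.is_authorized(sender):
            raise PermissionError(f"{sender} may not upload evidence")
        self.records.append({"uploader": sender, "cid": cid, "file_name": file_name})


def upload_evidence(ipfs: MockIpfs, ledger: MockLedger, sender: str,
                    file_name: str, data: bytes) -> str:
    """Store the file off-chain, then record its identifier on the ledger."""
    if not ledger.is_authorized(sender):
        raise PermissionError(f"{sender} may not upload evidence")
    cid = ipfs.add(data)                       # steps 3.3 and 4.1
    ledger.add_record(sender, cid, file_name)  # step 5.1
    return cid


ipfs = MockIpfs()
ledger = MockLedger(authorized_uploaders={"0xInvestigator"})
cid = upload_evidence(ipfs, ledger, "0xInvestigator", "scene.jpg", b"jpeg bytes")
print(ledger.records)
```

Reading remains open to everyone in this sketch: any caller holding the identifier can call `ipfs.cat(cid)`, mirroring the statement that any user can access the stored files.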
5 Implementation A. Smart Contract To create a smart contract, a good understanding of the relevant programming language is important, such as Solidity for Ethereum or chaincode for Hyperledger. After writing the smart contract, we must test it for bugs, because smart contracts on mainnet are immutable: once they are deployed on the blockchain, their code cannot be changed. We implemented the smart contract on the Ethereum blockchain. We chose Ethereum because it is a decentralized blockchain-based platform with smart contract functionality [7, 8] and supports Ether, the second-largest cryptocurrency in the world. Our code is written in Solidity. We store only hash values, i.e., content identifiers (CIDs), on the blockchain. This limits the storage used and the gas fees required to run the smart contract on the blockchain. We tested our smart contract on Remix, a browser-based development environment. Later, we compiled the smart contract using a tool called Hardhat and deployed it on the Goerli test network. Following is the pseudocode for this smart contract:
1. Create a contract named ‘FileStorage’.
2. // Define variables and their types.
   Declare a public uint named ‘fileCount’ and initialize it to 0.
   Declare a public mapping named ‘files’ (key: fileCount, value: File).
   Declare a struct named ‘File’ with the following fields: {fileId, fileHash, fileSize, fileType, fileName, fileDescription, uploadTime, uploader}.
3. Function uploadFile(fileHash, fileSize, fileType, fileName, fileDescription):
   If at least one of fileHash, fileType, fileName, and fileDescription is missing, return.
   Increment ‘fileCount’ by 1.
   Create a new ‘File’ struct and add it to the ‘files’ mapping with ‘fileCount’ as the key.
4. End of the function.
5. End of the contract.
B. Frontend The front end of the project is built with React, a widely used framework. To integrate the website with the blockchain, we used a web3 API called web3.js, which is implemented in JavaScript. This API allows dApps to interact with blockchain networks, to read and write data to the blockchain, and to execute transactions. Using a web3 API is important because it reduces the complexity of interacting with the blockchain and makes it easier for developers to focus on building dApps. The frontend of our website displays the address of the user currently using the Dapp, as shown in Fig. 2. Investigating agencies such as the police can choose files related to evidence and upload them. Along with each file, a case ID should be mentioned to keep track of files. Each case ID can have multiple files. Every time investigating authorities upload files using our project, they need to pay the gas fees for using the network. Figure 3 shows the total amount to be paid as gas fees, which is 0.00039456 GoerliETH. Once payment is done, the newly uploaded file is visible in the list of all uploaded files.
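The ‘FileStorage’ pseudocode in Part A can be modelled in Python to make its behaviour concrete. This is only a model, not the deployed Solidity contract: in Solidity the uploader and upload time would normally come from the transaction context, whereas here they are passed in explicitly as an assumption, and a rejected upload is represented by returning `None`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    """Fields of the 'File' struct listed in the pseudocode."""
    file_id: int
    file_hash: str
    file_size: int
    file_type: str
    file_name: str
    file_description: str
    upload_time: int
    uploader: str


class FileStorage:
    """Python model of the 'FileStorage' contract state and its upload function."""

    def __init__(self) -> None:
        self.file_count = 0  # 'fileCount', initialised to 0
        self.files = {}      # 'files' mapping: fileCount -> File

    def upload_file(self, file_hash: str, file_size: int, file_type: str,
                    file_name: str, file_description: str,
                    uploader: str, upload_time: int):
        # Guard from the pseudocode: do nothing if a required field is missing.
        if not (file_hash and file_type and file_name and file_description):
            return None
        self.file_count += 1
        record = File(self.file_count, file_hash, file_size, file_type,
                      file_name, file_description, upload_time, uploader)
        self.files[self.file_count] = record
        return record


contract = FileStorage()
stored = contract.upload_file("example-content-hash", 2048, "image/jpeg",
                              "scene.jpg", "Photograph of the scene",
                              "0xInvestigator", 1700000000)
rejected = contract.upload_file("", 2048, "image/jpeg", "scene.jpg",
                                "Photograph of the scene",
                                "0xInvestigator", 1700000001)
print(contract.file_count, stored.file_id, rejected)
```

Note that the pseudocode itself contains no permission check on the caller; if uploads are restricted to investigators, that rule would have to be enforced by an additional check in the contract or in the front end.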
Fig. 2 Transaction fees
Fig. 3 Amount paid as gas fees
C. Interplanetary File System (IPFS) IPFS is a decentralized and distributed file system that uses a peer-to-peer network to share files. It aims to make the web faster, safer, and more open. It uses a unique hash value to identify data, which allows for faster and more efficient file sharing, reduces the need for centralized storage solutions, and enables a more private way of storing and sharing data [9]. For uploading and storing files over the IPFS network, we used the web3.storage platform, which is a suite of APIs that make it easy to manage decentralized data. This platform makes data available to users across the open IPFS network and is backed by Filecoin storage. Filecoin is a token protocol that supports a decentralized storage network [10]. We inserted an API token issued by this platform into our code.
When we add a file to IPFS, the file is split into smaller pieces, and those pieces are spread across many different nodes, or computers, in a network. The content is cryptographically hashed and given a unique ID called a content identifier (CID) [11]. This CID is a permanent record of our file, which is unique and is stored on the blockchain. Any peer with the hash address of a file can access that file, download it, and read it locally or via the author’s node after it is posted to IPFS [12]. The proposed decentralized evidence storage system uses both the Ethereum blockchain and the IPFS protocol for storing and retrieving evidence data. The Ethereum blockchain serves as a secure and immutable platform for managing evidence records and executing the smart contract that governs evidence storage and retrieval. Meanwhile, IPFS is used to store the actual evidence data in a distributed and decentralized manner. The evidence data are encrypted and redundantly stored across multiple nodes, ensuring their confidentiality and availability. To store evidence data on IPFS, the system employs a hash-based addressing mechanism. The evidence data are first encrypted and then hashed to create a unique content-based address. This address is used to locate and retrieve the evidence data from the IPFS network, ensuring their integrity and authenticity. Overall, this combination of blockchain and IPFS technologies provides a tamper-proof, secure, and decentralized solution for storing and retrieving evidence data. Any user, such as police, lawyers, or citizens, can access files from our website. Once a user clicks on the link on our website, they are directed to an IPFS gateway page where they can view the uploaded file. As shown in Fig. 4, the file named ‘My Photo.jpg’ is stored on the IPFS network and addressed by its CID. The file is located by a URL that contains this CID; in this case, the path of the file shown in Fig. 4 is: ‘https://dweb.link/ipfs/bafybeigqqzqpvhmi6rxwnjz7bijtzfqmxq3m5l3mf3snhezz35n722hsmi’.
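The idea of splitting a file into pieces and deriving an identifier from its content can be sketched as below. This is a deliberate simplification: real IPFS CIDs are built from a Merkle DAG and encoded with multihash and multibase, whereas this sketch hashes the concatenated SHA-256 digests of the chunks and returns a hex string. The chunk size and function names are illustrative assumptions; the gateway URL pattern follows the ‘https://dweb.link/ipfs/’ path quoted in the text.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # illustrative chunk size in bytes


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file's bytes into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def content_id(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Derive an identifier from the content alone (simplified, not a real CID)."""
    chunk_digests = [hashlib.sha256(chunk).digest()
                     for chunk in split_into_chunks(data, chunk_size)]
    return hashlib.sha256(b"".join(chunk_digests)).hexdigest()


def gateway_url(cid: str) -> str:
    """Build a gateway URL of the form used in the text for a given identifier."""
    return f"https://dweb.link/ipfs/{cid}"


data = b"a" * 600
same = content_id(data, chunk_size=256)
changed = content_id(data[:-1] + b"b", chunk_size=256)

print(len(split_into_chunks(data, chunk_size=256)))
print(same == content_id(bytes(data), chunk_size=256))
print(same == changed)
```

Identical content always yields the same identifier, while a one-byte change yields a different one, which is why storing the identifier on the blockchain is enough to detect later modification of the file.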
Fig. 4 Accessing uploaded file
6 Security and Privacy Considerations Ensuring the security and privacy of data is of utmost importance in any digital system. While decentralized storage systems that utilize blockchain and IPFS technology offer several advantages over centralized systems, such as enhanced security and privacy, designing a decentralized evidence storage system that is both secure and private requires careful attention to the security and privacy implications of the underlying technologies [13]. In this section, we will delve into the security and privacy considerations of our proposed decentralized evidence storage system that uses blockchain and IPFS technology. We will explore the consensus mechanisms utilized, encryption techniques, network security measures, privacy concerns, auditability, and legal compliance aspects of the system. Through this analysis, our aim is to demonstrate the system’s robust security and privacy features while highlighting areas that require further research and improvement.
6.1 Consensus Mechanism The consensus mechanism utilized in a blockchain for a decentralized evidence storage system is crucial for the security and privacy of the system. Different consensus mechanisms are available, such as PoW, PoS, and DPoS. PoW provides robust security, but it has a high energy consumption and slow transaction speeds. On the other hand, PoS is energy-efficient but could compromise security if a validator holds a significant amount of cryptocurrency in circulation. DPoS is scalable and fast, but collusion attacks could compromise the security of the network. Therefore, the selected consensus mechanism should maintain a balance between security, privacy, scalability, and energy efficiency based on the specific requirements of the evidence storage system. A robust consensus mechanism guarantees tamper-proof evidence storage while safeguarding the privacy of users’ data and identity [14].
6.2 Encryption To ensure the security of a decentralized evidence storage system using blockchain and IPFS, encryption techniques are crucial. Encryption protects the data both at rest and in transit, ensuring that only authorized users can access it. The system uses symmetric encryption, asymmetric encryption, and hashing to encrypt the data. Symmetric encryption is fast and efficient, but the secret key must be securely distributed. Asymmetric encryption is slower but more secure, while hashing ensures data integrity checks. Key management is crucial to the system’s security, and a
decentralized key management system using a blockchain provides a possible solution. The encryption keys must be securely stored and restricted to authorized users only, and they should be regularly rotated to maintain their security.
6.3 Network Security In a decentralized evidence storage system using blockchain and IPFS, network security is critical. Several measures are taken to ensure network security and protect against common attacks such as DDoS, Sybil, and 51% attacks. Load balancers and firewalls are implemented to prevent DDoS attacks, while identity verification and proof of work are required to prevent Sybil attacks. In the case of 51% attacks, a consensus mechanism is used that makes it expensive for an attacker to gain control of the network. The system uses a distributed network to avoid a single point of failure, with nodes spread across the network to prevent attacks on a particular node. Encryption techniques are also utilized to prevent attacks and protect data integrity. Overall, a combination of measures is taken to ensure network security and prevent attacks that could compromise the integrity of the system [6].
6.4 Privacy Storing evidence on a public blockchain raises several privacy concerns that must be addressed to ensure the confidentiality of user identities and data. The proposed system uses pseudonyms to conceal user identities and encryption techniques to prevent unauthorized access to evidence. The use of access control mechanisms such as permissions and smart contracts further limits access to sensitive data. Compared to other decentralized storage systems, the proposed system provides a high level of privacy due to its pseudonym usage and encryption techniques. However, no system can guarantee complete privacy, and users must exercise caution when uploading sensitive data. In conclusion, the proposed system addresses privacy concerns by using encryption techniques and pseudonyms to ensure user confidentiality and data security.
6.5 Auditability The auditability of evidence is crucial in any evidence storage system. The proposed system achieves this by utilizing cryptographic hashing techniques that allow third-party auditors to verify the evidence’s integrity without compromising user privacy. Smart contracts ensure that access to evidence is only granted to authorized parties through a secure channel. Additionally, the system can use a distributed ledger to
store audit logs that record the actions taken on the evidence. Third-party auditors can access the audit logs while maintaining the privacy of sensitive information. This approach ensures that evidence can be audited without compromising user privacy. In conclusion, the proposed system maintains the privacy of sensitive information while still allowing third-party auditors to verify the authenticity and integrity of evidence.
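The audit-log idea described above can be illustrated with a hash-chained log, in which each record commits to the digest of the previous one so that any later modification is detectable. This is a minimal sketch, not the paper's implementation: the record fields, the function names `append_entry` and `verify_log`, and the use of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json
import time


def entry_digest(entry: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: list, actor: str, action: str, evidence_hash: str) -> dict:
    """Append an action record that commits to the previous record's digest."""
    prev = log[-1]["digest"] if log else "0" * 64
    body = {
        "actor": actor,                  # a pseudonym, not a real identity (assumed)
        "action": action,                # e.g. "upload" or "view" (assumed labels)
        "evidence_hash": evidence_hash,  # digest of the evidence, not the evidence itself
        "timestamp": time.time(),
        "prev": prev,
    }
    record = dict(body, digest=entry_digest(body))
    log.append(record)
    return record


def verify_log(log: list) -> bool:
    """Recompute every digest and check that each record links to its predecessor."""
    prev = "0" * 64
    for record in log:
        body = {k: v for k, v in record.items() if k != "digest"}
        if body["prev"] != prev or entry_digest(body) != record["digest"]:
            return False
        prev = record["digest"]
    return True
```

Because the log stores only pseudonyms and evidence digests, an auditor running `verify_log` can check the sequence of actions without seeing the evidence itself.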
6.6 Legal Compliance Ensuring compliance with legal regulations is a critical component of any evidence storage system. The proposed system addresses this requirement by incorporating a range of features such as encryption, access control, and auditability. To comply with data protection laws, the system stores personal data in a pseudonymous form, with access restricted through permissions and smart contracts. The system also implements measures to ensure that the evidence collected meets the legal requirements for admissibility in court. For example, the system securely stores evidence and timestamps it to prevent tampering or destruction. The use of cryptographic hashing techniques and digital signatures further enhances the integrity and authenticity of evidence. The system logs all actions taken on the evidence, making it audit-ready and meeting legal requirements. In conclusion, the proposed system complies with legal regulations regarding data protection, evidence collection, and storage by implementing encryption, access control, and auditability features, thereby ensuring the integrity, authenticity, and admissibility of evidence in court.
7 Comparison with Centralized Storage Solutions When comparing decentralized evidence storage systems that utilize blockchain and Interplanetary File System (IPFS) with traditional centralized storage solutions, several key differences emerge.
7.1 Security Decentralized evidence storage systems that use blockchain and IPFS offer a higher level of security compared to centralized storage solutions. The immutability of the data stored on the blockchain and the encryption of data stored on IPFS make it challenging for unauthorized users to tamper with or access the data. In contrast, centralized storage solutions are more susceptible to cyberattacks and data breaches since they rely on a single point of failure.
7.2 Privacy Decentralized evidence storage systems offer superior privacy compared to centralized storage solutions. In a decentralized system, the data owner has complete control over their data, and third-party service providers do not have access to the data. This reduces the vulnerability of the system to data breaches and hacking attempts. On the other hand, centralized storage solutions require trust in the service provider to ensure data security.
7.3 Decentralization Decentralized evidence storage systems provide a distributed network of storage nodes, which offers redundancy and prevents a single point of failure. This ensures that if one storage node goes offline, the data remains accessible on other nodes. On the other hand, centralized storage solutions rely on a single point of storage, which makes them more vulnerable to data loss in the event of hardware failure or natural disasters.
7.4 Speed Centralized storage solutions can be faster than decentralized evidence storage systems. This is because in a centralized system, the data is stored on a single server, making it quicker to retrieve data compared to a decentralized system where data are stored on multiple nodes. However, it is important to note that this speed advantage comes at the cost of compromising security and privacy.
7.5 Cost Decentralized evidence storage systems may be more expensive to maintain and operate compared to centralized storage solutions. This is because the cost of maintaining the blockchain network, IPFS nodes, and storage layer can be higher, which can make it less attractive to smaller organizations or individuals. In contrast, centralized storage solutions are often cheaper to maintain and operate due to economies of scale.
8 Advantages of Proposed System
8.1 Improvement in Transparency The proposed system provides transparency and traceability for everyone involved from the beginning of the evidence collection process.
8.2 Evidence Remains Immutable In this system, files are broken up into smaller chunks and each chunk is stored on a different node in the IPFS network. The blockchain is then used to keep track of the location of each chunk and to ensure the integrity of the files [15]. Because each chunk is stored on a different node, it becomes much more difficult for anyone to compromise the integrity of the file. This means once evidence is added on the blockchain, it cannot be altered or deleted [16].
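The integrity argument above can be sketched as follows: the file is split into chunks, each chunk is hashed, and a single root digest over the ordered chunk digests is what would be recorded on the blockchain. This is a simplified illustration under stated assumptions. Real IPFS builds a Merkle DAG and encodes digests as CIDs, which this sketch does not reproduce; the chunk size and the names `chunk_digests`, `root_digest`, and `verify` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # illustrative chunk size in bytes (an assumption)


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into consecutive fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_digests(data: bytes, size: int = CHUNK_SIZE) -> list:
    """SHA-256 digest of every chunk, in file order."""
    return [hashlib.sha256(c).hexdigest() for c in split_chunks(data, size)]


def root_digest(digests: list) -> str:
    """Single digest that commits to the ordered list of chunk digests."""
    return hashlib.sha256("".join(digests).encode("ascii")).hexdigest()


def verify(data: bytes, recorded_root: str, size: int = CHUNK_SIZE) -> bool:
    """Check retrieved data against the root digest recorded at upload time."""
    return root_digest(chunk_digests(data, size)) == recorded_root
```

Changing even one byte of one chunk changes that chunk's digest and therefore the root, so a mismatch with the recorded value reveals tampering.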
8.3 Restore Trust of People in Courts Innocent people will no longer be incriminated because of tampered evidence [17].
8.4 Improve Courts’ Efficiency All evidence is available to all stakeholders on the blockchain, so evidence cannot get lost. This reduces the time required for trials and allows courts to clear backlogs in less time.
8.5 Higher Performance IPFS is a peer-to-peer hypermedia protocol. It retrieves data from several peers at once based on the content identifier (CID), which can save bandwidth. IPFS allows data to be stored across multiple nodes [18]. Instead of downloading files from one central location, we can therefore download them from many different sources at the same time, which can make downloads faster and more reliable. This also ensures high durability and availability of the stored evidence.
8.6 Minimized Storage Costs When we upload a file to IPFS, it is split into smaller chunks, each of which is cryptographically hashed. If we upload the same file with minor changes, it is again split into chunks and stored, but the chunks that are unchanged and common to both versions are reused, which minimizes storage costs. Also, there is no need for expensive infrastructure or centralized storage solutions, making it a cost-effective option.
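The chunk reuse described above follows from content addressing: a chunk is stored under its own hash, so identical chunks from different files map to the same key. The toy store below illustrates this under simplifying assumptions. The class `ChunkStore`, its 4-byte chunk size, and the sample data are hypothetical, and the real IPFS block store and chunker are more elaborate.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: chunks are keyed by their SHA-256 digest."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes

    def add_file(self, data: bytes) -> list:
        """Store a file and return the ordered list of its chunk digests."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # identical chunks are stored once
            recipe.append(digest)
        return recipe

    def get_file(self, recipe: list) -> bytes:
        """Reassemble a file from its list of chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)


store = ChunkStore(chunk_size=4)
v1 = store.add_file(b"AAAABBBBCCCCDDDD")
v2 = store.add_file(b"AAAABBBBXXXXDDDD")  # same file with one chunk changed in place
print(len(store.chunks))  # 5 unique chunks are stored instead of 8
```

With fixed-size chunks, reuse only occurs when an edit does not shift the chunk boundaries, as in this in-place change; an insertion near the start of a file would alter every later chunk.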
8.7 Reliable As all active nodes keep complete copies of the ledger, the use of blockchain makes this system reliable. There is no single point of failure with a blockchain. As a result, even if one node fails, the ledger is still easily accessible to all other network users.
9 Use Cases and Applications of the System Decentralized evidence storage systems using blockchain and IPFS have several potential use cases and applications. One of the most prominent applications is in the legal industry, where the immutability and transparency of the blockchain can help to secure and verify evidence in legal cases. This can help to reduce the risk of tampering, fraud, and disputes over evidence. Additionally, the system can be used for other industries that require secure and tamper-proof evidence storage, such as insurance, finance, and health care. The distributed nature of the system also makes it useful for applications that require data redundancy and availability, such as disaster recovery and archival storage. Furthermore, the system can be used to enable secure and transparent sharing of evidence among multiple parties, which can be useful in scenarios such as cross-border legal cases or collaborations between organizations. Overall, decentralized evidence storage systems have the potential to revolutionize the way evidence is stored, verified, and shared in various industries. Here are some potential use cases and applications for a decentralized evidence storage system using blockchain and IPFS.
9.1 Legal and Regulatory Compliance Industries such as finance and health care, which operate in heavily regulated environments, have a legal obligation to maintain precise and immutable records. To comply with the regulatory and legal requirements, decentralized evidence storage systems using blockchain and IPFS can offer a secure and transparent record of all
data and transactions that are difficult to tamper with. This approach can simplify compliance and increase efficiency for such organizations [6].
9.2 Supply Chain Management Industries that rely on supply chains to deliver their products need to maintain an accurate and secure record of all transactions and movements of goods. To achieve this, a decentralized evidence storage system that utilizes blockchain and IPFS can provide a tamper-proof and auditable record of all transactions, from the purchase of raw materials to the final delivery of the finished product. This system can enhance transparency and traceability throughout the supply chain, which can help companies to identify inefficiencies and streamline their operations. Additionally, a decentralized storage system can prevent fraudulent activities and reduce the risk of counterfeiting, which can be crucial in industries where authenticity is critical, such as the food and pharmaceutical sectors.
9.3 Intellectual Property Protection The protection of intellectual property rights is of utmost importance in many industries, but safeguarding digital assets can be challenging. A decentralized evidence storage system utilizing blockchain and IPFS can potentially offer a secure and unchangeable record of all intellectual property rights, usage rights, and licensing agreements. This system could ensure the rightful ownership of digital assets and prevent unauthorized usage, ensuring that individuals and organizations can effectively protect their intellectual property rights.
9.4 Digital Identity Verification With the rise of digital transactions and interactions, the need for secure online identity verification has become critical. However, centralized identity verification systems are vulnerable to cyberattacks and data breaches, which can lead to compromised user data. To address this, decentralized evidence storage systems using blockchain and IPFS can provide a more secure and tamper-proof record of all identity verification data. By storing this information on a distributed network of nodes, rather than a single central server, it becomes much more difficult for unauthorized users to access or tamper with the data. This can help to ensure the security and privacy of users’ personal information, making online identity verification more reliable and trusted.
9.5 Evidence Collection and Preservation Preserving and presenting evidence are essential in legal proceedings, and maintaining the integrity of that evidence is paramount. A decentralized evidence storage system that leverages blockchain and IPFS could offer a secure and tamper-proof solution for storing and accessing legal evidence. By ensuring that all evidence is stored on a decentralized network with an immutable record, the system could provide reliable and auditable records of evidence, supporting its admissibility in court. Furthermore, the system could streamline the collection and management of evidence, ensuring that it is properly cataloged, labeled, and accessible to authorized parties [19].
9.6 Decentralized File Sharing Centralized file-sharing services are at risk of data breaches and cyberattacks. However, a decentralized system for storing evidence that utilizes blockchain and IPFS could offer a secure and confidential method for users to share files without depending on a centralized service provider.
10 Limitations and Future Scope Although the proposed method uses blockchain and IPFS to store tamper-proof data to solve current problems, we have identified several implementation-specific constraints. First, the proposed solution does not entirely restrict access to hash values. The existing implementation is predicated on the idea that people using this system will not share the hashes of files uploaded to the network with other people. A legitimate user can share hashes with individuals outside the network via email or any other software without being hindered by any functionality, which could result in access to the file by unauthorized parties. Second, there is limited adoption and understanding of blockchain technology by the public, law enforcement agencies, and the judicial system. Third, there are regulatory and legal challenges that need to be addressed, such as the admissibility of electronic evidence in court. In subsequent developments, we hope to reduce these restrictions. Future work will focus on adding more functionalities to track all uploads, downloads, and other activities done by users. Additionally, work can be done on integrating our proposed system with existing systems used by law enforcement agencies and the judicial system. There is also a need for standardization and compatibility between different blockchain platforms to make it easier to transfer evidence between different systems.
Decentralized evidence storage systems that employ blockchain and IPFS technology offer a range of advantages in comparison to traditional centralized storage solutions. They are noted for their superior security, privacy, and decentralized infrastructure, which is less vulnerable to cyberattacks and data breaches. However, the speed and cost-effectiveness of these systems may be inferior to that of centralized alternatives. Ongoing research and development are needed to address these concerns and to explore new applications for the technology, including integration with other emerging technologies such as artificial intelligence and the Internet of Things. It is also important to address the environmental impact of these systems and to find ways to reduce their energy consumption. Overall, decentralized evidence storage systems hold great promise for the future, but more work is needed to fully realize their potential.
11 Conclusion Centralized evidence storage systems are plagued with various challenges such as data tampering, loss, and unauthorized access. Decentralized solutions utilizing blockchain and IPFS present a promising alternative to these challenges by ensuring data integrity, availability, and accessibility. Our proposed decentralized evidence storage system leverages blockchain technology’s immutability and consensus mechanism and IPFS’s distributed nature to provide a secure and reliable evidence storage solution. The evaluation results indicate that the system is efficient, secure, and scalable, making it applicable in legal, financial, and healthcare industries. Nonetheless, the proposed system has some limitations such as high storage costs and network congestion, which require further investigation. Future research could focus on enhancing the system’s scalability and reducing data storage costs [20]. In summary, the decentralized evidence storage system using blockchain and IPFS is a promising solution for secure, reliable, and tamper-proof evidence storage, and it has the potential to contribute significantly to the field of decentralized applications. In this paper, we proposed a new system and provided a prototype for storing evidence on a decentralized network using Ethereum blockchain and IPFS. Blockchain guarantees integrity, accessibility, transparency, security, and authenticity. As it is not feasible to store files on blockchain, we used IPFS to store them. Files stored on IPFS are divided into smaller chunks and cryptographically hashed. This makes the files resistant to tampering. The combination of blockchain and IPFS gives a guarantee that authentic evidence is presented in the court.
References
1. Zheng Q, Li Y, Chen P, Dong X (2018) An innovative IPFS-based storage model for blockchain. In: IEEE/WIC/ACM international conference on web intelligence (WI). https://doi.org/10.1109/WI.2018.000-8
2. Muday NA, Chandra GR (2020) Blockchain in the legal profession: a boon or a bane? In: 2020 8th international conference on reliability, infocom technologies and optimization (trends and future directions) (ICRITO). IEEE
3. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. https://bitcoin.org/bitcoin.pdf
4. Sonawane S, Motwani D (2022) Issues of commodity market and trade finance in India and its solutions using blockchain technology. In: Emerging technologies in data mining and information security, proceedings of IEMIS 2022, vol 1, pp 451–459
5. Karafiloski E, Mishev A (2019) Decentralized data storage on the blockchain: a survey of the state-of-the-art. J Parallel Distrib Comput 130:58–75
6. Dai W, Sun L, Wu D (2019) Secure data storage system based on blockchain and IPFS. IEEE Access 7:54625–54636
7. Buterin V (2014) A next-generation smart contract and decentralized application platform. White paper 3(37):1–36
8. Wood G (2014) Ethereum: a secure decentralized generalized transaction ledger. Ethereum Proj Yellow Paper 151(2014):1–32
9. Guidi B, Michienzi A, Ricci L (2021) Data persistence in decentralized social applications: the IPFS approach. In: 2021 IEEE 18th annual consumer communications & networking conference (CCNC), 9–12 Jan 2021. https://doi.org/10.1109/CCNC49032.2021
10. Psaras Y, Dias D (2020) The interplanetary file system and the filecoin network. In: 2020 50th annual IEEE-IFIP international conference on dependable systems and networks-supplemental volume (DSN-S). IEEE
11. Benet J (2014) IPFS-content addressed, versioned, P2P file system. arXiv preprint arXiv:1407.3561. Cornell University
12. Thuraisingham B (2020) Blockchain technologies and their applications in data science and cyber security. In: 2020 3rd international conference on smart blockchain (SmartBlock). https://doi.org/10.1109/SmartBlock52591.2020
13. Antonopoulos AM (2014) Mastering bitcoin: unlocking digital cryptocurrencies. O’Reilly Media, Inc.
14. Khairullah M, Zulkernine M, Gondal I (2019) Proof of storage with blockchain and IPFS. J Syst Softw 157:110391
15. Makina H, Letaifa AB, Rachedi A (2022) Leveraging edge computing, blockchain and IPFS for addressing ehealth records challenges. In: 2022 15th international conference on security of information and networks (SIN). https://doi.org/10.1109/SIN56466.2022
16. Malhotra D, Srivastava S, Saini P, Singh AK (2021) Blockchain based audit trailing of XAI decisions: storing on IPFS and ethereum blockchain. In: 2021 international conference on communication systems & networks (COMSNETS). https://doi.org/10.1109/COMSNETS51098.2021
17. Srivasthav DP, Maddali LP, Vigneswaran R (2021) Study of blockchain forensics and analytics tools. In: 2021 3rd conference on blockchain research & applications for innovative networks and services
18. Nyaletey E, Parizi RM, Zhang Q, Choo KK (2019) BlockIPFS - blockchain-enabled interplanetary file system for forensic and trusted data traceability. In: 2019 IEEE international conference on blockchain (Blockchain). https://doi.org/10.1109/Blockchain48018.2019
19. Gao F, Zhang J (2019) A new evidence collection and storage model based on blockchain technology. IEEE Access 7:78235–78246
20. Li C, Zhao J (2020) Data storage and sharing scheme based on IPFS and blockchain. IEEE Access 8:123310–123323
Investigation of DCF Length and Input Power Selection for Optical Transmission Systems Manjit Singh, Butta Singh, Himali Sarangal, Vinit Grewal, and Satveer Kour
Abstract Optical communication systems are a crucial component of the telecommunications backbone of global broadband networks. Today’s applications require wide-bandwidth signal transfer with low delay. Optical fibers are currently the transmission medium of choice for long-distance, high-data-rate transmission in telecommunication networks, as they offer immense transmission bandwidth with minimal delay. This work discusses simulation results from an optical transmission system that uses dispersion compensating fiber (DCF) as a nonlinear compensator. Although there are many other kinds of optical fiber, DCF is the most frequently used component for correcting dispersion in optical communication systems. The transmission system simulations are carried out using the Opti-System simulator and are evaluated using the quality factor, power level, and noise level, along with the eye diagram. According to the simulation results, a 2 km length of DCF provides the highest quality factor for 10 km of optical fiber. Additionally, the input power needs to be kept between 5 and 10 dBm to obtain a stable quality factor. Keywords Optical transmission system · DCF · Opti-System simulator · Q-parameters · Attenuation coefficient · Dispersion compensation
M. Singh · B. Singh (B) · H. Sarangal · V. Grewal, Department of Engineering and Technology, Guru Nanak Dev University, Regional Campus, Jalandhar, India, e-mail: [email protected]
S. Kour, Department of CET, Guru Nanak Dev University, Amritsar, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_19
1 Introduction Optical communication, also known as optical telecommunication, is communication at a distance that uses light to carry information. It can be carried out visually or with the aid of technology. An optical communication system consists of a transmitter, which converts a message into an optical signal; a channel, which transports the signal to its destination; and a receiver, which reconstructs the message from the received optical signal [1–4]. Optical communication systems can be broadly divided into guided and unguided systems. As the name suggests, in guided light wave systems the optical beam emitted by the transmitter is confined within a specific region. Since all guided optical communication systems currently in use rely on optical fibers, they are referred to as fiber optic communication systems [5–7]. The term “light wave system” is also occasionally used for fiber optic communication networks, although strictly it covers both guided and unguided systems. Optical fiber is employed in telecommunication systems because of its small size, low loss, and low susceptibility to external interference [8]. The role of the channel is to carry the optical signal from transmitter to receiver without distortion. Because silica fibers can transmit light with losses as low as 0.2 dB/km, optical fibers are typically used as the communication channel. Even then, after 100 km the optical power drops to just 1% of its input value. Consequently, fiber loss remains a significant design issue that determines the spacing between repeaters or amplifiers in long-haul light wave systems. As light travels through the fiber, its intensity decreases, degrading the signal. This phenomenon is known as attenuation [9–13]. More generally, attenuation is the loss of signal strength in networking cables or connections. It can either be inherent to the glass, known as intrinsic attenuation, or be caused by external factors, known as extrinsic attenuation.
For fiber optics to perform their principal purpose of efficient light transmission in long-distance communications, managing signal strength and shape is of utmost importance. Signal level deterioration is caused by a variety of factors, including atomic structure, point defects, and structures created during the fiber production process. The optical power P(z) after the light has traveled a distance z along the fiber is related to the input optical power P(0) as follows [1, 14–17]:
\[ P(z) = P(0)\, e^{-\alpha z} \tag{1} \]
In the above equation, α is the attenuation coefficient. Expressed in decibels per kilometer (dB/km), the attenuation coefficient is
\[ \alpha\,(\mathrm{dB/km}) = \frac{10}{z} \log_{10}\!\left[\frac{P(0)}{P(z)}\right] \tag{2} \]
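As a quick numerical check of Eqs. (1) and (2), the sketch below converts between the two forms of the attenuation coefficient and reproduces the statement from the Introduction that a 0.2 dB/km fiber delivers about 1% of the input power after 100 km. The function names are illustrative and not part of any simulator.

```python
import math


def alpha_db_per_km(alpha_per_km: float) -> float:
    """Convert alpha from 1/km, as used in Eq. (1), to dB/km, as used in Eq. (2)."""
    return 10.0 * math.log10(math.e) * alpha_per_km


def output_power(p_in: float, alpha_db: float, z_km: float) -> float:
    """P(z) in the same linear units as p_in, with alpha given in dB/km."""
    return p_in * 10.0 ** (-alpha_db * z_km / 10.0)


def attenuation_from_powers(p_in: float, p_out: float, z_km: float) -> float:
    """Eq. (2): attenuation coefficient in dB/km from input and output power."""
    return (10.0 / z_km) * math.log10(p_in / p_out)


p_out = output_power(1.0, 0.2, 100.0)
print(f"{p_out:.4f}")  # prints 0.0100, i.e. 1% of the input power remains
```

The conversion factor between the two forms is 10 log10(e), roughly 4.343, so an α of 1 per km in Eq. (1) corresponds to about 4.343 dB/km in Eq. (2).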
Understanding the nonlinear effects in optical fibers, which impose a variety of constraints on the communications link, is essential for light wave system designers. The two main types of nonlinear effects are those caused by the Kerr effect, which modifies the refractive index of silica, and those caused by stimulated scattering processes. Stimulated scattering processes are of two types, stimulated Brillouin scattering (SBS) and stimulated Raman scattering (SRS), which are caused by interactions between optical signals and acoustic or molecular vibrations in the fiber [1, 18–22].
The effects mentioned above need to be estimated for a specific design in order to determine the signal’s error rate and to limit distortion to an acceptable level. Because of the market’s demand for higher data rates, optical transmission systems can no longer be designed in a conventional manner; instead, specific techniques for mitigating distortion are needed. Fiber loss is the primary consideration when designing a fiber link. The development of optical amplifiers, however, provides a solution to the fiber losses that reduce the effectiveness of optical communication systems. Nevertheless, other factors also affect fiber performance, such as dispersion, which occurs as pulses in the fiber spread out during propagation. There are several options for reducing it. Fibers that can be used instead of standard fiber to lessen distortion caused by dispersion include dispersion-shifted fiber (DSF), non-dispersion-shifted fiber (NDSF), non-zero dispersion-shifted fiber, reduced dispersion slope fiber, and dispersion-flattened fiber. Soliton pulses, a special type of signal, can also be employed to combat dispersion. Coding approaches are useful for managing nonlinear effects [2, 23–28]. Optical Equalizers (OEQ), Tunable Dispersion Compensators (TDC), and Polarization-Mode Dispersion Compensators (PMDC) are the three types of distortion compensating devices [4]. A polarization controller is a component of the dispersion mitigation system. In this work, an optical communication system is designed in order to find the optimum values of input power and DCF length at which a highly stable output signal can be obtained. Lin et al. proposed dispersion compensation using DCF in 1980, and it has since become the most popular technique. A dispersion compensating fiber (DCF) is a fiber used to compensate for dispersion; it is a single-mode fiber (SMF) with a small core diameter.
DCF has a large negative group velocity dispersion and, correspondingly, a large negative chromatic dispersion. Combining fibers of appropriate lengths whose chromatic dispersion has opposite signs yields an average dispersion close to zero. DCF can be positioned anywhere along the network and can range in length from a few kilometers upward. DCF was demonstrated as a nonlinear controller in 1992 [5–7, 12, 13]. The basic principle of DCF, shown in Fig. 1, is to insert negative dispersion into the link to compensate for the positive dispersion of a standard single-mode fiber. The total dispersion and total attenuation of the optical link are given by
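\[ D_T = D_{\mathrm{SMF}} L_{\mathrm{SMF}} + D_{\mathrm{DCF}} L_{\mathrm{DCF}}, \tag{3} \]
\[ \alpha_T = \alpha_{\mathrm{SMF}} L_{\mathrm{SMF}} + \alpha_{\mathrm{DCF}} L_{\mathrm{DCF}}. \tag{4} \]
Fig. 1 Dispersion compensation using DCF (block diagram with the blocks Optical Transmitter, Single Mode Fiber (SMF), Dispersion Compensating Fiber (DCF), and Optical Receiver)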
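Eqs. (3) and (4) can be evaluated directly for the fiber parameters reported in the simulation setup of this paper (SMF: 17.5 ps/(nm km), 0.25 dB/km, 10 km; DCF: −80 ps/(nm km), 0.6 dB/km). The sketch below is a back-of-the-envelope budget only, assuming purely linear accumulation and ignoring dispersion slope and nonlinear effects; the function and variable names are illustrative.

```python
def total_dispersion(d_smf: float, l_smf: float, d_dcf: float, l_dcf: float) -> float:
    """Eq. (3): accumulated dispersion of the link in ps/nm."""
    return d_smf * l_smf + d_dcf * l_dcf


def total_attenuation(a_smf: float, l_smf: float, a_dcf: float, l_dcf: float) -> float:
    """Eq. (4): total fiber loss of the link in dB."""
    return a_smf * l_smf + a_dcf * l_dcf


# Parameter values taken from the simulation setup described in the paper
D_SMF, A_SMF, L_SMF = 17.5, 0.25, 10.0  # ps/(nm km), dB/km, km
D_DCF, A_DCF = -80.0, 0.6               # ps/(nm km), dB/km

for l_dcf in (0.0, 1.0, 2.0, 3.0):
    d_t = total_dispersion(D_SMF, L_SMF, D_DCF, l_dcf)
    a_t = total_attenuation(A_SMF, L_SMF, A_DCF, l_dcf)
    print(f"L_DCF = {l_dcf:.0f} km: D_T = {d_t:+.1f} ps/nm, loss = {a_t:.1f} dB")

# DCF length that would cancel the accumulated SMF dispersion exactly
l_zero = -D_SMF * L_SMF / D_DCF
```

Under these values the residual dispersion is +175, +95, +15, and −65 ps/nm for DCF lengths of 0, 1, 2, and 3 km, while the loss grows from 2.5 to 4.3 dB. Exact cancellation would need about 2.19 km of DCF, so among the simulated lengths 2 km leaves the smallest residual dispersion.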
It is highly recommended that the DCF have as high a negative dispersion and as low an attenuation as possible, to avoid a high total attenuation in the optical link. Research in the field of communication consistently gives top priority to the issues that affect the performance of optical communication systems, such as attenuation and dispersion. Technologies such as DCF address these issues [29–33]. Optical communication systems will continue to be modified in order to reduce the factors that degrade performance. The key requirements, including high transmission capacity, minimal loss, and good signal quality over long distances, still need to be improved. Changes in the environment can affect a system’s performance, which calls for more detailed investigation and analysis [34]. With the aid of DCF, the optical system produces better results. Dispersion compensating fiber is one of the most suitable ways to remove dispersion from an optical communication system, so that other factors such as the attenuation coefficient can be analyzed without hindrance. DCF can be deployed in three configurations: pre-compensation, post-compensation, and symmetrical compensation. DCF supports users’ high bandwidth requirements, which has made it popular in the field of optical communication. Because DCF removes dispersion from the system, it becomes easier to perform other operations on the system for better signal quality and higher performance. It is one of the best techniques for removing nonlinearities from the system. In the era of advanced communication technology, optical communication systems aided by DCF serve a wide range of services worldwide [35–37]. In this paper, the performance of the optical communication system is enhanced by determining the optimum DCF length and input power values.
2 Simulative Setup Any communication system uses electromagnetic waves to transfer data from one location to another. Fiber optic communication, which transmits data through optical fiber, has raised the level of international communication. The fundamental components of the communication system are an optical transmitter, an optical fiber, and an optical receiver. The primary purpose of the channel is to maintain signal integrity as it carries data from the transmitter to the receiver. The main purpose of fiber optics, which are required for various applications, is the efficient transmission of light at the relevant operational wavelength(s). To meet the expectations of high speed and high efficiency, all performance-determining factors of optical fiber networks must be thoroughly investigated. Dispersion management techniques can be applied to increase the channel’s transmission capacity. Dispersion compensating fiber (DCF) is one such technique. Since nonlinear effects inside the fiber are insignificant, this all-optical method can eliminate the dispersion completely. An optical communication system using DCF of various lengths is designed and studied. The performance of single-mode
Investigation of DCF Length and Input Power Selection for Optical …
PRBS Generator
NRZ Pulse Generator Optical Fiber Length
Ideal Dispersion Compensator
285
EDFA
Photo Detector DCF
CW Laser
MachZehnder Modulator
BER Analyzer
Low-pass Bessel Filter
Fig. 2 Block diagram of proposed system
fiber is assessed using DCF at various distances while developing optical communication systems. Figure 2 depicts the block diagram of the proposed system. The examination of an optical communication system with a 2.5 Gb/s bit rate uses the Opti-System software. The main objective of simulation is to enhance the performance of optical communication system with the use of DCF. The system is divided into three parts: the transmitter, the channel, and the receiver. A laser with a frequency of 193.1 THz and a power of 0dbm is present in the transmitter part. The Non-Return to Zero (NRZ) pulse generator provides NRZ coded signal, which is then digitally modulated by the user data using the Mach– Zehnder (MZ) modulator. The Pseudo Random Binary Sequence (PRBS), which has several working modes, produces data. Subsequently, the transmitter’s output signal is delivered into the DCF, which suppress the spreading of the signal that in turn increases the transmission rate. The output of the DCF is given to optical fiber channel [7–9]. Following an EDFA, photodiode, an eye diagram analyzer, and an electrical parameter, an electrical signal is created at the receiver from the channel’s optical output [38–40]. EDFA is an erbium-doped fiber amplifiers. In terms of longdistance optical fiber communications, it is one of the most effective fiber amplifiers because it can efficiently amplify light in the 1.5 microwave length zone, which is where telecom fibers have the lowest loss. The sequential waveforms that make up a composite image are simply superimposed to create eye diagrams using an eye diagram analyzer. Nonlinearities impact can also be plainly recognized. The simulation is carried out at a wavelength of 1550 nm. The length of SMF is 10 km, and DCF length is kept variable so as to achieve most suitable length giving best output. Value of dispersion for DCF is − 80 ps(nm km) and for SMF value of dispersion is 17.5 ps/(nm km). 
The dispersion slope and differential group delay of both the SMF and the DCF are 0.008 ps/(nm² km) and 3 ps/km, respectively. The attenuation is 0.25 dB/km for the SMF and 0.6 dB/km for the DCF.
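The fiber parameters above can be turned into a quick first-order dispersion and loss budget. The following is a minimal sketch, not part of the reported OptiSystem simulation: it assumes that accumulated dispersion is simply the sum of dispersion times length over the two fibers and ignores the dispersion slope, nonlinear effects, and amplifier gain. The function names are illustrative.

```python
# Fiber parameters quoted in the text (operating wavelength 1550 nm).
D_SMF = 17.5    # SMF dispersion, ps/(nm km)
D_DCF = -80.0   # DCF dispersion, ps/(nm km)
A_SMF = 0.25    # SMF attenuation, dB/km
A_DCF = 0.6     # DCF attenuation, dB/km
L_SMF = 10.0    # SMF length, km


def residual_dispersion(l_dcf_km: float) -> float:
    """Accumulated first-order dispersion of the DCF + SMF link, in ps/nm."""
    return D_SMF * L_SMF + D_DCF * l_dcf_km


def fiber_loss(l_dcf_km: float) -> float:
    """Total fiber attenuation of the DCF + SMF link, in dB."""
    return A_SMF * L_SMF + A_DCF * l_dcf_km


def ideal_dcf_length() -> float:
    """DCF length (km) that cancels the SMF dispersion to first order."""
    return -D_SMF * L_SMF / D_DCF


for l_dcf in (0.0, 1.0, 2.0, 3.0):
    print(f"DCF {l_dcf:.0f} km: residual {residual_dispersion(l_dcf):+.1f} ps/nm, "
          f"loss {fiber_loss(l_dcf):.2f} dB")
print(f"First-order ideal DCF length: {ideal_dcf_length():.4f} km")
```

Under these assumptions, the residual dispersion is smallest in magnitude at a DCF length of 2 km among the simulated lengths, while every additional kilometre of DCF adds 0.6 dB of loss.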
M. Singh et al.
3 Results and Discussion
An optical communication system with a 2.5 Gb/s bit rate is designed and analyzed. Several DCF lengths (0, 1, 2, and 3 km) are used to investigate the system at input power levels of 0, 5, 10, 15, and 20 dBm. The optimum DCF length is determined in order to reduce the nonlinear effects of the fiber and increase the system's transmission capacity. To enhance the overall performance of the optical communication system, the proposed system is simulated at various input power values, with the SMF length fixed at 10 km in all five scenarios, in order to determine the optimum DCF length and input power level. The simulation is repeated while varying the DCF length from 0 to 3 km in steps of 1 km. The quality factor (Q factor) evaluates the quality of the digital signal in terms of its signal-to-noise ratio (SNR). A higher Q factor corresponds to a higher SNR and hence a lower likelihood of bit errors. The simulation results are used to find the DCF length that gives the highest and most stable quality factor. Figure 3 plots the Q factor against the optical input power for the various DCF lengths. The graph shows that the highest and most constant quality factor is obtained at a DCF length of 2 km and an input power level between 5 and 10 dBm. The maximum quality factor occurs at 2 km, at input powers of 5 dBm and 10 dBm; beyond 10 dBm, the quality factor decreases. At 3 km, the quality factor also decreases. The optimum DCF length is therefore 2 km. Comparing our system with Ref. [1], we find that the optical system operating with the optimum DCF length performs better than the ordinary optical system. Table 1 lists the Q factor and output power obtained at an SMF length of 10 km. The results show that the highest Q factor is obtained at a DCF length of 2 km, so the performance of the optical system can be enhanced by using the optimum DCF length, which is 2 km in this simulation. Further, the output power is maximum at a DCF length of 0 km. Thus, selecting the optimum DCF length remains important for the performance of the optical system. The values change when other fiber lengths are considered. Examining these different values helps in designing a communication system that performs better and in which nonlinear effects are minimized. The eye diagrams at DCF lengths of 0, 1, 2, and 3 km for input powers of 0, 5, 10, and 20 dBm are shown in Figs. 4, 5, 6, and 7, respectively. The eye diagrams show that simply increasing the DCF length or the fiber length does not give the best results. Increasing the DCF length does not in itself improve the output of the communication system; rather, the system operates best only at a specific DCF length. In Fig. 5, the distortion is lowest at a DCF length of 2 km and highest at 0 km. Similarly, at a power of 10 dBm, the distortion is lowest at a DCF length of 2 km and highest at 0 km, followed by 1 km and 3 km. In other words, the system operates more effectively with a DCF length of 2 km than with 3 km.
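The link between the Q factor and the likelihood of bit errors mentioned above can be illustrated numerically. The sketch below uses the standard Gaussian-noise approximation BER = 0.5 erfc(Q/√2); this formula is an assumption brought in for illustration and is not stated in the paper, and the function name is hypothetical.

```python
import math


def ber_from_q(q: float) -> float:
    """Estimate the bit error rate from the Q factor (Gaussian-noise approximation)."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


# A larger Q factor gives a rapidly smaller estimated bit error rate.
for q in (3.0, 6.0, 7.0):
    print(f"Q = {q:.0f} -> estimated BER = {ber_from_q(q):.2e}")
```

Under this approximation, a Q factor of 6 corresponds to a bit error rate of roughly 10^-9, and the estimate falls steeply as Q grows.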
Investigation of DCF Length and Input Power Selection for Optical …
Fig. 3 Optical input power versus quality factor for different lengths of DCF
Table 1 Q factor and output power at an SMF length of 10 km
| Length of the DCF (km) | Q factor | Output power (dBm) |
|---|---|---|
| 0 | 83 | 17.22 |
| 1 | 274 | 17.19 |
| 2 | 500 | 17.09 |
| 3 | 126 | 17.05 |
4 Conclusion
The effectiveness of digital and analog transmission through optical fiber is limited by attenuation and dispersion. To recover the original signal, the distorted and attenuated signal must be corrected. Controlling both attenuation and dispersion is necessary for the system to operate effectively. To reduce signal distortion and degradation, the fibers in the link must have dispersion values of opposite sign. Research in optical communication consistently places the issues that degrade system performance, such as attenuation and dispersion, at the top of its priority list. Technologies like DCF address such issues. Optical communication systems will undergo many more adjustments in an effort to reduce performance-degrading factors. The key requirements, including high transmission capacity, minimal losses, and good signal quality across long distances, still need to be improved. Changes in the environment can affect a system's performance, which calls for more detailed investigation and analysis.
Fig. 4 Eye diagram at 0 dBm power with DCF length of 0, 1, 2, and 3 km
DCFs will improve the performance of optical systems, and combining various techniques will help produce better outcomes that meet these needs. The optical communication industry is seeing significant investment aimed at reducing both current and impending problems. There is a bright future for this industry, and several significant milestones will be reached over time. DCF will be a key component of optical networks. This study examines how the quality and signal intensity of an optical communication system are affected by the DCF length and the input power level. The measured parameters are the Q factor, output power, and noise power. It has been determined that a DCF length of 2 km and an input power level of 5 to 10 dBm produce reliable, high-quality output for a 10 km optical system. Increasing the DCF length does not in itself improve the system's performance; the system operates best only at a specific DCF length.
Fig. 5 Eye diagram at 5 dBm power with DCF length of 0, 1, 2, and 3 km
Fig. 6 Eye diagram at 10 dBm power with DCF length of 0, 1, 2, and 3 km
Fig. 7 Eye diagram at 20 dBm power with DCF length of 0, 1, 2, and 3 km
References
1. Proakis JG (2001) Digital communications, 4th edn. McGraw Hill
2. Ghassemlooy Z, EN554 photonic networks lecture 1: introduction. The University of Northumbria, U.K.
3. Agrawal GP (2001) Applications of nonlinear fiber optics. Academic Press, San Diego
4. Lin H, Cohen LG (1980) Optical-pulse equalization of low dispersion transmission in single-mode fibers in the 1.3–1.7 μm spectral region. Opt Lett 5(11):476–478
5. Dugan JM, Price AJ, Ramadan M, Wolf DL, Murphy EF, Antos AJ, Smith DK, Hall DW (1992) All-optical, fiber-based 1550 nm dispersion compensation in a 10 Gb/s, 150 km transmission experiment over 1310 nm optimized fiber. In: Optical fiber communication (OFC). San Jose, CA
6. Singh M, Sappal AS (2020) Radio over fiber (RoF) link modelling using cross term memory polynomial. J Opt Commun
7. Wenhua C (2022) Improved compensation of intrachannel four-wave mixing in dispersion-managed transmission links with mid-span optical phase conjugation. Opt Commun 530:129185
8. Zhao R, Xu N, Shang X, Zhao L, Zhang H, Li D (2021) Generation of Q-switched-mode-locked operations in Er-doped fiber laser based on dispersion compensating fiber saturable absorber. J Luminescence 234:117966
9. Wang F, Lu Y, Wang X, Ma T, Li L, Yu K, Liu Y, Li C, Chen Y (2021) A highly sensitive temperature sensor with a PDMS-coated tapered dispersion compensation fiber structure. Opt Commun 497:127183
10. Nsengiyumva I, Mwangi E, Kamucha G (2022) A comparative study of chromatic dispersion compensation in 10 Gbps SMF and 40 Gbps OTDM systems using a cascaded Gaussian linear apodized chirped fiber Bragg grating design. Heliyon 8:e09308
11. Chakkour M, Aghzout O, Ahmed BA, Chaoui F, Yakhloufi ME (2017) Chromatic dispersion compensation effect performance enhancements using FBG and EDFA-wavelength division multiplexing optical transmission system. Int J Opt 8
12. Kaur R, Singh M (2017) A review paper on dispersion compensation methods. Int Res J Eng Technol 4(6)
13. Sharma A, Singh A, Kamal TS, Vishal (2008) Investigations on power penalty at different spectral width using small signal analysis with higher-order dispersion. Optik 119:53–56
14. Ebadil A, Seraji FE, Mohajerani E, Darvishi S (2011) Study of dispersion and its relationship with power confinement in a single-mode optical fiber. In: International conference on laser & fiber-optical networks modeling
15. Basu M, Tewari R, Acharya HN: Effect of grading on the characteristics of a dispersion compensated fiber. Opt Commun 174:119–125
16. Ghosh D, Basu M (2010) Efficient dispersion tailoring by designing alternately arranged dispersion compensating fibers and fiber amplifiers to create self-similar parabolic pulses. Opt Laser Technol 42:1301–1307
17. Afsal S, Athira A, Rahul S (2016) A novel approach for the enhancement of fiber optic communication using EDFA. IEEE WiSPNET
18. Beibei C, Haihu Y, Yiwen W, Liyan Z (2013) Dispersion compensating fibers with improved splicing performance. Phys Procedia 48:96–101
19. Mohammadi S, Mozzaffari S, Shahidi M (2011) Simulation of a transmission system to compensate dispersion in an optical fiber by chirp gratings. Int J Phys Sci 6(32):7354–7360
20. Kimura T, Daikoku K (1977) A proposal on optical fiber transmission systems in a low loss 1.0–1.4 μm wavelength region. Opt Quant Elect 9:33
21. Kapron FP, Keck DB, Mauer RD (1970) Radiation losses in glass optical waveguides. Appl Phys Lett 17:423
22. Nakagawa K, Okano Y, Yameda E, Hiramatsu H, Ohgushi Y, Minejima Y et al (1977) Initial trial of optical fiber transmission systems in N.T.T.: Repeater design and performance. In: Proceedings international conference integrated optics and optical fiber communications
23. Wilkins G (1977) Fiber optic cables for undersea communications. Fiber Integr Opt 1:39
24. Kurkjian CR (1976) High strength silica fibers for optical communications. Bull Am Ceramic Soc 55:9
25. Iida D, Honda N, Izumita H, Ito F (2007) Design of identification fibers with individually assigned Brillouin frequency shifts for monitoring passive optical networks. J Lightwave Technol 25(5):1290–1297
26. Takahashi H, Ito F, Kito C, Toge K (2013) Individual loss distribution measurement in 32-branched PON using pulsed pump-probe Brillouin analysis. Opt Express 21(6):6739–6748
27. Enomoto Y, Izumita H, Mine K, Tomita N (2011) Design and performance of novel optical fiber distribution and management system with testing functions in central office. J Lightwave Technol 29(12):1818–1834
28. Wegmuller M, Legre M, Gisin N (2002) Distributed beatlength measurement in single-mode fibers with optical frequency-domain reflectometry. J Lightwave Technol 20(5):800–807
29. Iida H, Koshikiya Y, Ito F, Tanaka K (2012) Ultra high sensitive coherent optical time domain reflectometry employing frequency division multiplexing. J Lightwave Technol 30(8):1121–1126
30. Vidmar M (2001) Optical-fiber communications: components and systems. Informacije MIDEM 31(4):246–251
31. Mitra PP, Stark JB (2001) Nonlinear limits to the information capacity of optical fibre communications. Nature 411(6841):1027–1030
32. Ghassemlooy Z, Popoola W, Rajbhandari S (2013) Optical wireless communications: system and channel modelling with MATLAB. CRC Press Taylor & Francis Group
33. An HL, Lin XZ, Pun EYB, Liu HD (1999) Multi wavelength operation of an erbium-doped fiber ring laser using a dual-pass Mach-Zehnder comb filter. Opt Commun 169:159–165
34. Li MJ, Chen X, Nolan DA, Wang J, West JA, Koch KW (2008) Specialty fibers for optical communication systems. In: Optical fiber telecommunications V A: components and subsystems
35. Fyath RS, Ali HMM (2012) Transmission performance of optical code division multiple access network based on spectral amplitude coding. J Emerg Trends Comput Info Sci 3(3):444–455
36. Saleh S, Cholan NA, Sulaiman AH, Mahadi MA (2016) Self-seeded four wave mixing cascaded utilizing fiber brag grating. In: International conference on advances in electrical, electronic and system engineering. Malaysia, 14–16 Nov 2016
37. Nisar KS, Sarangal H, Thapar SS (2018) Performance evaluation of newly constructed NZCC for SAC-OCDMA using direct detection technique. Photonic Network Commun
38. Sarangal H, Singh A, Malhotra J (2017) Construction and analysis of a novel SAC-OCDMA system with EDW coding using Direct Detection technique. J Opt Commun
39. Chaudhary S, Tang X, Sharma A, Lin B, Wei X, Parmar A (2019) A cost-effective 100 Gbps SAC-OCDMA–PDM based inter-satellite communication link. Opt Quant Electron 51:148
40. Singh M, Sappal AS (2021) Digital predistortion of radio over fiber (RoF) link using hybrid Memetic algorithm. J Opt Commun
Resource Optimization with Digital Twins Using Intelligent Techniques for Smart Healthcare Management
Sreekanth Rallapalli, M. R. Dileep, and A. V. Navaneeth
Abstract With advancements in artificial intelligence (AI) and machine learning (ML), new technologies are emerging that can assist organizations in many ways, from resource optimization to efficient maintenance of facilities. The digital twin is one such technology, in which a virtual model is created to accurately reflect a physical object. It can be applied in the fields of construction, manufacturing, energy, automotive, and health care. Although the technology is complex to understand, when properly implemented it can be used to solve complex problems efficiently. For resource optimization in healthcare management, real-time data on hospital operations and the surrounding environment need to be captured, including the number of patients suffering from a particular disease, its criticality, and accident-related cases. This helps patients efficiently search nearby hospitals for admission and better care. Digital twins enable hospital management to detect bed shortages. Digital twins can be used to replicate staffing systems, capacity planning, workflows, and care delivery models to improve efficiency, optimize costs, and anticipate future needs. In this paper, we study the architecture for building digital twins for resource optimization in hospitals. We also study the existing architecture, identify the gaps, and propose a novel architecture to efficiently optimize hospital resources. We propose a machine learning-based optimization model for optimizing the resources. This will enable patients to receive better healthcare services.
Keywords Artificial intelligence · Digital twins · Healthcare · Optimization · Resources
S. Rallapalli (B) · M. R. Dileep · A. V. Navaneeth Department of Master of Computer Applications, Nitte Meenakshi Institute of Technology, Yelahanka, Bengaluru, Karnataka, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_20
1 Introduction
Experimental treatments require a digital representation so that they can be tried without risking the life of the patient. Advanced technology is needed to choose the best of the available surgical options, to develop personalized treatments, to optimize staffing workflows, and to use resources effectively. Digital twin technology can now assist healthcare management in solving such complex problems effectively. In [1], the authors describe the innovative digital platforms that are used in today's business. The digital twin market is expected to grow annually at a rate of 39.1%. Digital twin technology is based on artificial intelligence and machine learning algorithms. It also utilizes real-time data from Internet of Things (IoT) devices, sensors, and other devices. Healthcare organizations need to invest in a wide variety of technologies in order to use digital twin technology. These include connected infrastructure, that is, the application APIs and sensors used to collect data. In [2], Kritzinger et al. provide a detailed review and classification of digital twins for manufacturing. In order to build the data models, modeling and analytical technologies such as AI, ML, and 3D modeling are required. Because the data come from various sources and in various forms (structured, unstructured, or semi-structured), they need to be acquired through a common source. In [3], digital twins are explained in the context of visualization and decision making. In health care, we often collect text data, image data, handwritten data, and audio and video data. These data need to be processed and analyzed for resource allocation. Through advanced visualizations, optimization of the machine line production process can be achieved with IoT.
2 Digital Twin
Many definitions of the digital twin are provided in published research articles. The initial terminology of digital twins was provided by Grieves in 2003 [4]. A digital twin paradigm was also provided by the aeronautical administration NASA in 2012 [5], which describes the digital twin as a mirror of the life of its equivalent physical flying object. In [6], Chen states that the digital twin is a computerized model of its equivalent physical model with all of its features. In [7], the authors describe the digital twin as a living model of the physical system that adapts to all changes and processes them accordingly. In [8, 9], the digital twin is described as a virtual model of an existing physical model with all of its features. There are a few misconceptions about the digital twin. It is sometimes considered a digital model, in which no exchange of data happens between the physical model and the digital model. Some regard the digital twin as a digital shadow. Figure 1 represents the flow of data in a digital twin from the physical system to the virtual object, which is referred to as a digital object.
Fig. 1 Data flow of digital twin
Digital twins provide end-to-end support with high visibility and traceability. This also enables organizations to oversee their day-to-day supply chains and build new models if needed. Digital twins can also take real-time IoT data and apply AI and data analytics to optimize performance.
3 Digital Twin Technologies
Digital twin technologies require the integration of other technologies based on IoT, AI, ML, cloud, and image processing and visualization techniques. IoT sensors sense the data generated by edge devices and enable constant data transmission, which is then used to create a replica of a physical object. Artificial intelligence algorithms automatically analyze the data obtained from the field and provide valuable insights and future predictions. Cloud technology provides storage for the data in a virtual environment and the computing power to run powerful AI algorithms. The cloud also allows easy data access from any location. For visualization of physical objects and user interaction with digital content, techniques such as Extended Reality (XR) may be used. Digital twins were initially used only for a single device, but with advanced techniques like AI and ML, much more complex issues can now be resolved. The sustainability of digital twins is driven by the way they capture, organize, and display data, and by the way they create virtual models.
4 Literature Review
A number of scientific papers are being published on digital twins. In [10], the authors report that a digital twin is based on physical and digital modules. The first module consists of products and processes, while the other is a computer module that processes the information and optimizes decision making. In [11, 12], Zhuang proposed three levels in the digital twin architecture, focusing on three aspects: the element, the behavior, and the rule. Based on a review of the scientific papers and relevant articles, the architecture of a digital twin system can be described by the following four components [13], which are shown in Fig. 2. The first component is the physical system that needs to be cloned virtually. In [14], the main applications of digital twins with respect to design and prototyping are discussed. In [15], the authors confirm that the digital twin is a key tool for new product design, development, testing, and resource allocation. The virtual system captures the visual and behavioral characteristics of the physical system, as discussed in [16]. Data should be exchanged between the physical system and the virtual system. System data play a major role in developing and operating the architecture of the digital twin. In the process of building a digital twin, data flow from the physical system to the virtual system, carrying the physical system's equivalent visual and behavioral characteristics. According to [17], the decision-making process also needs data from the virtual side, so that the digital twin can operate based on the received responses. The communication interface (CI) handles the collection, integration, and exchange of data and information between the physical environment and the virtual environment. The CI can take the form of sensors and need not be physical cables. Digital twins can be applied to a wide variety of engineering tasks and to creating smart healthcare solutions [18–21].
Fig. 2 Architecture of digital twins
4.1 Related Work
There are numerous applications of digital twins in healthcare management. To maximize the efficiency of the healthcare system, personalized medicine needs to be implemented. Current clinical practice follows a one-size-fits-all approach. Digital twins can remove this barrier by providing precision medicine for individual patients. In [22], the authors critically review the applications of digital twins in health care for evidence-based medicine. In [23], the authors provide a detailed account of the updates and challenges of digital twins in healthcare management. For this paper, many research articles on digital twins in health care were reviewed. In most of these papers, models were developed only as part of personalized medical care. The limitation of the previous works is that optimizing resources for healthcare systems was not taken into consideration. As a future direction, there is scope for work on optimizing resources in the healthcare sector.
5 Reflection Model Based on Digital Twin
In this section, we consider a machine learning (ML)-based twin of a resource root cause analysis process. The idea behind this process is to find deficiencies in resource allocation in healthcare management. This helps hospitals identify shortages of the resources required during an emergency. Once the required resources are identified, the hospitals can implement corrective and preventive actions. The main purpose is to optimize the resources and enhance the utilization of the available resources. Several ML models can be used to perform this operation; here, we treat the problem as a multinomial logistic regression. Let us consider an optimization twin of the resource allocation process, which provides an automated schedule for healthcare providers so that they can respond well to any emergency involving a shortage of beds or nursing staff, demand for lifesaving drugs, or doctor availability. To develop the digital twin in this situation, we need a set of parameters to consider in its design and development. Input features, input data, rules for transforming input data into output, and the final goal are the major factors that influence the development of the digital twin for optimizing the resources. Input features are collected from IoT devices for machine learning and optimization of the resources. Training the ML model requires good historical data covering frequently occurring scenarios as well as a few rarely occurring ones. The main goal of the model is to build a relation between the input data and the output labels and to predict the correct labels in the future. The Internet of Things (IoT) connects the different hospitals to collect the data, as depicted in Fig. 3.
Fig. 3 Healthcare facilities data from different hospitals connected through IoT
For effective optimization of the resources, the actual data are merged with the master data table, and the feature values are stored in the form of an index table. As an example, consider the following scenario of healthcare facilities management data.
H1: Healthcare facility 1 needs 5 beds, 10 specialized doctors, 3 ambulances, 4 nurses, and drugs on day 1. The same facility needs 10 beds, 5 specialized doctors, 6 ambulances, 8 nurses, and drugs on day 2.
H2: Healthcare facility 2 uses 2 beds and needs 20 specialized doctors, 5 ambulances, 8 nurses, and drugs on day 1. On day 2, this facility needs 15 beds, 2 specialized doctors, 8 ambulances, 10 nurses, and drugs, and so on.
A large set of data is obtained when data are collected from different healthcare facilities. Machine learning and optimization models require complex mathematical models to solve the problems efficiently. ML models can be implemented with libraries such as scikit-learn and frameworks such as PyTorch and TensorFlow. Another way to implement a reflection model based on a digital twin is to use a data-driven model that is trained on sensor data from the real-world asset. This type of model can be used to make predictions about the asset's behavior based on patterns in the sensor data. Machine learning algorithms such as neural networks, random forests, and gradient boosting can be used to develop such models.
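The H1 and H2 scenario above can be written down as an index table of feature values. The following is a minimal sketch under stated assumptions: the record type, field names, and the choice of (facility, day) as the key are illustrative and not taken from the paper, and drugs are omitted because the text gives no quantity for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyNeed:
    """One facility's resource figures for one day (hypothetical record layout)."""
    facility: str
    day: int
    beds: int
    doctors: int
    ambulances: int
    nurses: int


# Figures quoted in the H1/H2 example.
records = [
    DailyNeed("H1", 1, beds=5, doctors=10, ambulances=3, nurses=4),
    DailyNeed("H1", 2, beds=10, doctors=5, ambulances=6, nurses=8),
    DailyNeed("H2", 1, beds=2, doctors=20, ambulances=5, nurses=8),
    DailyNeed("H2", 2, beds=15, doctors=2, ambulances=8, nurses=10),
]

FEATURES = ("beds", "doctors", "ambulances", "nurses")


def build_index_table(rows):
    """Map (facility, day) to a feature vector in the order given by FEATURES."""
    return {(r.facility, r.day): [getattr(r, name) for name in FEATURES] for r in rows}


index_table = build_index_table(records)
for key, features in sorted(index_table.items()):
    print(key, features)
```

Feature vectors in this form could then be passed to a library such as scikit-learn, together with output labels drawn from historical data.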
6 Machine Learning and Optimization
6.1 Logistic Regression ML Model
In this section, we use the logistic regression ML model (as in Fig. 4) for resource planning in healthcare management. The resources are limited to the availability of staff (doctors and nurses), beds, ICUs, equipment, lifesaving drugs, and ambulances. We define the variables as follows:
y_1, y_2, …, y_n: input feature variables for the machine learning model
u_{1,i}: coefficients for the ith label
X: predicted output
Z_i: linear predictor function for the ith label
y_i: ith label prediction for the observation
L: loss function
σ: activation function
Y_i: actual label for the observation
Fig. 4 Logistic regression ML model
Logistic regression uses an equation that is very similar to that of linear regression. The input variables y_1, y_2, …, y_n are combined linearly using the coefficients to predict the output. The logistic equation is given by
\[ x = \frac{e^{B_0 + B_1 y}}{1 + e^{B_0 + B_1 y}}, \]
where x is the predicted output, B_0 is the bias or intercept, and B_1 is the coefficient for the single input value y. The data can be taken from open datasets or collected through IoT.
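The logistic equation above can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the coefficient values in the usage lines are made up for demonstration rather than fitted to any hospital data. The two branches are algebraically the same expression, arranged so that the exponential never overflows.

```python
import math


def logistic(y: float, b0: float, b1: float) -> float:
    """Evaluate x = e^(B0 + B1*y) / (1 + e^(B0 + B1*y)) in a numerically stable way."""
    z = b0 + b1 * y
    if z >= 0:
        # Divide numerator and denominator by e^z to avoid overflow for large z.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients: intercept B0 = -4.0, slope B1 = 0.5 per requested bed.
for beds_requested in (2, 8, 15):
    print(beds_requested, round(logistic(beds_requested, b0=-4.0, b1=0.5), 3))
```

The output always lies between 0 and 1, so it can be read as the predicted probability of a label, for example that a facility will report a shortage.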
7 Proposed Optimization Model Based on Digital Twin
In this section, we propose an optimization model based on the digital twin. For this model, we consider the following features:
B_{β,p,i}: Binary variable denoting whether staff member β (in the staff table) of hospital p (in the hospital table) is available on day i
O_{o,p,i}: Binary variable denoting whether the ICU at index o (in the ICU table) is scheduled to be booked at hospital p on day i
M_{m,p,i}: Binary variable denoting whether the bed at index m (in the bed occupancy table) is scheduled to be available at hospital p on day i
TO_{o,p}: Time taken by ICU o for a patient of hospital p
TM_{m,p}: Time taken for bed occupancy m in hospital p
OAV_{o,i}: Availability hours of the ICU at index o on day i
MAV_{m,i}: Availability hours of resources at index m on day i
The proposed model for optimizing resources based on the digital twin is shown in Fig. 5. The main challenge is to evaluate whether the model produces the output it is intended to produce. This may not be possible in a single iteration, and multiple levels of iteration are needed to obtain the correct weights for the information. The model collects information through a communication interface such as sensors and passes the data through machine learning models such as the regression model discussed in Sect. 6. Once the data are collected, they are compared with the resource availability in the various healthcare systems, and a digital twin-based virtual model is built.
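The paper lists the model's variables but does not state the constraints that link them. As one plausible reading, offered here purely as an assumption, the hours consumed by the bookings of an ICU on a given day should not exceed that ICU's availability hours. The sketch below checks this for the ICU variables; the dictionary layout, function name, and all sample values are hypothetical.

```python
from collections import defaultdict


def icu_hours_feasible(booked, hours_needed, hours_available):
    """Check an assumed capacity constraint on ICU bookings.

    booked:          {(o, p, i): 0 or 1}, ICU o booked for hospital p on day i
    hours_needed:    {(o, p): hours},     time taken by ICU o for hospital p
    hours_available: {(o, i): hours},     availability of ICU o on day i
    """
    used = defaultdict(float)
    for (o, p, i), is_booked in booked.items():
        if is_booked:
            used[(o, i)] += hours_needed[(o, p)]
    # Every (ICU, day) pair must stay within its available hours.
    return all(hours <= hours_available.get(key, 0.0) for key, hours in used.items())


# Made-up sample data for illustration only.
booked = {("icu1", "H1", 1): 1, ("icu1", "H2", 1): 1, ("icu2", "H1", 1): 0}
hours_needed = {("icu1", "H1"): 10.0, ("icu1", "H2"): 8.0, ("icu2", "H1"): 12.0}
hours_available = {("icu1", 1): 24.0, ("icu2", 1): 24.0}

print(icu_hours_feasible(booked, hours_needed, hours_available))
```

An analogous check could be written for the bed variables, and a full optimization would also need an objective function, which the paper does not specify.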
Fig. 5 Optimization model based on digital twin
8 Results
Table 1 shows the resource data before the optimization model based on the digital twin is applied. Table 2 shows the data after the ML optimization model is run, together with the final output after resources are shared with other healthcare facilities wherever required.
Table 1 Input data before optimization
| Healthcare facility | Resources available | Flag |
|---|---|---|
| H1 | Beds: 20, Ambulance: 4, Doctors: 15, ICU: 3, Nurses: 10 | No |
| H2 | Beds: 10, Ambulance: 2, Doctors: 20, ICU: 3, Nurses: 6 | Yes |
| H3 | Beds: 15, Ambulance: 3, Doctors: 10, ICU: 2, Nurses: 7 | Yes |
| H4 | Beds: 20, Ambulance: 4, Doctors: 15, ICU: 3, Nurses: 10 | No |
| H5 | Beds: 10, Ambulance: 2, Doctors: 20, ICU: 3, Nurses: 6 | Yes |
| H6 | Beds: 15, Ambulance: 3, Doctors: 10, ICU: 2, Nurses: 7 | No |
Table 2 Output data after optimization ML model is implemented
| HF^a | Resources available | Flag | Optimized resources |
|---|---|---|---|
| H1 | Beds: 20, Ambulance: 4, Doctors: 15, ICU: 3, Nurses: 10 | No | Requirements from H2 given at H1 |
| H2 | Beds: 10, Ambulance: 2, Doctors: 20, ICU: 3, Nurses: 6 | Yes | Resource request sent to H1 |
| H3 | Beds: 15, Ambulance: 3, Doctors: 10, ICU: 2, Nurses: 7 | Yes | Resource request sent to H4 |
| H4 | Beds: 20, Ambulance: 4, Doctors: 15, ICU: 3, Nurses: 10 | No | Requirements at H3 given at H4 |
| H5 | Beds: 10, Ambulance: 2, Doctors: 20, ICU: 3, Nurses: 6 | Yes | Resource request sent to H4 |
| H6 | Beds: 15, Ambulance: 3, Doctors: 10, ICU: 2, Nurses: 7 | No | Requirements at H3 given at H4 |
^a Healthcare facility
8.1 Result Analysis
In this section, we analyze whether the resource information collected from the various hospitals is scheduled properly. For this purpose, we attach a flag if any of the resources is not utilized by a hospital on a particular day. We can also group the resources that are effectively utilized on a day-to-day basis in each hospital. Ideally, a hospital uses its resources efficiently and its needs do not exceed what it is able to meet. If the required resources exceed the capacity of a unit, the requirement can be quickly passed to nearby similar units where resources are idle. In this way, we can optimize the resources of all the networked hospitals and save patient lives in emergencies such as pandemics.
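Passing unmet requirements to units with idle resources, as described above, can be sketched for a single resource type. This is a minimal illustration, not the authors' procedure: the greedy rule of taking donors in sorted order, the function name, and the demand figures are all assumptions, and the notion of a "nearby" unit is not modeled. The bed capacities are taken from Table 1.

```python
def forward_shortfalls(capacity, demand):
    """Return (donor, receiver, units) transfers covering shortfalls from idle capacity.

    capacity, demand: {facility: units of one resource type}
    """
    idle = {f: capacity[f] - demand.get(f, 0) for f in capacity}
    transfers = []
    for needy in sorted(f for f in idle if idle[f] < 0):
        shortfall = -idle[needy]
        for donor in sorted(f for f in idle if idle[f] > 0):
            if shortfall == 0:
                break
            give = min(shortfall, idle[donor])
            transfers.append((donor, needy, give))
            idle[donor] -= give
            shortfall -= give
        idle[needy] = -shortfall  # any remainder stays unmet
    return transfers


# Bed capacities from Table 1; the demand figures are hypothetical.
bed_capacity = {"H1": 20, "H2": 10, "H3": 15}
bed_demand = {"H1": 12, "H2": 15, "H3": 15}

print(forward_shortfalls(bed_capacity, bed_demand))
```

With these sample numbers, H2 is short of beds and H1 has spare beds, so the sketch routes the request from H2 to H1.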
9 Conclusion and Future Work
In the era of AI, ML, and IoT, a large amount of data is being collected from various sources. These data can be processed efficiently by building ML models, which use the data to predict outcomes. Optimization of resources is very important for any organization. With optimization, resources can be used efficiently and shared with partners in emergencies such as a pandemic. This paper focused on such optimization of resources. Hospital data are collected through IoT devices, and an ML optimization model is developed and used for efficient utilization of the resources. Digital twins are used to replicate the physical model, and with them the optimization can be managed efficiently. Historical data and real-time data on the day-to-day operations of hospitals and their surrounding environment assist in creating the digital twins. This helps hospital management detect shortages of beds as well as shortages of hospital staff such as doctors and nurses, operation rooms, ICUs, and drugs. Digital twins can improve efficiency while decreasing costs. A further development could be the use of deep learning for digital twins. Deep learning can predict the exact features from the input. This will help the healthcare management facility manage its resources efficiently.
MS-CDG: An Efficient Cluster-Based Data Gathering Using Mobile Sink in Wireless Sensor Networks
Nami Susan Kurian and B. Rajesh Shyamala Devi
Abstract The fundamental task of wireless sensor networks is data gathering. Clustering aims at load balancing and extended network lifetime. Over the last decade, clustering schemes and trajectory optimization have proven to be effective strategies for boosting energy efficiency in sensor networks. Clustering reduces the energy hole problem, or funeral effect, by forwarding the aggregated data to the sink node through elected cluster heads. A static sink maximizes multihop transmissions within the sensor network and frequently results in the energy hole problem, which rapidly drains the energy of sensor nodes near the sink. The suggested scheme not only extends the network lifespan through efficient cluster-head selection with equal-sized cluster formation but also improves the data gathering mechanism through the mobile sink concept. The protocol is simulated in MATLAB for various parameters, and the proposed methodology outperforms the conventional protocol with regard to energy consumption and network lifetime.
Keywords Clustering · Mobile sink · Duty cycling · Energy consumption · Traveling salesman problem
1 Introduction
A wireless sensor network (WSN) is an infrastructureless, ad hoc network of thousands of nodes that self-organize, interact with the environment, and gather, aggregate, and forward data to the sink. The range of WSN applications has widened with advances in micro-sensors and wireless communication technology [1]. Because sensor nodes are battery powered, energy consumption is the biggest challenge in WSNs. Node energy is spent on transmission, reception, sensing, aggregation, and processing of data in the network.

N. S. Kurian (B) · B. Rajesh Shyamala Devi
Hindustan Institute of Technology and Science, Chennai, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_21
Data can be transmitted to the sink by either a single-hop or a multihop method. In the former, packets travel directly between the sensor nodes and the sink; in the latter, intermediate nodes relay the packets toward the sink. With respect to energy efficiency, multihop communication outperforms single-hop communication; however, it causes an energy hole problem. To curb the depletion of node energy in long-distance transmission, the concept of clustering was introduced. Clustering [2] extends network lifetime, balances load, and reduces data packet collisions.

In a sensor network, the sink can be stationary (fixed) or mobile, depending on the application. Static sink nodes suffer from several problems: (a) funeral effect: sensor nodes near the sink consume more energy and therefore die sooner; (b) communication overhead: the exchange of control packets results in high control overhead; (c) connectivity constraints: frequent path loss in the wireless environment leads to data loss. Mobile sink nodes are attached to vehicles, wildlife, or people, enabling them to roam the sensing field and collect sensed values over short, dependable links. The mobile sink strategy [3], which makes data collection more flexible at minimal energy cost, was designed to improve energy utilization and network longevity. In a conventional static WSN, the sensors collect data at regular intervals and transmit them to the fixed gateway. However, a static sink causes the nodes around it to die quickly, a funeral effect that leads to the energy hole problem. Sink mobility was proposed to address these limitations. Figures 1 and 2 show how data are gathered using static and mobile sinks, respectively.

Finding the sink trajectory for data collection is challenging in both controlled and uncontrolled environments. In a controlled environment, the trajectory of the mobile sink is predetermined, but in an uncontrolled environment its movement is random and unpredictable, which makes the sensor nodes hard to reach. Many researchers have found that moving the sink in large-scale monitoring environments saves a considerable amount of energy because the transmission energy of the nodes is reduced. Figure 1 depicts the one-hop direct transmission of data from cluster heads to a static sink, as used in many protocols, e.g., LEACH. Consider that, from the randomly distributed nodes in a sensor network, nodes 6, 7, 1, 9, and 8 form a cluster. Similarly, (12, 10, 15, 14, 13), (17, 16, 18, 2, 1), and (4, 20, 3, 5, 19) form other clusters. Nodes 6, 12, 17, and 4 act as cluster heads, and the sensor nodes connected to them are cluster members. Figure 2 shows a mobile sink that travels around the sensing field, visits each cluster head to gather information, and then returns to its initial position. This research demonstrates the effectiveness of a mobile sink-based clustering strategy for data collection. The mobile sink concept can resolve two key concerns: hotspots and network lifespan.
Fig. 1 Direct data transmission using static sink
Fig. 2 Mobile sink data gathering at cluster heads
The remainder of the paper is organized as follows. Section 2 reviews work on energy-efficient routing techniques. Section 3 characterizes the energy model, and Sect. 4 describes the proposed methodology with efficient clustering and mobile sink data collection. Section 5 presents the evaluation metrics used to compare the proposed and existing protocols, together with the simulation results. Section 6 concludes the paper.
2 Related Works
Researchers have worked on various energy-efficient algorithms for cluster-based routing. Clustering began with the development of the LEACH protocol, a hierarchy-based clustering protocol with a reduced-energy mechanism [4]. This so-called umbrella protocol divides the network into clusters, each controlled by a cluster head that aggregates the sensed values and forwards them to the sink node. TDMA slots are allotted, and cluster members communicate only during their time slots; otherwise they remain in the low-power sleep mode. The limitation of LEACH is that cluster heads are chosen randomly in each cycle according to a probability, disregarding residual energy. The authors of [5] developed A-LEACH, a LEACH variant based on residual energy and weighted probability. The protocol assumes a heterogeneous environment with varied initial energy. However, the approach has limitations in ensuring efficiency in applications with variable initial energy.

Much research has combined LEACH with bio-inspired algorithms. In [6], a LEACH protocol based on the genetic algorithm, a bio-inspired algorithm, was proposed, in which CH selection is based primarily on optimizing the probability of a node becoming the cluster head. LEACH-GA uses a centralized method since a significant number of control and advertisement messages must be sent during the process. PEGASIS [7] is a chain-based approach that eliminates the drawbacks of LEACH. Energy is saved because nodes operate in parallel and transmission distances are reduced, and the leader node is rotated after every round. Its limitation is that the energy of a node is not considered when the leader node is selected. The MIEEPB protocol [8] is a chain-based protocol that uses a mobile sink to collect data in a resource-efficient way. To maximize network longevity, the sensing field is partitioned into four equal zones, and sink mobility is implemented across a multi-chain paradigm. The protocol shortens the distance between chain-connected nodes and minimizes network latency in data transmission. It also reduces the load on the nodes and the burden on chain leaders. The mobile sink follows a fixed path, which enhances the network lifetime. In [9], researchers developed the MSIEEP protocol based on a controlled mobile sink. The number of operational nodes and the leftover energy are the primary criteria for choosing the cluster leader. Its limitation is increased delay due to region division.

EECA [10] is an efficient clustering mechanism that follows a predefined node deployment strategy. To minimize the energy used in cluster-head selection, a two-stage approach is implemented for electing the cluster head. In the first stage, residual energy is the primary parameter for selecting anchor CHs, and two parameters, energy and distance, are considered for the candidate CHs. In the second stage, the potential CHs compete for the role of CH using a delayed transmission-broadcast technique. RZ-LEACH, a modified version of the LEACH protocol [11], improves network performance in terms of speed, robustness, and longevity using the mobile sink and rendezvous node concepts. It follows the two stages of the LEACH protocol: the initialization or set-up phase, followed by the transmission or steady-state phase. The protocol assumes equal energy for all nodes in the initial stage [12]. The protocol divides the network region into identically sized k-dimensional grid cells. Each grid cell is allocated to a cluster using a technique modeled on the k-dimensional tree, ensuring that every cluster consumes the same amount of energy while collecting data. The TCBDGA technique, developed in [13], employs a clustering strategy to form a tree, with rendezvous points (RPs) serving as the tree's root. As branching increases, the forwarding demand at the nodes nearest the RP also increases, leading to an energy hole problem. In [14], the CB algorithm uses a clustering technique based on rendezvous points that are allocated by binary search. Cluster heads are chosen randomly, and nodes join a cluster head depending on their proximity to it. This method restricts mobile sink transmission over a denser network.

To summarize, various researchers have introduced different data aggregation mechanisms, but energy conservation and lifetime have not improved to the expected level because too few parameters are considered and because of limitations in protocol design. The proposed model uses a mobile sink data gathering approach with equal sized clustering and efficient cluster-head selection based on three factors.
3 Energy Analysis of Routing Protocols
The radio transmission model assumed for this work is shown in Fig. 3. The energy needed to operate the transmitter and receiver circuitry is $E_{el} = 50$ nJ/bit. To achieve an acceptable bit error rate, the transmit amplifier energy is set to $\varepsilon_{amp} = 100$ pJ/bit/m$^2$. We assume that an $n$-bit message travels a distance $d$ from source to destination. The suggested protocol uses the same energy dissipation model as the LEACH protocol. Equations 1 and 2 give the energy $E_T$ required at the transmitter:

$$E_T(n, d) = E_{T\text{-}el}(n) + E_{T\text{-}amp}(n, d), \tag{1}$$

$$E_T(n, d) = E_{el}\, n + \varepsilon_{amp}\, n\, d^2 = \begin{cases} n E_{el} + E_f\, n\, d^2, & \text{if } d < d_0 \\ n E_{el} + E_m\, n\, d^4, & \text{if } d \ge d_0 \end{cases} \tag{2}$$
Fig. 3 Radio transmission model
$E_{el}$ denotes the energy dissipated by the electronics circuit, $E_f$ the amplifier energy in free space, and $E_m$ the amplifier energy in multipath propagation. The energy $E_R$ needed to receive an $n$-bit signal is the energy dissipated by the receiver electronics, as given in Eqs. 3 and 4:

$$E_R(n) = E_{R\text{-}el}(n), \tag{3}$$

$$E_R(n) = E_{el}\, n, \tag{4}$$

$$d_0 = \sqrt{\frac{E_f}{E_m}}. \tag{5}$$
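The radio model of Eqs. 2, 4, and 5 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' MATLAB code: the function names are hypothetical, and the constants are the values listed in Table 1, with the multipath coefficient taken in J/bit/m^4 as required by the $d^4$ term of Eq. 2.

```python
import math

# Constants from Table 1, converted to joules.
E_EL = 50e-9        # electronics energy, J/bit
E_F = 10e-12        # free-space amplifier energy, J/bit/m^2
E_M = 0.0013e-12    # multipath amplifier energy, J/bit/m^4


def threshold_distance(e_f=E_F, e_m=E_M):
    """Eq. 5: distance d0 at which the model switches to multipath."""
    return math.sqrt(e_f / e_m)


def tx_energy(n_bits, d, e_el=E_EL, e_f=E_F, e_m=E_M):
    """Eq. 2: energy (J) to transmit n_bits over a distance of d metres."""
    if d < threshold_distance(e_f, e_m):
        return n_bits * e_el + e_f * n_bits * d ** 2
    return n_bits * e_el + e_m * n_bits * d ** 4


def rx_energy(n_bits, e_el=E_EL):
    """Eq. 4: energy (J) to receive n_bits."""
    return n_bits * e_el
```

With these constants the threshold distance is roughly 87.7 m, so a 4000-bit packet sent over 50 m falls in the free-space branch of Eq. 2.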
The threshold distance $d_0$ is given in Eq. 5. In a network that uses direct communication, each sensor node reaches the sink in a single hop. The energy dissipated depends on the distance between the sensor node and the sink: when the sink is far from the nodes, a large amount of energy is dissipated, which reduces the network lifetime. A sensor node dissipates energy during both transmission and reception. Consider a linear network in which adjacent nodes are separated by a distance $t$. When a node at distance $pt$ from the sink transmits directly to it, the energy used is given by Eq. 6:

$$E_{direct} = E_T(n, d = pt) = E_{el}\, n + \varepsilon_{amp}\, n\, (pt)^2. \tag{6}$$
In contrast to direct transmission, minimum transmission energy (MTE) routing is a multihop scheme in which each node delivers its message to the nearest node in the direction of the base station. The energy used by MTE routing for a node at distance $pt$ is given by Eq. 7:

$$E_{MTE} = p\, E_T(n, d = t) + (p - 1)\, E_R(n) = p\left(E_{el}\, n + \varepsilon_{amp}\, n\, t^2\right) + (p - 1)\, E_{el}\, n = n\left((2p - 1) E_{el} + \varepsilon_{amp}\, p\, t^2\right). \tag{7}$$
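To see when each scheme is cheaper, Eqs. 6 and 7 can be evaluated side by side. The sketch below is illustrative only: the function names are hypothetical, and the constants are the $E_{el}$ and $\varepsilon_{amp}$ values quoted at the start of this section.

```python
E_EL = 50e-9        # electronics energy, J/bit
EPS_AMP = 100e-12   # transmit amplifier energy, J/bit/m^2


def direct_energy(n_bits, p, t):
    """Eq. 6: a node p hops (each of length t metres) away sends straight to the sink."""
    return E_EL * n_bits + EPS_AMP * n_bits * (p * t) ** 2


def mte_energy(n_bits, p, t):
    """Eq. 7: the same message is relayed over p hops of length t."""
    return n_bits * ((2 * p - 1) * E_EL + EPS_AMP * p * t ** 2)


if __name__ == "__main__":
    for spacing in (5, 20):
        print(spacing, direct_energy(2000, 10, spacing), mte_energy(2000, 10, spacing))
```

For a 2000-bit message and ten hops, relaying is cheaper when the nodes are 20 m apart, whereas direct transmission is cheaper at a spacing of 5 m, because the electronics cost of the extra receptions then outweighs the amplifier savings.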
4 Proposed Methodology
MS-CDG, the proposed protocol, outperforms LEACH by using an efficient clustering mechanism and routing approach. LEACH is an umbrella protocol that overcomes the drawbacks of conventional protocols that depend on direct transmission, minimum energy transmission, static clusters, and multihop routing. In LEACH, the cluster head is elected on a rotation basis to spread the energy load evenly among the sensor nodes. LEACH uses localized coordination to offer scalability and resilience for dynamic networks and incorporates data fusion to reduce the amount of data that must be sent to the base station. Locally, the nodes are grouped into small clusters, with one node chosen as the cluster head (CH). If cluster heads are not randomized, or are fixed and chosen in advance, a CH may die quickly, which drains the energy of individual nodes and shortens network longevity. After the clusters are formed, the transmission time of each sensor is scheduled by the local cluster head. CH selection is based on Eq. 8: node $m$ draws a random number between 0 and 1, and if this number is less than the threshold $T(m)$, the node becomes a cluster head for the current round.

$$T(m) = \begin{cases} \dfrac{P}{1 - P \times \left(t \bmod \frac{1}{P}\right)}, & \text{if } m \in G \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

Here $G$ is the set of nodes that have not served as cluster head in the last $1/P$ rounds.
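The LEACH election rule of Eq. 8 can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes that $1/P$ is an integer number of rounds and that the caller tracks which nodes are still eligible (the set of nodes that have not recently served as cluster head).

```python
import random


def leach_threshold(p, round_no, eligible):
    """Eq. 8: threshold T(m); zero for nodes that are not eligible this epoch."""
    if not eligible:
        return 0.0
    period = round(1 / p)  # assumption: 1/P is an integer number of rounds
    return p / (1 - p * (round_no % period))


def elect_cluster_heads(node_ids, eligible_ids, p, round_no, rng):
    """Each node draws a number in [0, 1) and becomes CH if it falls below T(m)."""
    heads = []
    for m in node_ids:
        if rng.random() < leach_threshold(p, round_no, m in eligible_ids):
            heads.append(m)
    return heads


if __name__ == "__main__":
    rng = random.Random(1)
    print(elect_cluster_heads(range(1, 21), set(range(1, 21)), 0.2, 0, rng))
```

The threshold grows from $P$ in the first round of an epoch to about 1 in the last, so every node that is still eligible is elected before the epoch ends.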
When Eq. 8 is rewritten to take residual energy into consideration,

$$T(m) = \frac{P}{1 - P \times \left(t \bmod \frac{1}{P}\right)} \times \frac{E_{residual}}{E_{initial}}\, K_{opt}. \tag{8a}$$
Here $E_{residual}$ is the leftover energy of the sensor node, $E_{initial}$ is the energy of the node when deployed, $P$ is the desired percentage of cluster heads, and $t$ is the present round. The optimum number of clusters $K_{opt}$ that can be formed in the sensor network is given by Eq. 9:

$$K_{opt} = \sqrt{\frac{N}{2\pi}}\, \sqrt{\frac{E_f}{\varepsilon_{amp}}}\, \frac{D}{d_{bs}^2}. \tag{9}$$
$D$ represents the network diameter, $N$ the total number of sensor nodes in the network, and $d_{bs}$ the distance between the CH and the base station.
Fig. 4 Flowchart for the proposed protocol MS-CDG
The following assumptions were made for the execution of the proposed protocol:
a. All nodes within the sensor network, aside from the mobile sink, are static.
b. The sensor network is homogeneous. The mobile sink is assumed to have unlimited energy.
The proposed protocol MS-CDG overcomes the shortcomings of the LEACH protocol by carrying out its operation in four phases. The flowchart is shown in Fig. 4.
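The energy-weighted threshold of Eq. 8a and the optimum cluster count of Eq. 9 can be sketched as below. The function and parameter names are hypothetical and simply mirror the symbols in the equations; as in Eq. 8, the sketch assumes that $1/P$ is an integer number of rounds.

```python
import math


def optimal_cluster_count(n_nodes, e_f, eps_amp, diameter, d_bs):
    """Eq. 9: K_opt from node count, amplifier coefficients, D and d_bs."""
    return (math.sqrt(n_nodes / (2 * math.pi))
            * math.sqrt(e_f / eps_amp)
            * diameter / d_bs ** 2)


def energy_weighted_threshold(p, round_no, e_residual, e_initial, k_opt):
    """Eq. 8a: the Eq. 8 threshold scaled by the residual-energy ratio and K_opt."""
    period = round(1 / p)  # assumption: 1/P is an integer number of rounds
    base = p / (1 - p * (round_no % period))
    return base * (e_residual / e_initial) * k_opt
```

A node that has spent half of its initial energy therefore sees its election threshold halved relative to a fresh node in the same round.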
4.1 Clustering Phase
In LEACH, cluster-head selection is determined by a single parameter, the probability value (p). The proposed protocol performs cluster-head selection more efficiently using three parameters: node degree, residual energy, and distance [15]. A threshold value is set for each parameter, and the nodes satisfying the criteria are eligible to serve as the cluster head for the present round.
Degree of node: In a densely distributed network, the degree of a node is the number of neighbor nodes linked to it. The cluster head is chosen in the initial set-up phase, and the node with the most neighbors is considered for cluster-head selection, which optimizes network performance. Node degree is denoted by $N_d$ and expressed in Eq. 10, where $N_c(N_i)$ denotes the number of nodes to which $N_i$ is attached.

$$N_d = N_c(N_i). \tag{10}$$
Residual energy: The energy left over after consumption is termed residual energy. This parameter is used to favor the nodes with the highest leftover energy in cluster-head selection. The metric is denoted by $E_r$ and given in Eq. 11, where $E_i$ is the initial energy and $E_u$ is the energy used.

$$E_r = E_i - E_u. \tag{11}$$
Distance: The distance metric is tied to transmission power (RSSI) and delay. The greater the distance, the longer the time required to cover it, which results in high delay. In Eq. 12, $(N_{sink,x}, N_{sink,y})$ is the location of the mobile sink node, and $(N_{i,x}, N_{i,y})$ is the location of node $N_i$. The distance is denoted by $d_{i,sink}$. Rather than using the probability criterion of LEACH in Eq. 8, the proposed protocol selects the cluster head based on the three parameters in Eqs. 10–12.

$$d_{i,sink} = \sqrt{\left(N_{sink,x} - N_{i,x}\right)^2 + \left(N_{sink,y} - N_{i,y}\right)^2}. \tag{12}$$
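The three eligibility criteria of Eqs. 10–12 can be sketched as a simple filter. This is one possible reading, not the authors' implementation: the `Node` class, the function names, and the threshold parameters are hypothetical, neighbors are assumed to be the nodes within a fixed radio range, and the distance criterion is assumed to be an upper bound.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    x: float
    y: float
    e_initial: float   # energy at deployment, J
    e_used: float      # energy consumed so far, J


def node_degree(node, nodes, radio_range):
    """Eq. 10: number of other nodes within radio range (assumed neighbor rule)."""
    return sum(1 for other in nodes
               if other.node_id != node.node_id
               and math.hypot(other.x - node.x, other.y - node.y) <= radio_range)


def residual_energy(node):
    """Eq. 11: E_r = E_i - E_u."""
    return node.e_initial - node.e_used


def distance_to_sink(node, sink_xy):
    """Eq. 12: Euclidean distance from the node to the mobile sink."""
    return math.hypot(sink_xy[0] - node.x, sink_xy[1] - node.y)


def eligible_heads(nodes, sink_xy, radio_range, min_degree, min_energy, max_distance):
    """Return the ids of nodes that satisfy all three threshold criteria."""
    return [n.node_id for n in nodes
            if node_degree(n, nodes, radio_range) >= min_degree
            and residual_energy(n) >= min_energy
            and distance_to_sink(n, sink_xy) <= max_distance]
```

Applying the three thresholds jointly excludes nodes that are well connected but nearly depleted, which the probability rule of Eq. 8 cannot do.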
4.2 Equal Sized Clustering
Most researchers propose clustering protocols based on the assumption of equal sized clusters. The random placement of sensor nodes does not guarantee equal sized clusters, which unbalances the load and reduces network performance. To address this, a node revamp strategy was developed that schedules tasks for better load balancing and reduces energy use while achieving uniformly sized clusters with no duplication. The suggested technique first builds primary clusters and then, where applicable, refurbishes them using a second-best cluster head. Cluster-head selection depends on three parameters: leftover energy, number of neighbor nodes, and distance. In the initial stage, the cluster heads send advertisement messages, and nearby nodes join a cluster head depending on RSSI. In the cluster refurbish step, clusters are reorganized toward equal size by moving nodes from the largest clusters to smaller ones, using each node's second-best cluster head [16]. After the largest cluster is identified, the distance between each cluster member (CM) and all other cluster heads is determined. The cluster member closest to its second-best cluster head is assigned to that cluster head.

The equal size clustering algorithm runs in two stages: cluster creation and cluster refurbishment. The first stage is executed as in the LEACH protocol, where probability is the constraint for CH selection. The second stage takes three inputs: the number of CHs, the number of nodes per cluster, and the maximum number of cluster members. The cluster with the most nodes is found first. If it has more members than the threshold, the distance from each member node to its second-best cluster head is calculated, and the node with the smallest such distance moves to that cluster head in order to maintain equal size clustering. The second-best distances are sorted in ascending order. Equal size clustering proceeds in descending order from the largest to the smallest cluster, based on distance at every stage. After the first k sensor nodes have been assigned to their second-best cluster heads, the cluster is updated. Once a cluster has reached the equal size, it is removed from further processing.
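The two-stage procedure described above can be sketched as follows. This is one interpretation under stated assumptions, not the authors' algorithm: Euclidean distance stands in for RSSI, the cluster heads are given as input rather than elected, a node may only move to a cluster that still has room, and the names `refurbish_clusters`, `positions`, `heads`, and `max_members` are hypothetical.

```python
import math


def refurbish_clusters(positions, heads, max_members):
    """positions: dict node_id -> (x, y), heads included; heads: list of CH ids.

    Returns a dict mapping each CH id to the list of its member ids.
    """
    def dist(a, b):
        return math.dist(positions[a], positions[b])

    # Stage 1 (cluster creation): every non-head node joins its nearest CH.
    clusters = {h: [] for h in heads}
    for node in positions:
        if node not in clusters:
            nearest = min(heads, key=lambda h: dist(node, h))
            clusters[nearest].append(node)

    # Stage 2 (refurbishment): handle clusters from largest to smallest.
    for head in sorted(heads, key=lambda h: len(clusters[h]), reverse=True):
        while len(clusters[head]) > max_members:
            open_heads = [h for h in heads
                          if h != head and len(clusters[h]) < max_members]
            if not open_heads:
                break  # no cluster has room for the surplus members
            # Move the member that is closest to an alternative (second-best) CH.
            _, node, target = min((dist(n, h), n, h)
                                  for n in clusters[head] for h in open_heads)
            clusters[head].remove(node)
            clusters[target].append(node)
    return clusters
```

Because nodes are only ever moved into clusters below the size limit, a cluster that has been processed never grows past `max_members` again.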
4.3 Duty Cycling Phase
Energy consumption is high if a node is awake at all times. To overcome this, the proposed protocol uses a duty cycling mechanism within clusters. The mobile node is assumed to be operational throughout the network. Each node wakes up in its TDMA slot to exchange data and then returns to the sleep state, turning off its radio to save energy. There are three scheduling modes, corresponding to the states listen, sleep, and transmit [17]. By turning off the transceiver in sleep mode, the sensor node uses less energy, which extends the lifetime of the network.
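The intra-cluster schedule can be illustrated with a short sketch. It assumes one TDMA slot per cluster member and a cluster head that listens in every slot; neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
def tdma_frame(head, members):
    """Return one TDMA frame as a list of slots, each mapping node id -> state."""
    frame = []
    for slot_owner in members:
        states = {head: "listen"}  # assumption: the CH listens in every slot
        for member in members:
            states[member] = "transmit" if member == slot_owner else "sleep"
        frame.append(states)
    return frame


def member_duty_cycle(num_members):
    """Fraction of a frame during which a member's radio is switched on."""
    return 1 / num_members


if __name__ == "__main__":
    for slot in tdma_frame(6, [7, 1, 9, 8]):
        print(slot)
```

In a cluster with four members, each member's radio is on for only a quarter of the frame under these assumptions.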
4.4 Routing Phase
The traveling salesman problem (TSP) [18] is used to route the mobile node across the sensing field for data collection and aggregation. Once the cluster heads are chosen, they report their locations to the mobile sink. Based on these locations, the shortest path is found, and the sink node visits each cluster head exactly once and then returns to its starting position. This avoids visiting the same nodes multiple times, which would waste energy.
Communication among CMs and CH: After the cluster heads are chosen, the sink is informed of their locations. The member nodes transfer the collected information to the cluster head using TDMA scheduling, thereby preventing collisions. The other member nodes keep their radios in the sleep state until their allotted time slots. This reduces energy consumption and extends network longevity.
Communication among CH and mobile sink: After the cluster members transfer their information to the cluster head, the mobile sink [19] gathers data from the CH once it arrives within communication range. Data are aggregated at the CH to avoid overloading the network with packets, which reduces energy use. Communication takes place in only one way: directly from the cluster head to the mobile sink in a single hop.
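The sink tour can be sketched with an exhaustive TSP search. The text does not say which TSP solver is used, so brute force is a stand-in that is exact but only practical for a handful of cluster heads; the function name and the coordinates in the example are hypothetical.

```python
import itertools
import math


def shortest_tour(start, stops):
    """start: (x, y) of the sink; stops: dict CH id -> (x, y).

    Returns (visiting order, tour length) for the shortest closed tour that
    leaves start, visits every stop exactly once, and returns to start.
    """
    best_order, best_len = (), math.inf
    for order in itertools.permutations(stops):
        path = [start] + [stops[i] for i in order] + [start]
        length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
        if length < best_len:
            best_order, best_len = order, length
    return list(best_order), best_len


if __name__ == "__main__":
    heads = {6: (0, 10), 12: (10, 10), 17: (10, 0)}
    print(shortest_tour((0, 0), heads))
```

For larger numbers of cluster heads, a heuristic such as nearest neighbor would replace the exhaustive search, at the cost of a possibly longer tour.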
5 Simulation and Results
Network lifetime: This metric describes how long the nodes remain alive and capable of transmission. WSNs have to rely heavily on exploiting numerous intrinsic trade-offs between mutually conflicting aims at runtime. Network lifetime is inversely related to energy consumption. Energy use can be reduced through clustering and load balancing combined with the mobile sink concept.
Delay: Delay is an essential QoS parameter for data forwarding in time-constrained WSN applications. Clustering and the mobile sink reduce the delay in transmitting packets.
Throughput or packet delivery ratio (PDR): This metric defines the success rate. It is the ratio of the packets that arrive successfully at the end node to the total packets that the source transmits to the receiver. The simulation parameters are listed in Table 1.

Table 1 Network setup

| Parameter | Value |
|-----------|-------|
| Simulation tool used | MATLAB |
| Simulation area | 400 × 400 m² |
| Count of sensor nodes | 100 |
| Deployed initial energy | 0.5 J/node |
| Number of rounds | 2500 |
| Energy to run transmitter and receiver, E_el | 50 nJ/bit |
| Amplification energy E_f (d < d0) | 10 pJ/bit/m² |
| Amplification energy E_m (d ≥ d0) | 0.0013 pJ/bit/m⁴ |

Illustrations: The steps of the proposed MS-CDG protocol and the simulation outputs are shown in Figs. 5–16. Figure 5 shows the random deployment of sensor nodes in the sensing field. Nodes are deployed randomly, and a mobile node is incorporated that traverses the sensing field. In Fig. 6, all nodes compete to become cluster head, and the election depends on three factors: distance, residual energy, and number of neighbor nodes. The cluster heads are chosen from among the nodes that meet the criteria, shown in the figure as node 6, node 12, node 17, and node 4. The cluster heads broadcast the advertisement message and wait for join messages from cluster members. In Fig. 7, cluster members join the nearest cluster head, which results in unequal clusters. Figure 8 shows the cluster refurbish stage, which forms equal sized clusters and thereby balances the load on the cluster heads. Node 16 joins CH node 17, and node 20 joins CH node 4. Once the cluster refurbish stage completes, the locations of the cluster heads are communicated to the mobile sink, which visits the cluster heads at predefined times after calculating the shortest route using TSP. The mobile sink travels to node 6 in Fig. 9, then to node 12 in Fig. 10, node 17 in Fig. 11, and node 4 and finally its original position in Fig. 12.

Figure 13 shows how many nodes are alive as the number of rounds increases. As the rounds progress, the count of alive nodes decreases. MS-CDG outperforms LEACH by keeping nodes alive for more rounds. Three-factor clustering and equal sized clusters, together with the mobile sink concept, reduce the load on cluster heads and increase energy efficiency, which in turn enhances network longevity. Figure 14 compares the number of dead nodes per round. Nodes die faster in LEACH than in MS-CDG because LEACH elects the cluster head based on probability and its sink node is static. At round 650, almost all nodes in the LEACH protocol are dead, but in MS-CDG the nodes remain alive until round 1300. Figure 15 shows the energy utilization per round. The residual energy drains faster in LEACH than in MS-CDG. The number of packets successfully reaching the receiver is higher in MS-CDG than in LEACH, as shown in Fig. 16 in terms of throughput. The delay under the suggested protocol is lower than in existing work because the mobile node visits each cluster head and gathers data on time. When data are relayed through nearby nodes to reach the sink node, data may be lost and delay can be high. In the proposed work, however, the MS collects data directly, and the observed delay in data gathering is lower. When the sink node moves, the other nodes save more energy, which leads to more alive nodes and a longer lifetime.

Fig. 5 Random distribution of sensor nodes in the sensory field
Fig. 6 Three-factor selection of cluster head: distance, residual energy, and number of neighbor nodes
Fig. 7 Sensor node joins the cluster head (cluster set-up phase)
N. S. Kurian and B. Rajesh Shyamala Devi
Fig. 8 Cluster re-furbish stage for equal sized clusters
Fig. 9 Mobile sink visits the CH (node 6) based on predefined time t1 after calculating the shortest path
Fig. 10 Mobile sink position at cluster head (node 12) at time t2
Fig. 11 MS position at cluster head (node 12) at time t3
Fig. 12 MS position at cluster head (node 17) at time t4 and MS position at cluster head (node 4) at time t5
Fig. 13 Number of rounds versus number of alive nodes
Fig. 14 Number of rounds versus number of dead nodes
Fig. 15 Number of rounds versus residual energy
Fig. 16 Number of rounds versus packet delivery ratio
6 Conclusion The proposed work introduced a mobile target sink-based clustering protocol that uses a three-factor approach to form efficient, equal-sized clusters. The primary purpose of the research is to address energy gaps and increase network longevity by improving performance metrics. The approach overcomes the shortcomings of the LEACH protocol by considering three criteria when optimizing the fitness value for CH selection: node degree, residual energy, and distance. Equal-sized clusters are obtained by first building unequal initial-stage clusters and then re-furbishing them using a second-best-choice cluster head so that all clusters are of equal size. In the proposed methodology, the mobile sink stops at each CH position to collect information, following a route computed with a TSP approach, so that every CH can use the mobile sink without expending much energy. The work was carried out in the MATLAB simulation tool and compared with the existing protocol. With respect to latency, packet delivery ratio, network longevity, and residual energy, the MS-CDG protocol outperforms existing methodologies. Future work can explore an efficient routing protocol for determining the trajectory of the mobile sink.
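The route-planning step, in which the mobile sink visits every CH along a shortest closed route, can be illustrated with a minimal sketch. The Python code below is not the authors' MATLAB implementation. It solves the TSP by exhaustive search, which is only practical for a handful of cluster heads. The function name `shortest_tour` and all coordinates are hypothetical; only the node ids (6, 12, 17, 4) come from the simulation described above.

```python
from itertools import permutations
from math import dist  # Euclidean distance, Python 3.8+


def shortest_tour(start, stops):
    """Brute-force TSP: shortest closed tour from `start` through all stops.

    start -- (x, y) home position of the mobile sink
    stops -- dict mapping cluster-head id -> (x, y) position
    Returns (visiting order as a list of ids, total tour length).
    """
    best_order, best_len = None, float("inf")
    for order in permutations(stops):  # every possible visiting order
        path = [start] + [stops[n] for n in order] + [start]
        length = sum(dist(a, b) for a, b in zip(path, path[1:]))
        if length < best_len:
            best_order, best_len = order, length
    return list(best_order), best_len


# Hypothetical positions chosen so that all hops on the best tour are integers.
cluster_heads = {6: (0, 4), 12: (3, 4), 17: (3, 0), 4: (1, 0)}
sink_home = (0, 0)

tour, length = shortest_tour(sink_home, cluster_heads)
print(tour, length)
```

With n cluster heads the search examines n! orders, so a heuristic (for example nearest-neighbor or ant colony optimisation) would be the more realistic choice for larger networks.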
A Machine Learning-Based Approach for Classifying Socially Isolated Individuals in a Pandemic Context
Md Ulfat Tahsin, Sarah Jasim, and Intisar Tahmid Naheen
Abstract Social isolation has become a growing concern since the COVID-19 pandemic, as the lockdown periods significantly restricted people’s interaction with the outside world. Existing studies have shown its correlation with a person’s mental and physical health. This paper focuses on identifying socially isolated individuals based on specific criteria, such as gender, family or self-earning, online or offline interactions, care for physical and mental health, and several others. The questionnaire also contains the widely used PHQ9, K6, and UCLA LS3 scales to test individuals for depression, distress, and feelings of loneliness. All these input parameters were analyzed, combinations of them were evaluated, and the correlated parameters were then used to build a machine learning-based web application that can assess an individual’s sense of feeling unwanted or socially isolated. Five supervised machine learning algorithms were employed, and the logistic regression model gave the best performance (73.6% accuracy). This study also features a web application that demonstrates the implementation of our proposed approach.
Keywords Social isolation · Mental health · COVID-19 restrictions · Machine learning
M. U. Tahsin (B) · S. Jasim · I. T. Naheen
Department of Electrical and Computer Engineering, North South University, Dhaka, Bangladesh
e-mail: [email protected]
S. Jasim e-mail: [email protected]
I. T. Naheen e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_22
1 Introduction Social isolation has become a growing concern since the COVID-19 outbreak imposed limitations on in-person interactions and social activities. Social isolation refers to reduced or lacking social interaction and is a quantitative indicator of a
smaller social network. The sense of social isolation causes individuals to feel withdrawn from society and restricts their social and interpersonal interactions, leading to a smaller social network. An individual’s social network provides emotional care and support, as well as other types of resources, such as education, work, and financial and physical support. The diminished social network of a socially isolated individual leads to a lack of social support, reduced social interactions, and interpersonal relationships, which has negative impacts on the well-being of the individual. Studies have shown that social isolation increases the risk of a wide array of physiological and psychological conditions such as dementia, depression, physical decline, and in the worst cases, mortality [1, 2]. It impacts the hormonal, cardiovascular, and emotional aspects of an individual, as well as their circadian rhythm [3]. During the COVID-19 pandemic and the subsequent lockdowns, many individuals started developing psychologically degraded symptoms, including depression, anxiety, anger issues, stress, and other mental health problems [2]. The pandemic escalated the cases of social isolation, which resulted in the deterioration of mental health conditions that led to increased suicidal thoughts, self-harm, and the tendency to harm others [4]. As per 2019 statistics, about 1.3% of people die of suicide, and the highest proportion belongs to the age of 75 years and above [5]. However, with the outbreak of the COVID-19 pandemic in 2020, the monthly suicide rates increased by 16% between July and October 2020 due to the detrimental effects of the pandemic and lockdown periods. A sense of social isolation during the pandemic also resulted in lower or decreased life satisfaction, food satisfaction, housing satisfaction, and work satisfaction. Socially isolated individuals also showed a distrust for the government and other institutions, as well as the tendency to abuse drugs [6]. 
The repercussions of the restriction periods could be seen in the mental health of individuals, causing issues like social isolation, loneliness, anxiety, and depression [7, 8]. Even after the restrictions were eased or withdrawn, the problem of social isolation persisted due to the lasting impacts of the pandemic on mental health. Such effects also contribute to social anxiety and difficulties in interaction, and can cause individuals to struggle while transitioning back to regular life [9–12]. Social isolation is therefore a concern that calls for action to address such mental health issues in a pandemic context. The existing literature in this domain emphasizes classifying socially isolated adults and individuals using a qualitative approach [1]. Previous authors have also used machine learning methods to examine the effects of social isolation on self-disclosure of loneliness [13]. Machine learning algorithms have likewise been applied to detect depression [14, 15], deep learning models have been used to predict loneliness [16], and several studies have examined the impact of COVID-19 on the mental health of individuals [15, 17, 18]. Motivated by these works, this study addresses the escalating problem of social isolation resulting from a pandemic by classifying affected individuals with an automated system. This paper therefore focuses on developing a platform, backed by a machine learning model, that can remotely identify socially isolated individuals through a web application and alert them to seek support during a pandemic.
Our application targets all ages, including youngsters, adolescents, adults, and the aging population, who are facing issues with social interaction or feeling isolated due to the lasting effects of the pandemic [19]. This inspired us to develop a machine learning-based web application that such individuals can use to assess their mental health condition. They can thus decide whether to consult a mental health professional before their condition develops further complications. Our contributions in this paper are:
• Utilizing an online survey dataset of psychological distress to classify social isolation in a pandemic and post-pandemic context.
• A novel machine learning and web-based application that can be used for classifying socially isolated individuals.
The rest of this paper is organized as follows: Sect. 2 presents a review of related work. Sect. 3 covers the methodology, including dataset details, pre-processing steps, and other tools and techniques. Sect. 4 describes the machine learning algorithms, Sect. 5 presents the results, Sect. 6 covers the execution platform and web implementation, and Sect. 7 concludes the paper.
2 Related Work Quality of life in the present world can be assessed with several factors like health, wealth, security, environment, etc. However, research has shown that family, friends, and other interpersonal relationships are significant metrics for evaluating the quality of life [20]. This has also been shown by previous researchers such as Christina Victor et al., who explored the effects of loneliness, social isolation, and living alone among elderly people [21]. Among many other mental health issues, social isolation problems appear to have risen due to the lengthy home quarantine limitations imposed in different countries to control the spread of the COVID-19 virus [22–24]. Calati et al., in their review study on suicidal thoughts, behavior, and social isolation, discussed the primary social constructs, which include living alone, loneliness, and social isolation, as being responsible for self-harm, suicide attempts, and, in the worst cases, suicide itself [25]. Hui Yang et al., in their paper on loneliness prediction in older people, investigated the risks associated with loneliness [26]. In contrast, our paper examines social isolation resulting from the COVID-19 pandemic irrespective of age, because the pandemic has increased mental stress, depression, and loneliness among both elderly and young people [21]. Even after the restrictions were eased or removed, many individuals struggled with psychological symptoms, such as social anxiety, in the aftermath of the pandemic. A study by M. H. Lim et al. found that many individuals struggled with elevated levels of social anxiety after the restrictions had been eased [11]. A similar study by J. T. K. Ho et al. showed that college students faced increased social anxiety in social interactions during their transition back to normal life [7]. These studies showed that those with pre-existing
anxiety problems experienced the highest levels of social anxiety throughout the transitional phases. According to a review paper by Alan R. Teo et al., social isolation and social anxiety disorder were found to be closely related [27]. Different machine learning-based approaches are now being popularly implemented to figure out the harmful issues created due to mental health concerns among people. Machine learning algorithms have been proven to differentiate people who are vulnerable to suicidal risks efficiently. Scott R Braithwaite et al. have presented and validated some studies in this domain in their paper [28]. Nazmun Nessa Moon et al. implemented popular ML models, including the random forest classifier, Naive Bayes, and k-neighbor classifier, to explore the depression of employees in the job sector of Bangladesh [14]. Hui Yang and Peter A. Bath used XGBoost and LightGBM models to predict loneliness in elderly people [16]. Machine learning-based approaches have also been used to analyze the impact of COVID-19 on mental health, which Mostafa Rezapour et al. explored by implementing machine learning models such as decision trees, multinomial logistic regression, Naive Bayes, k-nearest neighbors, support vector machines, neural networks, random forests, gradient tree boosting, XGBoost, LightGBM, and statistical methods such as synthetic minority oversampling, and Chi-Squared Test on survey data collected from the Inter-university Consortium for Political and Social Research [29]. Cornelia Herbert et al. investigated the effects of the COVID-19 pandemic on the mental health of university students studying in Egypt or Germany by conducting an online survey and then applying machine learning algorithms to predict an individual’s personality and find its correlation with their mental health [18]. Kennedy Opoku Asare has also explored the use of machine learning models to predict depression by using behavioral markers from smartphone usage [15]. Sunhae Kim et al. 
researched the use of machine learning models and the LSNS scale to predict suicidal ideation of individuals due to social isolation on a dataset collected in South Korea [30]. Aditi Site et al. also used the LSNS scale and machine learning algorithms to predict the social isolation of the elderly by collecting data on their mobility using sensors [31]. In our research, the LSNS6 scale was used to deduce the target variable. LSNS stands for Lubben Social Network Scale. Lubben et al. established a standard six-question scale and validated it by performance testing on three European community-dwelling older adult populations [32]. The scale has thus already been validated by clinical psychologists and experts. Lubben et al. concluded that a cut-off value of 12 is sensitive and specific enough for identifying socially isolated individuals.
3 Methodology The different aspects followed for the methodology part are described in this section. This includes processing the collected dataset, deducing the target variable, cleaning, correcting, and performing the necessary modification, extracting useful features for the model, and designating the training and testing dataset. After these steps, the processed dataset is fed into the different machine learning algorithms. Finally, the
best-performing model is deployed in our web application to be presented to the user. Our proposed system architecture along with its workflow is illustrated in Fig. 1.
Fig. 1 Proposed system architecture
3.1 Dataset Description In this section, the dataset is described in brief, along with its source, contributors, and how we further pre-processed it for our machine learning model. The dataset comes from the study “The Mental Health Impact on COVID-19” conducted in Japan and contributed by Tetsuya Yamamoto et al. It is a self-reported dataset collected through an online survey to investigate the psychological impacts of a moderate (non-coercive) lockdown in Japan following the proclamation of a state of emergency. Data on psychological discomfort, depression, stressors, and coping were gathered from 11,333 individuals [20]. We did not encode the dataset ourselves, as the contributors had already converted all columns to numeric form. To train the model and make predictions, 15 parameters were selected out of the 33 in the dataset (Table 1). The ID column was dropped from the inputs, along with the label column, which is used as the target variable for classification. Among the selected parameters, all but one had 11,333 entries. Out of the 15 parameters, those with the highest correlation to the Label and the LSNS6 score were chosen to train the machine learning models. The correlation cut-off value for input feature selection is 0.1, calculated from the correlation matrix generated using scikit-learn, and illustrated in Fig. 2. The “Income” column had missing values, because of which its correlation value was initially less than 0.1. These missing values are handled using the imputation technique discussed in the following section. Table 1 describes the 15 selected input features in brief. For the features “Sex” and “Married,” there were two options to select from: male/female and single/married, respectively. “Income” had nine options covering earning ranges from less than 2 million to more than 20 million. For K6, PHQ9, and UCLA_LS3, the score can be entered manually using a sliding bar in the web application.
All the remaining features had seven options, from 1 = not at all to 7 = extremely.
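The correlation-based feature selection described above can be sketched as follows. This is an illustrative sketch only: it assumes the Pearson coefficient and assumes the 0.1 cut-off is applied to the absolute correlation, neither of which is stated explicitly in the paper. The column values and the "Noise" column are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def select_features(columns, target, cutoff=0.1):
    """Keep the columns whose |correlation| with the target meets the cut-off."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= cutoff]


# Toy data (hypothetical values, not taken from the survey).
label = [1, 1, 0, 0, 1, 0]
columns = {
    "UCLA_LS3": [60, 55, 30, 35, 58, 32],  # rises and falls with the label
    "Noise": [3, 1, 3, 1, 2, 2],           # constructed to be uncorrelated
}

selected = select_features(columns, label)
print(selected)
```

In practice the same filter is usually one line on top of a data-frame correlation matrix; the hand-written version here only makes the cut-off rule explicit.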
Table 1 Input features and their meaning [20]
Input feature name | Description/conveyed meaning
1. UCLA_LS3 | Total score of the UCLA loneliness scale version 3
2. Optimism | I thought about the future positively
3. Interaction_Online | I interacted with my family or friends using online chat or video calling (except work or class)
4. Healthy_Diet | I took meals considering the nutrition balance
5. Interaction_Offline | I interacted with my family or friends on a face-to-face basis (except work or class)
6. Preventive_Behaviors | I spontaneously refrained from going out or altruistically took preventive behaviors (e.g. wearing a mask) to prevent coronavirus disease infection to my family or others
7. Exercise | I exercised for my health (whether indoors or outdoors)
8. Activity | I engaged in activities such as hobbies with absorbing interest
9. PHQ9 | Total score of the Patient Health Questionnaire-9
10. K6 | Total score of the Kessler Psychological Distress Scale-6
11. Healthy_Sleep | I kept regular awakening time and bedtime approximately
12. Married | Marital status
13. Deterioration_Interact | A personal relationship with a close person such as family or friends got worse
14. Sex | Gender
15. Income | Annual household income
3.2 Missing Value Handling The dataset has missing or NaN values in the “Income” column. We used the imputation technique [33] to impute the missing values as it improved its correlation with the target variable. The correlation of the “Income” column with the target variable increased by 5.89% after the imputation technique was adopted to handle the missing values.
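The paper does not say which imputation strategy was applied to the “Income” column. As a hedged illustration, the sketch below assumes median imputation, which is a common choice for an ordinal column with a few gaps. The function name and the sample values are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical income categories (1-9) with two missing answers.
income = [3, None, 5, 2, None, 9, 4]
income_filled = impute_median(income)
print(income_filled)
```

Mean or most-frequent imputation would follow the same pattern with a different fill value; which one raises the correlation with the target is an empirical question for the actual dataset.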
Fig. 2 Feature importance
Fig. 3 Gender, marital status, and target label across the dataset
3.3 Exploratory Data Analysis (EDA) We have 15 input features. Thirteen are numerical, and the rest are categorical. We plotted histograms for the numeric features. Histograms are utilized to analyze and understand the distribution of the samples. As per the distribution and requirement of the input parameters, we employed proper scaling techniques as discussed in the “scaling technique” subsection. For the categorical features, we used count plots to check the prior imbalance and bias. We illustrated the count plots in our paper to show that our dataset is quite balanced, apart from some minor anomalies. And that is why we did not incorporate any data balancing technique into our methodology. We achieved these insights through the EDA. The illustrations are in Fig. 3.
3.4 Scaling Techniques For scaling the input parameters, we used both standard and min-max scalers [34]. A standard scaler is applied to the normally distributed column, “UCLA_LS3,” whereas a min-max scaler is applied to “PHQ9” and “K6.” No scaling was done on the categorical features: “Optimism, Healthy_Diet, Activity, Healthy_Sleep, Income, Interaction_Online, Interaction_Offline, Preventive_Behaviors, Exercise, Deterioration_Interact, Married, and Sex.” For the random forest classifier, none of the input variables are scaled.
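The two scaling rules can be written out directly. The sketch below is a plain-Python illustration with made-up scores, not the authors' pipeline. It uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses.

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical scores for illustration only.
ucla_ls3 = [20, 40, 60, 80]   # standard-scaled, as for UCLA_LS3
phq9 = [0, 9, 27]             # min-max-scaled, as for PHQ9 and K6

ucla_scaled = standard_scale(ucla_ls3)
phq9_scaled = min_max_scale(phq9)
print(ucla_scaled, phq9_scaled)
```

When a train/test split is used, the scaling parameters (mean, standard deviation, minimum, maximum) would normally be computed on the training portion only and then reused for the test portion.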
3.5 Target Variable Our primary goal is to identify socially isolated individuals using the ML model. For that, we used the Lubben Social Network Scale [32], an instrument designed to identify socially isolated individuals. The instrument consists of two parts, named “family” and “friendship.” The “family” category comprises three questions about the people to whom the individual is related through birth, marriage, adoption, etc. The “friendship” category comprises three more questions about all of the individual’s friends, including those who live in the neighborhood. Lubben et al. evaluated the performance of this abbreviated version of the Lubben Social Network Scale among European community-dwelling elderly people [32] and showed that the optimal cut-off score, with sufficiently reliable sensitivity and specificity, is 12. If an individual’s LSNS6 score is less than 12, the individual is considered socially isolated; otherwise the individual is considered normal. We used the LSNS6 score column to create our target variable, named “Label”: it is 1 when the individual’s LSNS6 score is less than 12 (socially isolated) and 0 otherwise (normal). This LSNS6-dependent label is used in the web application to identify whether the user is socially isolated. The label column has a distribution ratio of 55–45 between the isolated and normal individuals, respectively.
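Deriving the label from the LSNS6 score is a one-line threshold rule, sketched below. The scores are invented for illustration. Treating a score of exactly 12 as not isolated is an assumption that follows from the "less than 12" wording.

```python
LSNS6_CUTOFF = 12  # cut-off reported by Lubben et al. [32]


def isolation_label(lsns6_score, cutoff=LSNS6_CUTOFF):
    """Return 1 (socially isolated) if the score is below the cut-off, else 0."""
    return 1 if lsns6_score < cutoff else 0


# Hypothetical LSNS6 totals (the scale ranges from 0 to 30).
scores = [4, 11, 12, 13, 25]
labels = [isolation_label(s) for s in scores]
print(labels)
```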
4 Machine Learning Algorithms In this paper, five supervised classification [35] algorithms were used. Since the dataset did not have a large amount of data (11,333 data points), we only used machine learning algorithms and did not opt for deep learning-based models. Our model selection is justified by a very good balance between the training and testing accuracy, which shows that none of our selected models is overfitting the data. Besides, these
models are commonly used by other researchers in this field, as demonstrated in the literature review section of this paper. The machine learning algorithms used are briefly stated below:
1. Logistic Regression: Logistic regression is a machine learning algorithm used to describe and analyze the relationship between a binary dependent variable and one or more independent variables. It uses a logistic or sigmoid function to convert the numerical result for each sample into a probability between 0 and 1 [36]. The probabilities are then converted into either 0 or 1 (yes or no) to give the predictions.
2. K-Nearest Neighbor Classifier: The KNN algorithm classifies a new data point into a category based on the classes of the nearest data points [37]. First, the number of nearest neighbors to be checked, k, is selected. The distances between the new data point and the training points are then calculated, and the new point is assigned to the class most common among its k nearest neighbors. The most commonly used distance metric is the Euclidean distance; however, Manhattan and Hamming distances can also be used.
3. Random Forest Classifier: The random forest classifier uses an ensemble of uncorrelated decision trees [38] built from random subsets of the dataset. Majority voting is then used to predict the model’s outcome [39].
4. Naive Bayes: It is based on Bayes’ theorem of conditional probability [40]. In Naive Bayes, it is assumed that each attribute makes an equal and independent contribution to the outcome [41]. This project used Gaussian Naive Bayes, as most features are normally distributed.
5. Support Vector Machine: The SVM classifier algorithm considers all the data points in an N-dimensional space and creates an optimal hyperplane, or decision boundary, that separates them into distinct classes. To find the optimal hyperplane, the SVM algorithm maximizes the margin between the data points and the hyperplane, using a hinge loss function as the cost minimized during training [42].
For each model, the hyperparameters were tuned to get the best results. The best-performing model, logistic regression, was then chosen to be deployed in the web application (Figs. 5 and 6). We also deployed SVM and Naive Bayes in our web application for evaluation purposes.
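As a small illustration of one of the algorithms listed above, the k-nearest neighbor rule can be sketched in a few lines. The code is a plain-Python sketch with Euclidean distance and toy two-dimensional points. It is not the tuned classifier used in the study, and the names and data are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(zip(train_points, train_labels),
                         key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters, one per class.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = [0, 0, 0, 1, 1, 1]

near_low = knn_predict(points, labels, (2, 2))
near_high = knn_predict(points, labels, (7, 7))
print(near_low, near_high)
```

Swapping `dist` for a Manhattan or Hamming distance function changes only the `key` used for sorting.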
5 Results The accuracy metrics used for the result analysis are briefly discussed below followed by model performances in Table 2.
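Table 2 Testing and other accuracy metrics for each model
Deployed model | Train accuracy | Test accuracy | Target | Precision | Recall | f1
Logistic regression | 0.734 | 0.736 | 0 | 0.74 | 0.82 | 0.78
Logistic regression | 0.734 | 0.736 | 1 | 0.73 | 0.63 | 0.68
KNN | 0.734 | 0.734 | 0 | 0.75 | 0.80 | 0.77
KNN | 0.734 | 0.734 | 1 | 0.72 | 0.66 | 0.69
SVM | 0.757 | 0.735 | 0 | 0.74 | 0.80 | 0.77
SVM | 0.757 | 0.735 | 1 | 0.72 | 0.65 | 0.68
Random forest | 0.732 | 0.739 | 0 | 0.73 | 0.81 | 0.77
Random forest | 0.732 | 0.739 | 1 | 0.72 | 0.62 | 0.67
Naive Bayes | 0.734 | 0.694 | 0 | 0.75 | 0.68 | 0.71
Naive Bayes | 0.734 | 0.694 | 1 | 0.64 | 0.71 | 0.67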
5.1 Accuracy Metrics
Precision Precision is the fraction of predicted positives that are actually positive. It is calculated as:
\[ \text{Precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}} \tag{1} \]
Recall Recall is the fraction of actual positives that the model correctly identifies. It is calculated as:
\[ \text{Recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}} \tag{2} \]
f1 score The f1 score combines precision and recall into a single metric (their harmonic mean) and is useful when the data are imbalanced. It is calculated as:
\[ \text{f1 score} = \frac{2 \cdot (\text{precision} \cdot \text{recall})}{\text{precision} + \text{recall}} \tag{3} \]
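A short worked example shows how the three metrics are computed from confusion-matrix counts. The counts below are hypothetical and are not taken from the study's confusion matrices.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and f1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 63 true positives, 27 false positives, 37 false negatives.
p, r, f1 = precision_recall_f1(tp=63, fp=27, fn=37)
print(round(p, 2), round(r, 2), round(f1, 2))
```

Here precision is 63/90 = 0.70 and recall is 63/100 = 0.63, so the f1 score is 2(0.70)(0.63)/(0.70 + 0.63), or about 0.66.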
5.2 Model Results Table The train test split for all the deployed models was 80–20%. Table 2 shows the machine learning models used, the testing accuracy, and other performance metrics for each target variable. It is followed by a graph in Fig. 4 that illustrates the AUC scores for each model.
Fig. 4 Showing area under curve (AUC) for the used ML models
The above results in Table 2 show that the best test accuracy, 73.6%, is obtained from the logistic regression model. It also had the best overall results regarding precision, recall, and f1 score. Logistic regression was tuned with the following hyperparameters to obtain the best results: solver = newton-c, penalty = l2, C = 0.1, random_state = 1, max_iter = 1. Logistic regression is a fairly simple model that works well on linearly separable data and binary classification [42, 43]. The other algorithms had performance metrics close to those of logistic regression; the worst performance was obtained from the Naive Bayes classifier, with an accuracy of 69.3%.
6 Execution Platform and Web Application Google Colaboratory, a product from Google Research, was used to develop the machine learning model. This platform makes it possible to collaborate with other team members working on the same machine learning model. The code was written in Python 3.8. The web application was built using Streamlit, and the three best-performing models were deployed in it. Snapshots of the deployed Streamlit user interface are shown in Figs. 5 and 6.
Fig. 5 Web application UI 1
Fig. 6 Web application UI 2
7 Conclusion This study brings forward the adverse effects of social isolation on people’s mental and physical well-being and shows how machine learning can identify socially isolated individuals in contemporary times. The imposition of lockdowns and quarantine periods due to the pandemic elevated the cases of social isolation and the risks associated with it, and had lasting effects on the mental health of many individuals. The problem of social isolation continues to exist even in recent times, and people are often unable to get the help they need because they feel isolated. Social isolation is a significant concern as it is not only a personal problem but a societal one as well. Interpersonal connections are a fundamental part of society and how it functions, and the disruption of a social network due to social isolation impacts society as a whole, since the resources and services shared by the social network are also negatively affected. Thus, a readily accessible web application that can identify socially isolated individuals and encourage them to seek necessary medical intervention can be beneficial. Our web application features a simple questionnaire from which it predicts whether an individual is socially isolated. The logistic regression algorithm gave the best overall results and was therefore deployed in our web application. As part of our future work, we intend to integrate suicide prevention hotlines and provide a list of the nearest mental health counselors to improve our support strategy. Additionally, we plan to integrate an online learning mechanism for the machine learning algorithm that will improve the model’s performance over time.
Data and Code availability statement: The data are openly accessible from Yamamoto et al. [20]. Code is available upon request.
Acknowledgements The authors are grateful to Kazi Shawpnil from United International University for her input, detailed comments, and help with paper formatting that contributed to improving this paper.
References
1. Hwang J, Wang L, Siever J, Medico TD, Jones CA (2019) Loneliness and social isolation among older adults in a community exercise program: a qualitative study. Aging Ment Health 23(6):736–742. https://doi.org/10.1080/13607863.2018.1450835
2. Sepúlveda-Loyola W et al (2020) Impact of social isolation due to COVID-19 on health in older people: mental and physical effects and recommendations. J Nutr Health Aging 24(9):938–947. https://doi.org/10.1007/s12603-020-1500-7
3. Singer C (2018) Health effects of social isolation and loneliness. J Aging Life Care 28(1)
4. Prince M et al (2007) No health without mental health. The Lancet 370(9590):859–877. https://doi.org/10.1016/S0140-6736(07)61238-0
5. Stone DM, Jones CM, Mack KA (2021) Changes in suicide rates—United States, 2018–2019. MMWR Morb Mortal Wkly Rep 70(8):261–268. https://doi.org/10.15585/mmwr.mm7008a1
6. Clair R, Gordon M, Kroon M, Reilly C (2021) The effects of social isolation on well-being and life satisfaction during pandemic. Humanit Soc Sci Commun 8(1)
7. Gualano MR, Lo Moro G, Voglino G, Bert F, Siliquini R (2021) Monitoring the impact of COVID-19 pandemic on mental health: a public health challenge? Reflection on Italian data. Soc Psychiatry Psychiatr Epidemiol 56(1):165–167. https://doi.org/10.1007/s00127-020-01971-0
8. Uddin M, Shawpnil K, Mugdha SBS, Ahmed A (2023) A statistical synopsis of COVID-19 components and descriptive analysis of their socio-economic and healthcare aspects in Bangladesh perspective. J Environ Public Health
9. Zheng L, Miao M, Lim J, Li M, Nie S, Zhang X (2020) Is lockdown bad for social anxiety in COVID-19 regions? A national study in the SOR perspective. Int J Environ Res Public Health 17(12):4561. https://doi.org/10.3390/ijerph17124561
10. Ho JTK, Moscovitch DA (2022) The moderating effects of reported pre-pandemic social anxiety, symptom impairment, and current stressors on mental health and affiliative adjustment during the first wave of the COVID-19 pandemic. Anxiety Stress Coping 35(1):86–100. https://doi.org/10.1080/10615806.2021.1946518
11. Lim MH et al (2022) A global longitudinal study examining social restrictions severity on loneliness, social anxiety, and depression. Front Psychiatry 13:818030. https://doi.org/10.3389/fpsyt.2022.818030
12. McLeish AC, Walker KL, Hart JL (2022) Changes in internalizing symptoms and anxiety sensitivity among college students during the COVID-19 pandemic. J Psychopathol Behav Assess 44(4):1021–1028. https://doi.org/10.1007/s10862-022-09990-8
13. Hommadova Lu A, Mejova Y (2022) All the lonely people: effects of social isolation on self-disclosure of loneliness on Twitter. New Media Soc 146144482210999. https://doi.org/10.1177/14614448221099900
14. Moon NN, Mariam A, Sharmin S, Islam MM, Nur FN, Debnath N (2021) Machine learning approach to predict the depression in job sectors in Bangladesh. Curr Res Behav Sci 2:100058. https://doi.org/10.1016/j.crbeha.2021.100058
15. Opoku Asare K, Terhorst Y, Vega J, Peltonen E, Lagerspetz E, Ferreira D (2021) Predicting depression from smartphone behavioral markers using machine learning methods, hyperparameter optimization, and feature importance analysis: exploratory study. JMIR mHealth uHealth 9(7):e26540. https://doi.org/10.2196/26540
16. Yang H, Bath PA (2020) Predicting loneliness in older age using two measures of loneliness. Int J Comput Appl 42(6):602–615. https://doi.org/10.1080/1206212X.2018.1562408
17. Rezapour M, Elmshaeuser SK (2022) Artificial intelligence-based analytics for impacts of COVID-19 and online learning on college students' mental health. PLOS ONE 17(11):e0276767. https://doi.org/10.1371/journal.pone.0276767
18. Herbert C, El Bolock A, Abdennadher S (2021) How do you feel during the COVID-19 pandemic? A survey using psychological and linguistic self-report measures, and machine learning to investigate mental health, subjective experience, personality, and behaviour during the COVID-19 pandemic among university students. BMC Psychol 9(1):90. https://doi.org/10.1186/s40359-021-00574-x
19. Bu F, Steptoe A, Fancourt D (2020) Who is lonely in lockdown? Cross-cohort analyses of predictors of loneliness before and during the COVID-19 pandemic. Public Health 186:31–34. https://doi.org/10.1016/j.puhe.2020.06.036
20. Yamamoto T, Uchiumi C, Suzuki N, Yoshimoto J, Murillo-Rodriguez E (2020) The psychological impact of 'Mild Lockdown' in Japan during the COVID-19 pandemic: a nationwide survey under a declared state of emergency. Int J Environ Res Public Health 17(24):9382. https://doi.org/10.3390/ijerph17249382
21. Victor C, Scambler S, Bond J, Bowling A (2000) Being alone in later life: loneliness, social isolation and living alone. Rev Clin Gerontol 10(4):407–417. https://doi.org/10.1017/S0959259800104101
22. John A, Pirkis J, Gunnell D, Appleby L, Morrissey J (2020) Trends in suicide during the covid-19 pandemic. BMJ m4352. https://doi.org/10.1136/bmj.m4352
23. Abbas J, Wang D, Su Z, Ziapour A (2021) The role of social media in the advent of COVID-19 pandemic: crisis management, mental health challenges and implications. Risk Manag Healthc Policy 14:1917–1932. https://doi.org/10.2147/RMHP.S284313
24. Kumaravel SK et al (2020) Investigation on the impacts of COVID-19 quarantine on society and environment: preventive measures and supportive technologies. 3 Biotech 10(9):393. https://doi.org/10.1007/s13205-020-02382-3
25. Calati R et al (2019) Suicidal thoughts and behaviors and social isolation: a narrative review of the literature. J Affect Disord 245:653–667. https://doi.org/10.1016/j.jad.2018.11.022
26. Yang H, Bath PA (2018) Prediction of loneliness in older people. In: Proceedings of the 2nd international conference on medical and health informatics, Tsukuba, Japan, pp 165–172. https://doi.org/10.1145/3239438.3239443
27. Teo AR, Lerrigo R, Rogers MAM (2013) The role of social isolation in social anxiety disorder: a systematic review and meta-analysis. J Anxiety Disord 27(4):353–364. https://doi.org/10.1016/j.janxdis.2013.03.010
28. Braithwaite SR, Giraud-Carrier C, West J, Barnes MD, Hanson CL (2016) Validating machine learning algorithms for Twitter data against established measures of suicidality. JMIR Ment Health 3(2):e21. https://doi.org/10.2196/mental.4822
29. Rezapour M, Hansen L (2022) A machine learning analysis of COVID-19 mental health data. Sci Rep 12(1):14965. https://doi.org/10.1038/s41598-022-19314-1
30. Kim S, Lee K (2022) The effectiveness of predicting suicidal ideation through depressive symptoms and social isolation using machine learning techniques. J Pers Med 12(4):516. https://doi.org/10.3390/jpm12040516
31. Site A, Vasudevan S, Afolaranmi SO, Lastra JLM, Nurmi J, Lohan ES (2022) A machine-learning-based analysis of the relationships between loneliness metrics and mobility patterns for elderly. Sensors 22(13):4946. https://doi.org/10.3390/s22134946
32. Lubben J et al (2006) Performance of an abbreviated version of the Lubben social network scale among three European community-dwelling older adult populations. The Gerontologist 46(4):503–513. https://doi.org/10.1093/geront/46.4.503
33. Richman MB, Trafalis TB, Adrianto I (2009) Missing data imputation through machine learning algorithms. In: Haupt SE, Pasini A, Marzban C (eds) Artificial intelligence methods in the environmental sciences. Springer Netherlands, Dordrecht, pp 153–169. https://doi.org/10.1007/978-1-4020-9119-3_7
34. Ahsan M, Mahmud M, Saha P, Gupta K, Siddique Z (2021) Effect of data scaling methods on machine learning algorithms and model performance. Technologies 9(3):52. https://doi.org/10.3390/technologies9030052
35. Singh A, Thakur N, Sharma A (2016) A review of supervised machine learning algorithms. Int Conf Comput Sustain Glob Dev INDIAComp 1310–1315
36. LaValley MP (2008) Logistic regression. Circulation 117(18):2395–2399. https://doi.org/10.1161/CIRCULATIONAHA.106.682658
37. Zhang Z (2016) Introduction to machine learning: k-nearest neighbors. Ann Transl Med 4(11):218. https://doi.org/10.21037/atm.2016.03.37
38. Charbuty B, Abdulazeez A (2021) Classification based on decision tree algorithm for machine learning. J Appl Sci Technol Trends 2(01):20–28. https://doi.org/10.38094/jastt20165
39. Liu Y, Wang Y, Zhang J (2012) New machine learning algorithm: random forest. In: Liu B, Ma M, Chang J (eds) Information computing and applications, vol 7473. Springer, Berlin, Heidelberg, pp 246–252. https://doi.org/10.1007/978-3-642-34062-8_32
40. Hayter AJ (2012) Probability and statistics for engineers and scientists, 4th edn. Brooks/Cole, Cengage Learning, Boston, MA
41. Webb, Geoffrey, Risto (2010) Naive Bayes. Encycl Mach Learn 713–714
42. Wang L (ed) (2005) Support vector machines: theory and applications. Springer, Berlin
43. Bahel V, Pillai S, Malhotra M (2020) A comparative study on various binary classification algorithms and their improved variant for optimal performance. In: IEEE region 10 symposium (TENSYMP). Dhaka, Bangladesh, pp 495–498. https://doi.org/10.1109/TENSYMP50017.2020.9230877
44. Ohrnberger J, Fichera E, Sutton M (2017) The relationship between physical and mental health: a mediation analysis. Soc Sci Med 195:42–49. https://doi.org/10.1016/j.socscimed.2017.11.008
45. Hall MA (1999) Correlation-based feature selection for machine learning
46. Netuveli G (2006) Quality of life at older ages: evidence from the English longitudinal study of aging (wave 1). J Epidemiol Community Health 60(4):357–363. https://doi.org/10.1136/jech.2005.040071
47. Kirasich K, Smith T, Sadler B (2019) Random forest vs logistic regression: binary classification for heterogeneous datasets. SMU Data Sci Rev 1
Simulation on Natural Disaster Fire Accident Evacuation Using Augmented Virtual Reality
G. A. Senthil, V. Mathumitha, R. Prabha, Su. Suganthi, and Manjunathan Alagarsamy
G. A. Senthil (corresponding author) · V. Mathumitha: Department of Information Technology, Agni College of Technology, Chennai, India
R. Prabha: Department of Electronics and Communication Engineering, Sri Sai Ram Institute of Technology, Chennai, India
Su. Suganthi: Department of Artificial Intelligence and Data Science, Sri Sai Ram Institute of Technology, Chennai, India
M. Alagarsamy: Department of Electronics and Communication Engineering, K. Ramakrishnan College of Technology, Trichy, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_23
Abstract Disaster situations are dangerous, highly dynamic, and unpredictable. Human behavior after a disaster is difficult to predict because it is influenced by unknown and irrational factors; such behavior deviates from what science refers to as "rational choice." Raising awareness without actually experiencing a disaster does not convey a realistic sense of the situation. Immersive simulations are therefore advised, since they let users enter virtual yet accurate environments of a natural fire disaster. Simulations can depict dangerous, novel, or unique circumstances without the severe repercussions of an actual emergency or tragedy. Immersive learning opportunities such as medical simulators are excellent teaching aids. Because they allow participants to experience emergency circumstances in a safe and engaging manner, simulations can help us plan for emergencies and manage crises. This work proposes a Mixed Reality (MR) simulation that integrates Augmented Virtual Reality (AVR).
Keywords Augmented Virtual Reality (AVR) · Mixed Reality (MR) · Head mount display · Simulation · Awareness · Fire accident · Fire disaster · Emergency evacuation
1 Introduction
Augmented Reality (AR), Virtual Reality (VR), and Mixed Reality (MR), together known as Extended Reality (XR) and shown in Fig. 1, are computer-graphics technologies that let users interact with computer-generated content in both the real and the virtual worlds. VR scenes are entirely virtual environments whose virtual components conceal the actual world and replicate physical items, while AR scenes are real environments with integrated but non-dominant virtual objects that can respond to the user and/or the scene [1]. As Fig. 1 illustrates, AR and VR applications immerse the user in a virtual environment that resembles reality by means of communication systems that give and receive information and are worn as goggles, headsets, gloves, or body suits. AVR replaces the need to imagine a difficult situation with a dynamic, experiential approach. It offers an excellent learning experience by instructing individuals in the most realistic fashion attainable, without the incident occurring in real life.
Fig. 1 Mixed reality for augmented virtual reality
To practice evacuations in the event of a fire, a virtual reality system was developed. Numerical fire simulations of real-world conditions were used to study how the virtual fire's flames and smoke spread. To address the shortcomings of traditional smoke-spreading models in the office, a multi-grid, multi-base-state database model was used. The flame and smoke are represented by particle systems and textured images. The technology immerses the user completely in a virtual world, with realistic interactions between the user and the virtual environment. The system can display the best evacuation procedures for building safety evaluations, and each correct move is scored.
Power sources in offices are inherently dangerous to operate and maintain: high-temperature or poisonous gas leaks, faulty electrical equipment, and short circuits can lead to a fire outbreak or asphyxiation. It is therefore critical to put appropriate safety regulations and safeguards in place in the workplace. Even with these efforts, fires continue to occur. Performance-based fire safety engineering must include early awareness among occupants of the need to evacuate in order to protect people, property, and the environment from fire, and evacuation training ought to be more realistic. Our goal is to create an AVR-based evacuation training system that incorporates gaming features while keeping evacuation training and game entirely distinct. The training system should also replicate fire accidents in unexpected situations. Before beginning development, we became interested in monitoring participants in a VR fire accident simulator as they evacuate.
Fire accidents can cause terrible damage and take many lives, and education about them aids readiness. Have a strategy that includes everyone in your home. Examine all available exits and evacuation routes in your house. Families with children can start by drawing a floor plan of their home that shows two exits from each room, including doors and windows, and marks the position of each smoke alarm [2–6]. Another approach is to hold presentations and exercises based on learning materials. Its benefits are simplicity and low cost in time and money on a wide scale. However, the simulations and guidance may be partial (for example, brief explanations or basic drawings), and the key lessons may prove ineffective in a genuine catastrophe [7–10]. Assessments of fire-related events also show a relationship between delayed evacuation and death rates, and records from real fire evacuations demonstrate that non-evacuation tasks are commonly performed during fires. High-risk establishments must therefore undertake frequent evacuation exercises and risk assessments. To address this, we employ advanced virtual reality [11].
2 Related Work
Saghafian et al. [12] examine fire safety analysis and human behavior in fire (HBiF). Virtual Reality (VR) plays a major role here: by creating new scenarios, it increases the opportunity to gain data about HBiF, and these data can be used to develop fire safety worldwide. This young method goes beyond the older training-based method of data collection while keeping the process ecologically valid. A SWOT analysis is also carried out to give a clear and holistic view of this research.
Kwegyir-Afful et al. [13] used virtual reality scenarios to analyze people's pre-movement behavior by splitting participants into two groups of 26 members each, who performed the exercise in VR. One group was made to perform a task before the fire outbreak and then evacuate the area; the other was given no task before the outbreak, and its only aim was to evacuate. The results were then evaluated, and VR provided a unique opportunity to learn about human behavioral patterns in different fire scenarios.
Ramdan et al. [14] monitored the EPR of personnel using IVR. A group of 54 participants was split into two teams, and both were made to evacuate a gas power plant in VR while their movements were carefully recorded. However, the evacuation time did not change as the researchers had expected. The study shows that IVR can be used to check EPR and to study the safety and ergonomics of personnel.
Jung et al. [15] used VRS to create disaster preparedness training (DPT) in a hospital, which may increase the ecological validity of the process. VRS plays a major role in DPT for medical staff and increases their knowledge of the fire evacuation process.
Lorusso et al. [10] present the emergency fire evacuation procedure of an existing school building. Their findings indicate that the proposed simulated-world system can recreate fire emergency scenarios; it may assist decision-makers in developing emergency procedures and serve as a training device for firefighters to rehearse emergency escape operations.
The article "Virtual and Augmented Reality in Disaster Management Technology: A Research Analysis of the Last 15 Years" examines the application of virtual and augmented reality in disaster management technology over the previous 15 years. It emphasizes that these technologies are becoming essential instruments in disaster management, delivering benefits such as cost-effectiveness, safety, and enhanced disaster response training. The authors conclude that virtual and augmented reality can improve disaster response efforts, but that continued research and development is needed to realize this potential fully. Overall, the article provides a comprehensive overview of the current state of virtual and augmented reality technologies in disaster management [16–35].
The paper “Flood Action VR: A Virtual Reality Framework for Disaster Awareness and Emergency Response Training” describes a virtual reality (VR) framework meant to improve flood water-related disaster awareness and emergency response training. The paper emphasizes the significance of proper disaster management training in mitigating the effects of natural catastrophes like floods. The Flood Action VR framework is designed to provide an immersive and interactive environment that allows users to experience a flood scenario and practice their response skills. The article presents the technical details of the Flood Action VR framework, including the software and hardware used, as well as the features and capabilities of the framework. The authors conclude that the Flood Action VR framework is an effective tool for improving disaster awareness and emergency response training for flood-related incidents and has the potential to be adapted for other types of natural disasters as well. Overall, the article provides a detailed overview of the Flood Action VR framework and its potential applications for disaster management training [13, 36–44].
3 Proposed Work
1. This work provides a realistic situation in a virtual environment for real-time assessment, guiding people to evacuate in case of a fire accident in the workplace by using advanced virtual reality to mimic real situations.
2. The proposed system is well suited to instructing individuals in fire safety drills. The exercise that this paper most strongly reinforces is PASS (Pull, Aim, Squeeze, Sweep) fire extinguisher training [19]. The user can become fully immersed in the virtual world and save others by applying the fire safety exercises taught in the Virtual Reality (VR) environment. The physical hardware for this project is an Oculus Quest 2.
3. The Oculus Quest headset lets the user physically wield a fire extinguisher and feel it react to their controls in the virtual world.
4. By displaying a digital world, VR can conveniently simulate any kind of scenario. This reduces the financial and other costs of preparing a mock drill for disaster management training, which often cannot be recreated because it is inherently unsafe, prohibited by regulations, or requires significant resources.
5. Depending on how realistic it is, a simulation may also help the responder feel less traumatized, enabling them to think more clearly when it matters most for their safety.
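The PASS drill mentioned in item 2 is an ordered sequence, so a training system needs some way to check that the trainee performs the steps in the right order. The following is a minimal sketch of such a check; the function name `check_pass_sequence` and the action strings are hypothetical and are not taken from the authors' implementation.

```python
# The four PASS steps in the order in which they must be performed.
PASS_STEPS = ("pull", "aim", "squeeze", "sweep")


def check_pass_sequence(actions):
    """Return (completed, steps_done) for a list of trainee actions.

    The check stops at the first action that is out of order and ignores
    anything the trainee does after the full sequence has been completed.
    """
    done = 0
    for action in actions:
        if done == len(PASS_STEPS):
            break  # sequence already complete
        if action.strip().lower() != PASS_STEPS[done]:
            return False, done  # wrong step: report how far the trainee got
        done += 1
    return done == len(PASS_STEPS), done


# Example: a correct drill and a drill in which "aim" was skipped.
correct_drill = check_pass_sequence(["Pull", "Aim", "Squeeze", "Sweep"])
skipped_aim = check_pass_sequence(["pull", "squeeze", "sweep"])
print(correct_drill, skipped_aim)
```

Returning the number of completed steps, rather than only a pass or fail flag, would let the simulator tell the trainee which step was missed.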
4 Methodology
The advanced virtual reality model mimics a real situation in a virtual environment so that people can practice evacuating in a fire accident simulator. A fast frame rate and short latency are essential for the experience of immersion in virtual reality. In a preliminary comparative experiment with one-participant and two-participant scenarios, we found that the virtual reality fire accident simulator provides realistic pseudo-evacuation experiences and that participants' evacuation behavior changes depending on the presence of a companion. The participant experiences a fire accident as it would occur in the real world, and a mark is awarded for every correct move.
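Awarding a mark for every correct move can be sketched as a small score keeper. The set of correct moves, the marks per move, and the class name `EvacuationScore` below are all assumptions chosen for illustration; the paper does not specify how marks are computed.

```python
from dataclasses import dataclass, field

# Hypothetical moves that a simulator might treat as correct.
CORRECT_MOVES = frozenset(
    {"raise_alarm", "use_stairs", "crouch_under_smoke", "reach_assembly_point"}
)


@dataclass
class EvacuationScore:
    marks_per_move: int = 10                      # assumed value
    awarded: list = field(default_factory=list)   # correct moves already credited

    def record(self, move):
        """Credit a correct move the first time it occurs; return the marks given."""
        if move in CORRECT_MOVES and move not in self.awarded:
            self.awarded.append(move)
            return self.marks_per_move
        return 0

    @property
    def total(self):
        return self.marks_per_move * len(self.awarded)


# Example session: one wrong move and one repeated move earn nothing.
score = EvacuationScore()
for move in ["raise_alarm", "use_elevator", "use_stairs", "raise_alarm"]:
    score.record(move)
print(score.total)
```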
4.1 Design for Fire Accident Environment Evacuation
As shown in Fig. 2, creating a virtual environment for fire safety training involves a number of steps: 3D modeling of the objects to be used, object and surface texturing, environment design, sound design, and game interface programming. Each of these is an area of expertise in its own right.
4.1.1 Augmented Virtual Reality Simulated
Augmented Virtual Reality (AVR) can be a powerful tool for simulating disaster situations and preparing responders for real-life emergencies. AVR can be used to simulate disaster situations in the following ways:
Fig. 2 Proposed architecture diagram
(a) Training: AVR can simulate disaster scenarios such as earthquakes, floods, and fires, allowing responders to practice their emergency response skills in a safe and controlled environment. For example, firefighters can practice navigating a burning building or rescuing victims in a simulated fire scenario.
(b) Planning: AVR can simulate disaster scenarios for planning and preparedness purposes. Emergency responders can use VR simulations to visualize and plan for various disaster scenarios, identify potential risks, and develop response strategies.
(c) Coordination: AVR can facilitate coordination and communication among responders during a disaster. In an AVR simulation, responders from different agencies can work together to rehearse a coordinated response.
(d) Assessment: AVR can be used to assess the impact of a disaster on infrastructure and buildings. For example, a VR simulation can be used to evaluate the structural integrity of buildings after an earthquake or hurricane.
Overall, AVR provides a convenient and immersive way to simulate disaster situations and prepare responders for real-life emergencies. By using AVR simulations, emergency responders can develop the skills and strategies they need to respond effectively to disasters and save lives.
4.1.2 Creation of Virtual Environment for Fire Safety Training
Follow these steps to establish a virtual environment for fire safety training in catastrophe situations:
1. Define the learning objectives. Determine the specific learning objectives that the virtual environment should achieve, for example teaching responders how to use fire extinguishers, how to evacuate a building in a fire emergency, or how to assess the risk of a fire.
2. Choose a VR platform. Choose a VR platform that best fits your needs and budget. Several platforms are available, such as Unity, Unreal Engine, and A-Frame.
3. Design the environment. Design the virtual environment to simulate a realistic fire emergency scenario. It should include elements such as smoke, fire, alarms, and obstacles that responders may encounter during a real-life fire emergency.
4. Develop interactive features. Develop features that let responders interact with the virtual environment. For example, responders can practice using fire extinguishers by aiming and spraying them at virtual fires, or practice evacuating a building by following a designated path and avoiding obstacles.
5. Incorporate feedback and assessment. Incorporate mechanisms that give responders real-time feedback on their performance, such as the time taken to evacuate the building or the accuracy in using a fire extinguisher.
6. Test and evaluate. Test the virtual environment with responders and evaluate its effectiveness in achieving the learning objectives. Make the necessary adjustments based on feedback and evaluation results.
7. Deploy and train. Once the virtual environment is finalized, deploy it to the target audience and provide training on how to use the VR platform and navigate the virtual environment.
Overall, creating a virtual environment for fire safety training requires careful planning, design, and development to ensure that it achieves the learning objectives and prepares responders for real-life fire emergencies.
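The feedback measures named in the steps above, evacuation time and fire extinguisher accuracy, could be summarized per drill as in this minimal sketch. The function name `assess_drill`, its parameters, and the 120-second time limit are hypothetical values chosen for illustration only.

```python
def assess_drill(evacuation_time_s, sprays_on_target, sprays_total, time_limit_s=120.0):
    """Summarize one training drill as feedback for the responder.

    evacuation_time_s: seconds from the alarm until the trainee reached the exit.
    sprays_on_target / sprays_total: extinguisher bursts that hit a fire vs. all bursts.
    time_limit_s: assumed target time; a real drill would set its own limit.
    """
    if evacuation_time_s < 0 or not 0 <= sprays_on_target <= sprays_total:
        raise ValueError("inconsistent drill measurements")
    # Accuracy is undefined when the extinguisher was never used; report 0.0 then.
    accuracy = sprays_on_target / sprays_total if sprays_total else 0.0
    return {
        "evacuation_time_s": evacuation_time_s,
        "within_time_limit": evacuation_time_s <= time_limit_s,
        "extinguisher_accuracy": accuracy,
    }


# Example drill with invented measurements.
feedback = assess_drill(evacuation_time_s=95.5, sprays_on_target=8, sprays_total=10)
print(feedback)
```

Such a summary could be shown to the responder immediately after the drill and stored for the later test-and-evaluate step.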
4.1.3 Spatialization
Spatialization is the process of creating and positioning sounds in 3D space to create a realistic auditory environment. In AVR-based simulation of disaster situations, spatialization is important for giving responders an immersive and realistic experience. It can be achieved in the following ways:
1. Binaural audio: Binaural audio uses two microphones placed inside a dummy head to capture sounds as they would be heard by human ears. When the binaural audio is played back in an AVR simulation, it creates a realistic 3D auditory experience for the user.
2. Ambisonics: Ambisonics is a spatial audio technique that captures sound from multiple directions and positions it in a 3D space. Ambisonics can be used to create an immersive and realistic auditory environment in an AVR simulation.
3. Head-related transfer function (HRTF): HRTF is a mathematical function that models how sound waves interact with the human head and ears. By applying HRTF to sounds in an AVR simulation, the sounds can be positioned in a 3D space and heard as they would be in real life.
4. Real-time processing: Real-time processing techniques can process sound in an AVR simulation as it runs, based on the user's head movement and position. This creates a dynamic and immersive auditory environment that responds to the user's movements.
Fig. 3 Fire extinguisher 3D model
By incorporating spatialization techniques such as binaural audio, Ambisonics, HRTF, and real-time processing, an AVR-based simulation for disaster situations can create a realistic and immersive auditory environment that enhances the overall user experience and prepares responders for real-life emergency situations.
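To make the idea of positioning a sound relative to the listener's head concrete, the sketch below computes three simple cues for a sound source in a 2D plane: an interaural time difference from the Woodworth spherical-head approximation, inverse-distance attenuation, and equal-power panning between the ears. This is a deliberately simplified stand-in for a measured HRTF, not the authors' audio pipeline; the head radius, the coordinate convention, and the function name `spatial_cues` are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # air at roughly 20 °C
HEAD_RADIUS_M = 0.0875       # assumed average head radius


def spatial_cues(listener_xy, listener_yaw_deg, source_xy, ref_distance=1.0):
    """Return simple left/right cues for a source in the horizontal plane.

    Convention (an assumption): yaw 0 faces +y, +x is to the listener's right,
    and a positive time difference means the sound reaches the right ear first.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)

    # Bearing measured clockwise from +y, then made relative to the head yaw
    # and wrapped into [-pi, pi).
    bearing = math.atan2(dx, dy)
    azimuth = (bearing - math.radians(listener_yaw_deg) + math.pi) % (2 * math.pi) - math.pi

    # Fold rear sources onto the front hemisphere: lateral angle in [-90°, 90°].
    lateral = math.asin(max(-1.0, min(1.0, math.sin(azimuth))))

    # Woodworth approximation: ITD = (r / c) * (theta + sin(theta)).
    itd_s = HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (lateral + math.sin(lateral))

    # Inverse-distance attenuation, clamped so nearby sources are not amplified.
    distance_gain = ref_distance / max(distance, ref_distance)

    # Equal-power panning: the squared ear gains always sum to distance_gain ** 2.
    pan_angle = (math.sin(lateral) + 1.0) * math.pi / 4.0
    return {
        "azimuth_deg": math.degrees(azimuth),
        "itd_s": itd_s,
        "gain_left": distance_gain * math.cos(pan_angle),
        "gain_right": distance_gain * math.sin(pan_angle),
    }


# A fire alarm straight ahead and one directly to the right, both 2 m away.
ahead = spatial_cues((0.0, 0.0), 0.0, (0.0, 2.0))
right = spatial_cues((0.0, 0.0), 0.0, (2.0, 0.0))
print(ahead)
print(right)
```

Recomputing these cues every frame from the tracked head yaw corresponds to the real-time processing described above: when the user turns the head, the alarm appears to stay fixed in the room.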
4.1.4 3D Modeling
Blender is used for modeling, which requires a basic understanding of 3D modeling. Two types of model are produced: a low-poly mesh and a high-poly mesh (Fig. 3). In the baking process, texturing software is used to make the low-poly mesh appear high-poly.
4.1.5 Model Texturing
Texturing is done with Substance Painter 2. First, the low-poly and high-poly meshes are imported into the software. The texture maps (normal, curvature, ambient occlusion, and height) are then created by baking the high-poly mesh onto the low-poly mesh. These texture maps respond differently in various lighting scenarios, adding reflection details that create lifelike materials. Next, an ID map is created for the mesh to separate its components, and each component receives either the material presets created in the previous step or a custom paint or texture job. Finally, the textures are exported to the environment.
Fig. 4 Office environment design
4.1.6 Environment Design
Unity 2019.4.40f1 is a game engine that uses programming to organize game objects and bring them to life. The design of the office environment (Fig. 4) and of the post-earthquake collapsed building (Fig. 5) begins with importing the low-poly meshes and textures into the engine. The appropriate materials are generated from these textures with the Unity engine's material creator and then applied to the meshes. Interior lighting, such as directional light (sunlight), skylight, and spotlights, is configured to create the environment. Lighting methods are classified as static or dynamic. We used static lighting because dynamic lighting is resource intensive and does not produce the same level of realism.
4.1.7 Sound Design
The sounds of sparks, office ambiance, fire, a fire alarm, and other events were recorded with a special microphone. The recordings were then edited in the audio software GarageBand to achieve spatialization.
4.1.8 Game Programming
The primary scripting language in Unity is C#, an object-oriented language. Older Unity releases also accepted UnityScript, a JavaScript-like language, but it has since been deprecated, so the scripts for this project are written in C#.
Simulation on Natural Disaster Fire Accident Evacuation Using …
Fig. 5 Post-earthquake collapsed building design
5 Simulation Methodology
The simulation was created with the Unity 3D engine, a game engine that allows for integrated gameplay design, modeling, programming, and texturing. The engine supports C# scripting, which makes it easy to create class-based in-game objects and environments that have different characteristics but still inherit common ones.
5.1 Design Layout
The first scenario depicts a generic office building. To recreate a relatable layout, architectural plans of typical building spaces were reviewed to identify the key elements present in most of them, such as elevators, meeting rooms with furniture, lobbies, and fire exits. The simulation employs universal signs and symbols, including fire-safety symbols, as guiding references. This aids symbol recognition and helps break down language barriers. The workspace is designed in the Unity Editor, which allows static meshes to be arranged into the desired structures and assigned materials. Items such as lamps and furniture are added to the rooms to distinguish the layout from its basic structure. Some of these items can be interacted with in a variety of ways.
5.2 Character Controlling
The trainee views the simulation in first-person mode, i.e., through the eyes of the character. A controller connected to the Android device via Bluetooth controls the character’s movement: forward, backward, left, and right, with one action button for crouching and another for interacting with interactable objects such as nearby tools. The simulation also includes head tracking, so the view changes as the trainee turns their head, based on the gyro sensor built into the Android device.
Scripting was done in cross-platform C# using MonoDevelop. Scripts control all aspects of the game, including character movement and animation as well as camera rotation. Player movement is controlled by the phone’s head tracking together with a Bluetooth controller that moves the player in all directions. The controller is also used for the player’s interactions with objects in the environment. The “Mobile VR Movement Pack” by “In Your Face Games” was downloaded from the Unity Asset Store to support the Bluetooth controller, because it contains prefabs that could be configured in Unity to suit our needs. Trigger colliders were used to trigger interactions with objects. Unity provides full 3D spatial sound, real-time mixing and playback, mixer architectures, snapshots, preconfigured effects, and other audio capabilities.
Figure 5 depicts a post-earthquake collapsed building design that can be visualized using the Oculus Quest head-mounted device shown in Fig. 6.
Fig. 6 Oculus Quest 2 headset
Figure 7 shows the basic organizational structure. Under the “Information Management” section, users can use a database and a document management system, respectively, to handle information about the various environments and equipment. With the “Fire Simulation” component, users can create new scenarios, model fire behavior, and manage the outcomes of fire evacuation drills. The equipment and environment management systems then use the information from the “Fire Simulation” branch to guide their operations. This can give users more information about how to enhance their disaster management plans.
Fig. 7 Basic organizational structure
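The character control described in Sect. 5.2 combines head yaw from the gyro sensor with controller input and uses trigger zones for interaction. The following Python sketch illustrates that logic outside of Unity. It is an assumption-laden illustration rather than the project’s C# scripts or the API of the Mobile VR Movement Pack: the function names `move_player` and `in_trigger`, the ground-plane coordinates, and the circular trigger shape are all hypothetical.

```python
import math


def move_player(position, yaw_deg, stick_x, stick_y, speed=2.0, dt=1.0 / 60.0):
    """Advance an (x, z) ground-plane position by one frame.

    yaw_deg: head yaw from the gyro; 0 faces +z, positive turns right (toward +x).
    stick_x, stick_y: controller axes in [-1, 1] (strafe and forward/backward).
    """
    yaw = math.radians(yaw_deg)
    # Forward and right unit vectors on the ground plane for the current head yaw.
    forward = (math.sin(yaw), math.cos(yaw))
    right = (math.cos(yaw), -math.sin(yaw))
    dx = forward[0] * stick_y + right[0] * stick_x
    dz = forward[1] * stick_y + right[1] * stick_x
    # Normalize diagonal input so it is not faster than straight-line movement.
    length = math.hypot(dx, dz)
    if length > 1.0:
        dx, dz = dx / length, dz / length
    x, z = position
    return (x + dx * speed * dt, z + dz * speed * dt)


def in_trigger(position, center, radius):
    """Return True when the player stands inside a circular trigger zone."""
    return math.hypot(position[0] - center[0], position[1] - center[1]) <= radius
```

Because the movement vector is rotated by the head yaw, pushing the stick forward always moves the trainee in the direction they are looking, which is the behavior the head-tracked first-person view relies on.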
6 Result and Discussion
The Augmented Virtual Reality system provides an inexpensive fire evacuation drill in a virtual environment and is used to conduct safe evacuation training. Our Virtual Reality-based solution is more effective than conventional training at training the general public and raising awareness of fire accident disasters. The simulation is more realistic and thrilling, and people can experience real disaster situations in a virtual setting. It is intended to create a disaster situation that is about 90% realistic.
Fig. 8 Fire evacuation inside building office
People who have experienced our training once can react accordingly when a disaster occurs in real life. The general public must be aware of rescue measures that help them guide others in a disaster situation. Existing solutions train people using video games, which do not give a real-time experience. In our proposed solution, trainees can learn rescue measures.
Figure 8 shows fire evacuation inside an office building; it is a still image from the AVR-based simulation for disaster management. The image shows an office building with several floors, with people evacuating because of a simulated fire. The environment is designed to resemble a realistic office building, with appropriate lighting, textures, and details such as desks, chairs, and computers. The people in the scene are depicted as semi-transparent to indicate that they are in motion. Some are shown walking calmly toward the exits, while others are running away from the fire. Smoke and fire effects simulate a realistic fire emergency. The simulation is designed to provide a realistic training environment in which responders can practice their skills in a safe and controlled setting. The image conveys the seriousness of the situation and the need for quick and efficient evacuation to ensure the safety of people in the building.
Figure 9 shows fire evacuation in a car park; it is likewise a still image from the AVR-based simulation for disaster management. The image shows a car parking lot with several parked cars, with people evacuating the area because of a simulated fire. The environment is designed to resemble a realistic car parking area, with appropriate lighting, textures, and details such as trees and benches. The people in the scene are again depicted as semi-transparent to indicate that they are in motion. Some are shown walking briskly toward the exits, while others are running away from the fire. As in the office scenario, the simulation provides a realistic training environment in which responders can practice their skills in a safe and controlled setting. The image conveys the urgency and seriousness of the situation and the need for quick and efficient evacuation to ensure the safety of people in the area.
Fig. 9 Fire evacuation in car parking
7 Conclusion
The proposed system would be excellent for instructing individuals in fire safety drills. The PASS (Pull, Aim, Squeeze, Sweep) training for fire extinguishers is one of the most significant drills reinforced through the simulation. The user can fully immerse himself or herself in the virtual world and save others by applying the fire safety drills taught in the Augmented Virtual Reality (AVR) environment. The project’s physical hardware will be an Oculus Quest 2 headset. By creating a digital world, virtual reality (VR) can conveniently simulate any kind of scenario. This reduces the financial and non-financial costs of preparing mock drills for disaster management training, which are often impossible or infeasible to stage because they are inherently unsafe, prohibited by regulations, or require significant people and assets. Depending on how realistic a simulation is, it may also help responders feel less traumatized, allowing for clearer thinking during crucial moments for self-safety.
8 Future Work
The suggested AR and VR system is applicable to a wide range of public facilities. In this study, the emergency fire scenario in the Augmented Virtual Reality world was built using a school building as an example, and the simultaneous development of fire and smoke was combined with the evacuation mechanics of the occupants. As a consequence, the resulting VR system allows users to experience evolving emergency fire scenarios, including the fire evacuation process for specific industries affected by gas leakage. Further research will focus on validating emergency signals and studying how smoke affects their visibility, as well as human responses in fire emergency scenarios, using the proposed VR platform. Additional work will focus on emergency preparedness research that draws on management insights, such as analyzing fire safety or the performance of emergency response systems. Furthermore, the Augmented Reality and Virtual Reality environment created in Unity3D using Machine Learning algorithms allows many classes of simulations from disparate programs to be combined in different applications.
Verifi-Chain: A Credentials Verifier Using Blockchain and IPFS
Tasfia Rahman, Sumaiya Islam Mouno, Arunangshu Mojumder Raatul, Abul Kalam Al Azad, and Nafees Mansoor
Abstract Submitting fake certificates is a common problem in Southeast Asia, which prevents qualified candidates from getting the jobs they deserve. When applying for a job, students must provide academic credentials as proof of their qualifications, acquired both inside and outside the classroom. Verifying academic documents before hiring is crucial to prevent fraud. Employing blockchain technology has the potential to address this issue. Blockchain provides an electronic certificate that is tamperproof and non-repudiable, making it difficult for students to manipulate their academic credentials. This paper presents a prototype for an academic credential verification model that leverages the security features of blockchain and the InterPlanetary File System (IPFS). Certificates are temporarily stored in a database before being transferred to IPFS, where a unique hash code is generated using a hashing algorithm. This hash code serves as the certificate’s unique identity and is stored in the blockchain nodes. Companies can verify an applicant’s credentials by searching for the applicant and accessing their already verified certificates. Using IPFS as an intermediate storage platform lowers the expense of storing large amounts of data directly on the blockchain. In summary, the proposed solution would make the process of certificate verification more efficient, secure, and cost-effective. It would save time and resources that would otherwise be spent on verifying certificates manually.
Keywords Blockchain · IPFS · Academic credentials · Secure · Verification
Tasfia Rahman, Sumaiya Islam Mouno, Arunangshu Mojumder Raatul, Abul Kalam Al Azad and Nafees Mansoor: These authors contributed equally to this work.
T. Rahman · S. I. Mouno · A. M. Raatul · N. Mansoor (B)
Computer Science and Engineering, University of Liberal Arts Bangladesh (ULAB), Dhaka, Bangladesh
e-mail: [email protected]
T. Rahman
e-mail: [email protected]
S. I. Mouno
e-mail: [email protected]
A. M. Raatul
e-mail: [email protected]
A. K. Al Azad
North South University, Dhaka, Bangladesh
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_24
1 Introduction
The basic structure of the mainstream education system includes primary, secondary, and tertiary levels [1]. After completing primary and secondary education, students enroll in universities and can pursue further studies based on their preferences. Students also participate in a variety of extracurricular activities throughout their academic years. As a result, they receive a large number of certificates at each stage of their education. The problem is that these certificates are at risk of being lost or damaged, and there is no regulated system in place to store them digitally or verify their authenticity.
Many countries have huge populations, and with millions of graduates applying for jobs each year, verifying credentials individually can be extremely time-consuming and taxing. Managing and authenticating such a vast number of records is exceedingly challenging, which creates an unfavorable situation in which falsified or replicated certificates can be produced through tampering. This has given rise to an increasing number of fraudulent organizations that engage in the unethical practice of forging academic certificates. Unfortunately, as technology advances, it has become more difficult to distinguish between genuine and forged certificates [2].
To address this issue, the proposed system uses blockchain, an emerging and sophisticated technology. What are the benefits of using blockchain? Its immutable and tamperproof nature makes it a very robust foundation [3]. Because stored data is protected by cryptographic hashes, any change to its state can be detected very quickly. In a blockchain, new data is validated only when multiple parties approve it [4]. As a result, the system remains reliable and authenticated.
Blockchain is not only secure but also highly transparent about the transactions occurring in the system, and every transaction is traceable. Blockchain technology has rapidly gained popularity in recent years as a novel and promising approach to securely storing, sharing, and managing data. Originally developed as a distributed ledger technology to support cryptocurrencies such as Bitcoin, blockchain has evolved into a versatile and robust platform for a wide range of applications beyond finance, including supply chain management, healthcare, real estate, and voting systems [5]. At its core, blockchain technology offers a decentralized and immutable database that is resistant to tampering and fraud. It achieves this by employing a consensus mechanism that ensures the integrity of the ledger and eliminates the need for intermediaries or central authorities to manage the data. This makes blockchain technology an ideal solution for use cases that require high levels of trust, transparency, and security.
By using IPFS, files can be stored and accessed in a secure, decentralized, and censorship-resistant way [6]. Integrating IPFS with blockchain technology adds a further layer of security and immutability to the file storage system [7]. Since everything is stored digitally and all certificates are verified before being stored in IPFS, students do not have to worry about losing or damaging their certificates. It also streamlines the process by which companies view these verified certificates and hire eligible applicants. In this way, the proposed system closes the gaps in the current process and provides an effective and tangible solution.
The proposed solution aims to design and implement a decentralized certificate verification system using blockchain and IPFS technology. The system enables secure and tamperproof storage of digital certificates, allowing authorized parties to verify their authenticity and integrity easily and efficiently. With blockchain technology, data stored on the network is unchangeable and transparent, while IPFS provides decentralized and distributed storage of the certificates. The system also incorporates a user-friendly interface for convenient access and use.
The objective of the paper is to chart the present state of understanding regarding blockchain and IPFS. The remainder of the paper is organized as follows. Section 2 reviews the related literature, and Sect. 3 presents the proposed system, with subsections that briefly describe the system modules, architecture, and so on. Ultimately, the intention of this paper is to shed light on the current condition of the verification process and to suggest a strong system for future implementation.
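To make the tamper-evidence property discussed in the Introduction concrete, the following Python sketch builds a tiny hash-linked ledger and shows that editing an earlier record invalidates the chain. It is only a toy under stated assumptions: it runs in a single process, has no consensus mechanism or network, and the helper names `block_hash`, `append_block`, and `is_valid` are hypothetical rather than part of any blockchain platform.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a new block that is linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})


def is_valid(chain):
    """Recompute every hash and check every link; any edit breaks one of them."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, "certificate A issued")
append_block(chain, "certificate B issued")
```

Calling `is_valid(chain)` on the untouched chain succeeds, whereas changing the `data` field of any block makes the check fail, because the stored hash no longer matches the recomputed one. A real blockchain adds consensus among many nodes on top of this linking.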
2 Related Work
The primary objective of this literature review is to examine the current state of knowledge on academic certificate verification using blockchain technology. It considers the approaches, strategies, methodologies, and techniques employed in previous studies, with the goal of gaining a comprehensive understanding of the field and identifying areas for further investigation and improvement.
One study thoroughly examines the application of blockchain technology in smart contracts, covering the basic principles of blockchain, such as decentralization, consensus algorithms, and cryptographic security. It also gives a detailed overview of smart contracts and their uses and discusses the potential advantages and disadvantages of blockchain technology [8].
Another study introduces a method of data sharing through blockchain technology. It proposes a semi-decentralized approach that uses the InterPlanetary File System (IPFS) to facilitate secure and efficient data sharing. In this method, the data owner first uploads an encrypted file to IPFS, where it is divided into n secret sections known as hash codes. The data owner then sets the access permissions for the encrypted file by specifying seven access rights. This approach ensures that the data remains protected and can only be accessed by authorized parties. The authors investigated an Ethereum public blockchain framework and found that it supports on-chain and off-chain transactions, cloud deployment, pseudonymity, access control, and consensus. However, the study did not address the limitations of the framework or the issue of interoperability with other blockchain systems [9].
A further study evaluated various blockchain platforms and concluded that Hyperledger Fabric was the most suitable platform for its research objectives. The paper highlights the platform’s strong privacy features, particularly its role-based access mechanism, which regulates user access to data and transactions based on the user’s designated role within the system. Additionally, Hyperledger Fabric uses a permissioned network, ensuring that only authorized nodes can participate in the network and access data. However, the study did not investigate the cloud deployment functionality of the framework or provide an extensive analysis of the system’s security [10].
A blockchain-powered platform named UniverCert has also been proposed for certificate verification. The platform is built on Ethereum blockchain technology in its consortium form, and its stakeholders include higher education institutions, governments, law enforcement agencies, and employers. A REST API channel is used to access the platform, giving users convenient access to its features. However, the authors did not address how the proposed system would handle situations in which graduates wish to keep their personal information private. Further exploration and discussion of these limitations could offer valuable insights for future research in this field [11].
The paper follows an extensive discussion of the enhancement of digital document security through the implementation of timestamping and digital signatures. The digital signature comprises four fundamental components, namely, hash code, public key, private key, and timestamp. The university provides the student with both a printed copy of their educational certificate and a digitally signed document. Nonetheless, the authors did not investigate the on-chain, off-chain, cloud deployment, and consensus features of the framework or explore the issue of interoperability with other blockchain systems [12]. Incorporating an additional accrediting body into the certification verification process is a crucial step toward ensuring the authenticity of universities authorized to issue and verify certificates. This feature adds an extra layer of validation, increasing the system’s security and trustworthiness. To ensure the confidentiality and security of data, the AES encryption algorithm was employed. The system also allows for the submission and verification of multiple academic certificates simultaneously, streamlining the verification process and enhancing its efficiency, as highlighted by [13]. The authors observed that the framework supports on-chain transactions, pseudonymity, and access control. However, they did not delve into the off-chain and consensus features of the framework or evaluate the performance of their system concerning latency and throughput. Utilization of the Go implementation of Ethereum, commonly referred to as GETH is used to establish a blockchain network that stores certificates. The study’s reliability
Verifi-Chain: A Credentials Verifier Using Blockchain and IPFS
365
tests demonstrated that the system could handle roughly 200 transactions in eight seconds, indicating its effectiveness. The scalability tests indicated that to become a node or miner on the blockchain network, one would need a storage capacity of 22.6 GB to support ten million blocks. Although the system was comprehensive in its scalability, it failed to assess the access control mechanism of the system [14]. MIT Media Lab uses Blockcerts to issue digital certificates to student groups, giving the certificate holders more control over their earned certificates. The certification process involves the issuer signing the digital certificate and storing its hash on the blockchain, with the recipient receiving the output. However, this process resulted in ownership issues and the need for a high level of trust. Furthermore, the MIT Media Lab has also released an app called Blockcerts Wallet, which stores information about diplomas and the key pairs of students [15] . In an effort to enhance the efficiency of transaction throughput, Kafka was integrated into the message queuing process. The utilization of this technology resulted in faster transaction processing times, surpassing other blockchain-based solutions. The study used Hyperledger Fabric, a well-known blockchain framework used for creating enterprise-level applications. The researchers’ findings are crucial as the faster processing times mean a boost in overall efficiency and decreased expenses for businesses. However, the study did not delve into the security of their system or investigate the cloud deployment feature of the framework [16]. Exploration of several applications took place in this particular paper. Firstly, the researchers analyzed three of the most widely used blockchain-based cryptocurrencies, which were Bitcoin, Litecoin, and Ethereum. Secondly, they explored the features and concerns surrounding the Bitcoin cryptocurrency. 
Finally, the team developed a graphical user interface for IPFS bandwidth analysis, which allows files to be stored on the network using Web3.js and smart contracts. This research provided a better understanding of blockchain technology and its different applications, revealing insights into the advantages and limitations of cryptocurrencies. The framework utilized pseudonymity to safeguard the privacy of supply chain participants and used on-chain transactions to document supply chain events and audit trails [17].
The articles reviewed cover a range of topics related to the use of blockchain technology for certificate verification, digital certificates, and smart contracts. The studies explore the benefits and limitations of blockchain technology, including its security, decentralization, and consensus algorithms. One of the key benefits of blockchain technology is its ability to provide increased security and trust in the verification process, particularly when multiple accrediting bodies are involved. Overall, the studies demonstrate the potential of blockchain technology to improve the efficiency and security of various processes, particularly those related to verification and certification. However, the authors also note the need for further research on the scalability, performance, and security of these blockchain-based solutions.
3 Proposed System
This section outlines the proposed certificate verification system based on blockchain technology. To begin with, an applicant uploads their credentials to the system. Once the credentials are uploaded, the admin receives a request from the applicant to verify the certificates. The admin reaches out to educational organizations to confirm the authenticity of these certificates. Once the credentials are verified, the admin notifies the user and uploads the certificates to IPFS. When the certificates are uploaded to an IPFS node, the data is split into smaller chunks and hashed. Once hashing is complete, IPFS returns a hash key, also known as the content identifier (CID). The CID serves as a fingerprint that identifies the file. A CID is generated for every upload, whether the data is new or was uploaded before; because the CID is computed from the content itself, any change to the data produces a different CID, which makes tampering detectable. To enhance security, the hash key is encrypted before it is stored in the blockchain nodes. After the data is transmitted to the blockchain through IPFS, the issuer must approve the transaction charges in MetaMask. The hash cannot be modified once it is stored in the blockchain. In the event of any data tampering, the system immediately alerts users. The applicant can then send the hash to organizations when applying for a job. The companies can enter the hash in the system to look up the applicant and check the legitimacy of their certificates.
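The chunk-and-hash step described above can be illustrated with a minimal Python sketch. It is not the real IPFS CID algorithm (which uses multihash encoding and a Merkle DAG); the chunk size and the function names `chunk_digests`, `content_id`, and `is_untampered` are assumptions made for illustration only.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # assumed chunk size for this sketch


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks and hash each chunk with SHA-256."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    return [hashlib.sha256(chunk).digest() for chunk in chunks]


def content_id(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash the concatenated chunk digests into one identifier.

    Simplified stand-in for a CID, not the actual IPFS encoding.
    """
    return hashlib.sha256(b"".join(chunk_digests(data, chunk_size))).hexdigest()


def is_untampered(data: bytes, stored_id: str) -> bool:
    """Recompute the identifier and compare it with the stored one."""
    return content_id(data) == stored_id
```

A verifier would recompute the identifier of the retrieved certificate and compare it with the value recorded on the blockchain; any modification of the file changes the identifier.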
3.1 Proposed Framework
Figure 1 explains the workflow of the system. First, the user logs in to the system and uploads their credentials. These certificates are temporarily stored in the database. The admin receives a notification of the user’s verification request. Once the admin verifies the certificates, they are uploaded to IPFS. The company, on the other hand, can search for applicants and request view-only access to the certificates. Furthermore, the certificate cannot be altered or revised by an intruder.
3.2 Stakeholders of the System
To address the flaws in current methodologies, the proposed system both stores certificates and acts as a verifier. The entire procedure is explained in the sections that follow.
Fig. 1 Workflow of the system
By utilizing distributed technologies such as IPFS and Ethereum smart contracts, the proposed solution simplifies the process of verifying the authenticity, integrity, and validity of a document. The stakeholders involved in the system are as follows:
1. Applicant: An applicant can upload their credentials to the system. Once the credentials are verified, they are stored in IPFS. Applicants can also accept access requests from companies that wish to view their verified certificates.
2. Company: A company can search for an applicant and request access to view that applicant’s credentials.
3. Admin: The admin receives verification requests from applicants and reaches out to educational organizations to confirm the validity of the certificates.
4. General User: A general user can browse the system’s homepage and get acquainted with its functionality.
3.3 System Modules
Blockchain can serve as the foundation for an entire project, with its immutable ledger providing a secure method for storing data in blockchain nodes. The blockchain is composed of a sequence of blocks, each containing records of multiple transactions. Once a block is added to the chain, the information it contains cannot be modified, which ensures the integrity of the data stored on the blockchain. This security is achieved through cryptographic algorithms and a consensus mechanism that verifies transactions and maintains the ledger’s integrity. The technology is designed to be highly secure and resistant to tampering and hacking, making it well-suited for storing valuable and sensitive information. Furthermore, it has numerous applications, including supply chain management, digital identity verification, voting systems, and more.
Ethereum is an open-source blockchain platform with smart contract functionality, powered by its native cryptocurrency, ether (ETH). Smart contracts are programs that are executed on the blockchain when a specific action is performed by a user. They can be written in various programming languages, with Solidity being a popular choice.
The InterPlanetary File System (IPFS), on the other hand, is a decentralized file storage system that enables the creation of a peer-to-peer network among computers worldwide. Each file in the global namespace of IPFS is uniquely identified by content addressing. IPFS relies on a network of nodes to store and distribute files rather than on a central server. This decentralized architecture provides several benefits, including increased reliability, security, and speed. Whenever a file is added to the IPFS network, it is assigned a unique identifier known as a hash. This hash serves as a digital fingerprint for the file and can be used to retrieve the file from any node in the network that stores it.
Ganache, MetaMask, Truffle, React, and Node.js are tools that can be used together to build an effective blockchain platform. Ganache provides a local Ethereum blockchain for development and testing, MetaMask is a cryptocurrency wallet used to approve and track transactions on the blockchain, and Truffle is a framework for compiling, linking, deploying, and managing smart contracts. React is a library for developing user interfaces, while Node.js is used for serving front-end pages and assets and for managing user authentication via JSON Web Tokens (JWT). Web3.js is a JavaScript library, installed as a Node.js dependency, that allows the front end to interact with deployed Solidity smart contracts.
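To make the immutability argument concrete, the following minimal Python sketch links blocks by hash and shows how altering an earlier record is detected. It is a toy model under stated assumptions: it has no consensus mechanism, and `block_hash`, `append_block`, and `is_valid` are hypothetical names, not part of Ethereum or any library.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, records: list[str]) -> str:
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "records": records}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def append_block(chain: list[dict], records: list[str]) -> None:
    """Append a block whose hash covers its records and its predecessor."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "prev": prev_hash,
        "records": records,
        "hash": block_hash(index, prev_hash, records),
    })


def is_valid(chain: list[dict]) -> bool:
    """Check every link and every stored hash; any edit breaks the check."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"], block["records"]):
            return False
    return True
```

Because each hash covers the previous block's hash, changing a stored certificate hash in an old block invalidates that block and every block after it.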
Fig. 2 Applicant activity
Fig. 3 Use case of company
3.4 Technical Diagram
Figure 2 depicts the applicant’s use case. To use the system, the applicant must first register and log in. The applicant can then upload their credentials and ask the admin to verify them. While verification is ongoing, the certificates are temporarily stored in a local database. When a company requests access to view the credentials, the applicant can review the request and accept it. Figure 3 shows the use case for the company. When the company receives the hash key from the applicant, it uses it to search the system for that applicant. Because the system has access tiers for further protection, the company must first request
Fig. 4 Activity of the admin
Fig. 5 Activity diagram for the general user
access before seeing the credentials. The company is able to examine the certificates after the applicant approves the request from their account. Figure 4 illustrates the administrator’s activity. The administrator’s job is to receive verification requests from applicants and contact certificate providers such as educational institutions to check the certificates’ legitimacy. They upload the confirmed results to the IPFS and notify the applicant once they get the results. Figure 5 illustrates the activity for the general user. The general user can simply browse through the pages of the system and view the functionalities.
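The request-and-approve access flow between company and applicant can be sketched as a small state model. This is only an illustration of the described behavior, not the system's actual implementation; the class `CredentialRecord` and its method names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CredentialRecord:
    """A verified credential plus the access requests made against it."""
    applicant: str
    cid: str
    pending: set[str] = field(default_factory=set)
    approved: set[str] = field(default_factory=set)

    def request_access(self, company: str) -> None:
        """A company asks to view the credential."""
        if company not in self.approved:
            self.pending.add(company)

    def approve(self, applicant: str, company: str) -> None:
        """Only the owning applicant may approve a pending request."""
        if applicant != self.applicant:
            raise PermissionError("only the applicant may approve access")
        if company not in self.pending:
            raise ValueError("no pending request from this company")
        self.pending.remove(company)
        self.approved.add(company)

    def view(self, company: str) -> str:
        """Return the credential's hash only to approved companies."""
        if company not in self.approved:
            raise PermissionError("access has not been approved")
        return self.cid
```

A company that has not been approved receives an error rather than the hash, which mirrors the view-only, approval-gated access tier described above.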
4 Conclusion
One of the primary advantages of blockchain is the ability to create immutable ledgers. Due to the ever-growing rate of certificate falsification during the job application process, this proposed solution aims to store and validate academic certificates using blockchain and IPFS. The system not only makes it easier to verify certificates, but it also reduces the risk of losing tangible certificates by storing them digitally in IPFS. While the original document is kept in IPFS, the hash associated with the certificate is kept in the blockchain. Future work could explore the integration of
other blockchain platforms and file storage systems to further enhance the functionality and robustness of the system. Overall, the proposed solution has the potential to revolutionize the way academic certificates are verified, making the process more transparent, efficient, and secure for all stakeholders involved.
References
1. Paraide P, Owens K, Muke C, Clarkson P, Owens C (2023) Before and after independence: community schools, secondary schools and tertiary education, and making curricula our way. Mathematics education in a neocolonial country: the case of Papua New Guinea. Springer, Cham, pp 119–148
2. Gopal N, Prakash VV (2018) Survey on blockchain based digital certificate system. Int Res J Eng Technol (IRJET) 5(11)
3. Monrat AA, Schelén O, Andersson K (2019) A survey of blockchain from the perspectives of applications, challenges, and opportunities. IEEE Access 7:117134–117151
4. Zhang R, Xue R, Liu L (2019) Security and privacy on blockchain. ACM Comput Surv (CSUR) 52(3):1–34
5. Chatterjee R, Chatterjee R (2017) An overview of the emerging technology: blockchain. In: 2017 3rd International conference on computational intelligence and networks (CINE). IEEE, pp 126–127
6. Benet J (2014) IPFS-content addressed, versioned, P2P file system. arXiv preprint arXiv:1407.3561
7. Kumar R, Tripathi R (2019) Implementation of distributed file storage and access framework using IPFS and blockchain. In: 2019 Fifth international conference on image information processing (ICIIP). IEEE, pp 246–251
8. Zheng Z, Xie S, Dai H-N, Chen X, Wang H (2018) Blockchain challenges and opportunities: a survey. Int J Web Grid Serv 14(4):352–375
9. Athanere S, Thakur R (2022) Blockchain based hierarchical semi-decentralized approach using IPFS for secure and efficient data sharing. J King Saud Univ-Comput Inf Sci 34(4):1523–1534
10. Saleh OS, Ghazali O, Rana ME (2020) Blockchain based framework for educational certificates verification. J Crit Rev 7(3):79–84
11. Shakan Y, Kumalakov B, Mutanov G, Mamykova Z, Kistaubayev Y (2021) Verification of university student and graduate data using blockchain technology. Int J Comput Commun Control 16(5)
12. Ghazali O, Saleh OS (2018) A graduation certificate verification model via utilization of the blockchain technology. J Telecommun Electron Comput Eng (JTEC) 10(3–2):29–34
13. Leka E, Selimi B (2021) Development and evaluation of blockchain based secure application for verification and validation of academic certificates. Ann Emerg Technol Comput (AETiC) 5(2):22–36
14. Faaroek SA, Panjaitan AS, Fauziah Z, Septiani N (2022) Design and build academic website with digital certificate storage using blockchain technology. IAIC Trans Sustain Digit Innovation (ITSDI) 3(2):175–184
15. Vidal F, Gouveia F, Soares C (2019) Analysis of blockchain technology for higher education. In: 2019 International conference on cyber-enabled distributed computing and knowledge discovery (CyberC). IEEE, pp 28–33
16. Liu D, Guo X (2019) Blockchain based storage and verification scheme of credible degree certificate. In: 2019 2nd International conference on safety produce informatization (IICSPI). IEEE, pp 350–352
17. Nouman M, Ullah K, Azam M (2021) Secure digital transactions in the education sector using blockchain. EAI Endorsed Trans Scalable Inf Syst 9(35)
IOT-Based Cost-Effective Solar Energy Monitoring System
Abhishek Madankar, Minal Patil, and Shital Telrandhe
Abstract The use of technology for monitoring solar power generation has grown with the increase in solar power plant installations. Such technology can greatly improve the monitoring, performance, and maintenance of solar plants. Generally, a standard meter provides only one reading, i.e., the solar power delivered into the home, whereas a bidirectional meter reads the power delivered into the home, the power received from the grid, and the net consumption. A standard meter shows only the total meter reading, with no analysis of the system or its performance level. This project therefore monitors energy with an IOT-based device; the system can be adapted to monitor through the internet anytime, anywhere, and to transmit all data from the smart meter to the web. The technology has numerous uses, including smart towns, microgrids, and solar street lighting. This period has seen the fastest growth in the use of renewable energy in recorded history. The suggested system provides online visualization of renewable (solar) energy use. It displays the energy consumed by the connected load at regular intervals, which makes energy consumption easy to manage and eventually leads to electricity savings and proper utilization of energy.
Keywords Internet of things · Atmega 32p · Microcontroller · Raspberry Pi · MODBUS · Energy
A. Madankar (B) · M. Patil Department of ET Engineering, Y. C. College of Engineering, Nagpur, India e-mail: [email protected] S. Telrandhe Datta Meghe Institute of Higher Education and Research, Swangi, Wardha, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_25
1 Introduction
The internet of things (IOT) is widespread in today’s era, with many applications already implemented worldwide. The IOT is a system that transfers data over a network, and it is proving its value in day-to-day life. It supports many applications such as smart homes, smart cities, smart mirrors, smart phones, and solar cities. The word “smart” plays a crucial role in the vision of IOT and gives rise to a smart world. Renewable energy, including solar photovoltaic and wind, is a rapidly growing source of electricity that is in increasing demand. Utilization of renewable energy technologies reduces environmental impact. The aim of this paper is to monitor the energy and power ratings on a website. Domestic solar plants are being installed at different locations and are in high demand. Solar plants have the advantage that unused energy can be sold to the power grid, where it can be utilized by other consumers. A solar plant has meters to monitor the energy generated, exported, and imported. A standard meter shows only the total meter reading and provides no analysis of the system or its performance level. With IOT-based devices, the system can be monitored through the internet anytime, anywhere.
Problem Statement
The proposed system applies the internet of things (IOT) to industrial remote monitoring of energy parameters. Industries need auditing to reduce superfluous energy consumption and to establish the precise requirements of every machine or device. Due to their benefits over conventional sensing, wireless sensor nodes are a suitable system platform for remote monitoring. Identifying the final uses of energy in industry is the goal of an energy audit, which also serves as a feasibility assessment for the implementation of an energy management program.
Throughout the energy program’s several phases, the audit methods can be expanded as necessary, with the application of each subsequent phase producing additional knowledge about energy use and the potential for increasing energy efficiency.
Objectives
To achieve this goal, the work pursues the following objectives:
• To create a microcontroller-based, remotely accessible data acquisition system with internet access for monitoring energy parameter values.
• To implement centralized data management and collection software.
• To create a web service through which web applications and microcontrollers communicate with one another.
• To create a graphical data analysis tool that is internet-ready.
2 Literature Survey
The IEEE 802.11 (Wi-Fi) protocol enables the user to connect wirelessly to the internet. Wi-Fi signal strength depends on the distance between transmitter and receiver: the greater the distance, the weaker the signal, which limits the coverage area. The model proposed in [1] provides a backup power supply to the Wi-Fi device even during a blackout of the main supply. An Arduino Uno board and a NodeMCU can be used to connect wirelessly to a system [2]. A Raspberry Pi controller with an MCP3008 analog-to-digital converter has been used to design a solar metering system [3]. Electricity consumption can be reduced by using cloud computing to monitor home appliances such as fans and bulbs [4]. Wind turbines and solar panel arrays make up a renewable energy generation system that is monitored in real time. The monitoring system is based on measurements of the voltage and current of each renewable energy source [5–8]. Purpose-designed sensor circuits measure the relevant quantities, and a Microchip 18F4450 microcontroller handles the processing. Related work also points out the problems of the non-intrusive load monitoring technique, which disaggregates a load into discrete appliances. When several local renewable energy-based producers are linked to the same grid, they may not be compatible with time-varying loads. Recent initiatives to develop a wireless remote monitoring system for Malawi’s renewable energy installations are also reported. The main intent of that system was to create an affordable data gathering system that could continuously show energy yields and performance metrics from remote sites. The project’s output gives the remote site’s residents immediate access to information on the electricity generated there through wireless sensor boards and text message (SMS) delivery over the cellular network [9–17].
3 Methodology
The suggested system consists of a Monitoring and Control Unit (MCU), a solar panel, a controller, a website, software, a database, and smartphones. The MCU is in charge of watching the home appliances to see whether the electricity is turned on. When the software program is activated, it starts the MCU, which monitors the information it receives from the controller; the controller keeps a current database of the area’s power use. As a result, the amount of electricity wasted by each home appliance can be determined, and a bill for that electricity is produced. Every 60 min, the MCU retrieves data from the controller and stores it in non-volatile memory. It also maintains the information in the cloud, so users can access their home’s electricity usage statistics from any location. The system also regulates the solar panel. Solar energy is converted into electricity by the solar panels. There is no sunlight available during rainy seasons
and under certain atmospheric conditions. A battery is therefore used to store energy for later use. A lead acid battery is utilized to power the circuit. The software application is the key to accessing all the data for the household appliances. Users give their consent to this application when they register for it using their home address. Customers can monitor each home appliance’s power usage and waste, as well as the total cost of each device’s bill. Consequently, this software is useful in reducing energy waste. Additionally, because the controller regulates each device to preserve its static voltage, it helps reduce accidents. Renting is the act of allowing someone else to use your property for a while in exchange for a fee. Renting solar energy therefore entails providing solar power to people for a specific period of time. The block diagram of the proposed system is shown in Fig. 1.
Fig. 1 Block diagram of the proposed system
The owner must receive a request from the prospective user, and the owner can rent out power only if the requesting user is a member of the same group. The primary benefits are reduced energy waste, cheaper electricity, and the fact that payment for rented power lowers each user’s bill. The EMS system block diagram is shown in Fig. 2. The energy meter is linked directly to the electric line, as shown. Because the system is microcontroller-based, it requires an SMPS power supply. Since the system runs continuously, the SMPS converts 240 V into 5 V while keeping heat generation low. Industrial energy meters provide a variety of data over RS-485, so the signals must be converted to UART format for the microcontroller. Since the device has two communication options, either a GSM modem or an RF transmitter is interfaced with the microcontroller over UART. Let us look at the proposed system’s functionality and working method before moving on to the internal blocks of the IOT device under discussion. The suggested IOT device is made up of the following blocks. The first is used to communicate with a smart meter: because smart meters use the industrial MODBUS protocol over RS-485, the signals must be converted to RS-232 before they can be understood by a microcontroller. To communicate with the web, the system needs a GPRS modem, which handles communication with the internet. Additionally, because this device is intended for use in an environment where a reliable power source is crucial, an SMPS power supply module is also needed. Combining all of these creates an IOT-based EMS. The process flow of energy is shown in Fig. 3.
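As an illustration of the meter communication block, the sketch below builds a Modbus request frame in Python. It assumes Modbus RTU framing (the text names MODBUS over RS-485 but not the framing) and uses function code 0x03, read holding registers. The slave address and register numbers in any call are hypothetical, since they depend on the specific meter, and sending the frame over a serial port is not shown.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_registers_request(slave: int, start: int, count: int) -> bytes:
    """Build an RTU frame: address, function 0x03, start, count, CRC.

    Address fields are big-endian; the CRC is appended low byte first.
    """
    body = struct.pack(">BBHH", slave, 0x03, start, count)
    return body + struct.pack("<H", crc16_modbus(body))
```

For example, `read_holding_registers_request(1, 0, 2)` yields an 8-byte frame asking a device at the assumed address 1 for two registers starting at register 0.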
Fig. 2 Block diagram of the EMS module
Fig. 3 Process flow of energy
Before discussing the suggested method in more depth, let us first clarify how regular metering and industrial smart meters work. The diagram above shows how an MSEB meter is typically implemented. These meters can only display incremental energy consumption using a few digits; they show neither exact load information such as power factor, voltage, and current, nor historical consumption information such as daily, daytime, and nighttime usage. To address this issue, smart meters can display many parameters and store data for almost a month. However, it is not practical to record all readings manually on a regular basis. Because these meters can communicate their data to other devices, the next step is to create an IOT-based device that reads every meter parameter using the MODBUS protocol and uploads the data to an IOT server, allowing users to view the energy parameters visually via a web or mobile application. Thus, developing an IOT-based device, deploying an IOT server, and creating a web-based data analysis program constitute the precise scope of the task.
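The read-and-upload step can be sketched as follows: a Modbus RTU response is validated, decoded into named readings, and serialized for an IOT server. The encoding (each parameter as a big-endian 32-bit float spanning two registers), the parameter names, and the JSON layout are all assumptions, since the actual register map depends on the meter and the server interface is not specified.

```python
import json
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def parse_float_response(frame: bytes, names: list[str]) -> dict[str, float]:
    """Validate a function-0x03 response and decode one float per name."""
    # A frame with its CRC appended (low byte first) has a CRC of zero.
    if len(frame) < 5 or crc16_modbus(frame) != 0:
        raise ValueError("frame too short or CRC mismatch")
    function, byte_count = frame[1], frame[2]
    if function != 0x03:
        raise ValueError("unexpected function code")
    data = frame[3:-2]
    if byte_count != len(data) or byte_count != 4 * len(names):
        raise ValueError("unexpected payload length")
    values = struct.unpack(f">{len(names)}f", data)
    return dict(zip(names, values))


def to_payload(meter_id: str, readings: dict[str, float]) -> str:
    """Serialize readings as JSON for upload (hypothetical layout)."""
    return json.dumps({"meter": meter_id, "readings": readings}, sort_keys=True)
```

The resulting JSON string could then be sent by the GPRS modem to whatever endpoint the server exposes; that transport step is outside this sketch.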
4 Result
The graphs show the net power consumption, import, and export for the current month, a week, and the current day. In Fig. 6, the net meter trend is given for different time slots; consumption ranges from 0.05 to 0.15 kW over the period from 12:36 to 15:11. The validated results are shown in Figs. 4, 5, 6, 7, 8, 9, and 10.
Fig. 4 Graph representation of export trend
Fig. 5 Graph representation of import trend
Fig. 6 Graph representation of net meter trend
Fig. 7 Showing real-time generation
Fig. 8 Live energy status
Fig. 9 Design implementation
Fig. 10 Actual implementation
5 Conclusion
The proposed system applies the internet of things (IOT) to an industrial remote monitoring system for energy parameters. Energy auditing is necessary for industries to reduce unnecessary energy use and to be aware of the precise specifications needed for each machine or device. Due to their benefits over conventional sensing, wireless sensor nodes are a suitable platform for remote monitoring systems. Identifying the final uses of energy in industry is the goal of an energy audit, which also serves as a feasibility assessment for the adoption of an energy management scheme. Throughout the various phases of the energy program, the audit methods can be expanded as needed, with the implementation of each new phase producing more data on energy use and more opportunities to increase energy efficiency.
References
1. Dolse H, Masram B, Ghorse A, Wankar A (2022) A WI-FI router with power backup and range extender. In: 2022 IEEE Delhi section conference delcon
2. Patil M, Chakole V, Chetepawad K (2020) IoT based economic smart vehicle parking system. In: Proceedings of the 3rd international conference on intelligent sustainable systems, ICISS 2020, 2020, pp 1337–1340, 9315919
3. Gupta A, Jain R (2017) Real time remote solar monitoring system. In: 2017 3rd International conference on advances in computing, communication and automation (ICACCA)
4. Jayapandian N, Rahman AMJMZ, Poornima U (2015) Efficient online solar energy monitoring and electricity sharing in home using cloud system. In: 2015 Online international conference on green engineering and technologies (IC-GET). IEEE
5. Kabalci E, Gorgun A, Kabalci Y (2013) Design and implementation of a renewable energy monitoring system. In: 2013 Fourth international conference on power engineering, energy and electrical drives (POWERENG). IEEE
6. Kabalcı E, Batta F, Battal O (2012) Modelling of a hybrid renewable energy conversion system. In: 3rd International conference on nuclear and renewable energy resources (NURER 2012), pp 1–6, May 20–23, 2012, Istanbul, Turkey
7. Rashdi A, Malik R (2012) Remote energy monitoring, profiling and control through GSM network. In: 2012 International conference on innovations in information technology (IIT). IEEE, 2012
8. Byun J, Hong I, Kang B, Park S (2011) A smart energy distribution and management system for renewable energy distribution and context-aware services based on user patterns and load forecasting. IEEE Trans Consum Electron 57(2):436–444
9. Dorle SS, Vidhale B, Chakole M (2011) Evaluation of multipath, unipath and hybrid routing protocols for vehicular ad hoc networks. In: 2011 Fourth international conference on emerging trends in engineering & technology, Port Louis, Mauritius, 2011, pp 311–316. https://doi.org/10.1109/ICETET.2011.66
10. Çolak I, Kabalcı E, Bal G (2011) Parallel DC-AC conversion system based on separate solar farms with MPPT control. In: IEEE 8th international conference on power electronics, pp 1469–1475, May 30-June 3 2011, Jeju Korea
11. Nehrir MH, Wang C, Strunz K, Aki H, Ramakumar R, Bing J, Miao Z, Salameh Z (2011) A review of hybrid renewable/alternative energy systems for electric power generation: configurations, control, and applications. IEEE Trans Sustain Energy 2(4):392–403
12. Lee PK, Lai LL (2009) A practical approach of smart metering in remote monitoring of renewable energy applications. In: Power and energy society general meeting, 2009. IEEE PES ’09
13. Zhu J, Pecen R (2008) A novel automatic utility data collection system using IEEE 802.15.4-compliant wireless mesh networks. In: Proceedings of the 2008 IAJC-IJME international conference
14. Tan HGR, Lee CH, Mok VH (2007) Automatic power meter reading system using GSM network. IEEE RPS
15. Tan SY, Moghawemi M (2002) PIC based automatic meter reading and control over the low voltage distribution network. In: 2002 Student conference on research and development proceedings, Shah Alam, Malaysia
16. Treytl A, Sauter T, Bumiller G (2004) Real-time energy management over power-lines and internet. In: The proceedings of the 8th international symposium on power line communications and its applications
17. Keote M, Choudhari T, Alone T, Ahmad A (2022) Energy-Efficient automated guided vehicle for warehouse management system. In: Electronic systems and intelligent computing. Lecture notes in electrical engineering book series (LNEE, vol 860), pp 289–301
Investigation of Assessment Methodologies in Information Security Risk Management
C. Rajathi and P. Rukmani
Abstract Information security risk management is a crucial component of every organization’s security plan. It comprises the identification, assessment, and prioritization of potential security risks to an organization’s information assets as well as the implementation of protective measures or risk management plans. For this method to work, a detailed understanding of an organization’s assets, threats, vulnerabilities, and potential impacts of security incidents is required. Effective information security risk management ensures business continuity, safeguards critical information assets, and prevents data breaches. In this study, the key concepts, practices, and tools of information security risk management are discussed. It also looks at the most effective strategies to set up a successful risk management program and identifies emerging trends. Keywords Information security · Risk · Risk management · Risk assessment
C. Rajathi · P. Rukmani (B) School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India e-mail: [email protected] C. Rajathi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_26
1 Introduction
Information security, also known as InfoSec, is the process of protecting information and mitigating the risks that could allow an organization’s information assets to be compromised. Information must be protected from unauthorized access, disclosure, disruption, modification, corruption, deletion, recording, or devaluation. The aim of an organization’s information security program is to protect its information assets from security threats by applying actions intended to reduce their adverse impact [1]. Risk management is the process of identifying undesired events before they occur. Risk management allows business owners to operate their business without
385
386
C. Rajathi and P. Rukmani
risk by setting the rules to lessen the effect of risk on an organization [2]. A common definition of risk is “uncertainty”. Uncertain events can have a positive and negative impact on the organization. The positive impact is called an opportunity in which the risk owner tries to maximize the opportunity to help the organization. The negative impact can either be removed or reduced to an acceptable level using risk management.
2 Information Security Risk Management Information Security Risk Management (ISRM) is the process of managing the risks associated with information. The procedure includes identifying, assessing, and treating risk to ensure the information assurance of an organization. The process aims to identify undesired events and to remove them or reduce their impact on the organization's assets [3].
2.1 Stages of ISRM The stages of the ISRM process are shown in Fig. 1. Identification: Identify assets: The crown jewels of an organization, whether data, systems, or anything else considered an asset, are identified. An asset is an item the organization considers valuable. A guiding question is, for example: which asset would have a major impact on the organization if its confidentiality, integrity, or availability were compromised?
Fig. 1 ISRM process. Source Infosecinstitute-Ch-2
Investigation of Assessment Methodologies in Information Security …
Identify vulnerabilities: A vulnerability is a weakness or deficiency in an organization that could result in information being compromised. In this step, system-level and other vulnerabilities that expose an asset to risk are identified. Identify threats: A threat is a possible hazard that might exploit a vulnerability to breach security and cause damage to an asset. The possible causes of assets or information becoming compromised are identified. Identify controls: Controls are the measures already deployed to protect an organization's assets. A control is applied directly to an identified vulnerability or threat, either by providing a remedy or by decreasing the effect of the risk on an asset. For example, a control may be a quarterly access review that terminates any unauthorized access found during the review. Assessment: This is the process of defining the risk by combining the gathered information about assets, vulnerabilities, threats, and controls. Different methodologies define risk assessment differently. A commonly used equation for risk assessment is [4]:
$$\text{Risk} = (\text{threat} \times \text{vulnerability} \times \text{asset value}) - \text{security controls} \qquad (1)$$
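As a minimal sketch of how Eq. (1) might be evaluated, the following Python function takes the four quantities as plain numbers. The 1 to 5 rating scale, the control score, and the example server are illustrative assumptions, not values prescribed by [4].

```python
def risk_score(threat, vulnerability, asset_value, security_controls):
    """Evaluate Eq. (1): (threat * vulnerability * asset value) - security controls."""
    return threat * vulnerability * asset_value - security_controls


# Hypothetical ratings on an assumed 1-5 scale for an example file server;
# security_controls is an assumed score expressed in the same units as the product.
score = risk_score(threat=4, vulnerability=3, asset_value=5, security_controls=20)
print(score)
```

Because the controls term is subtracted, the score is only meaningful if controls are rated on a scale comparable to the product of the other three factors; that scaling is a design choice left to the assessor.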
Treatment: If the assessment finds a risk, the organization must act on it. Fixing: Control measures that completely or almost completely remove the risk-causing factors are applied. E.g., applying security patches to a vulnerable server that stores critical assets of the organization. Mitigating: Mitigation reduces the likelihood and/or impact of the risk without removing it completely. E.g., placing a firewall in front of a server that stores critical assets instead of applying a security patch. Transferring: The risk is transferred to another entity so that the organization can recover from a loss. E.g., obtaining insurance, which supplements risk remediation and mitigation but does not replace them. Acceptance: Accepting the risk is appropriate when the risk is clearly low and the cost of fixing it clearly exceeds the benefit. E.g., if a server holds no critical information, or if successfully exploiting the vulnerability on that server is difficult, it may be better not to spend time or resources on fixing the vulnerability. Risk avoidance: Risk avoidance removes all exposure to the factors that threaten the organization's assets. E.g., migrating sensitive data off a server whose operating system is reaching the end of its support period. Communication: The risk and its impact must be communicated to the project owner, who decides how to treat it. Stakeholders need to choose whether or not to treat the risk. Responsibility and accountability must be clearly assigned to the person and the team who handle the risk in the organization. The right people must be engaged in the process at the right time.
Rinse and Repeat: The controls implemented in the treatment phase need continuous monitoring. Applied controls may degrade over time as ports are opened, code is changed, or other changes occur, so that a control can break down months or years after its implementation.
2.2 Ownership To ensure that information security risk management functions smoothly, the roles and responsibilities of everyone participating in the ISRM process must be defined. The ISRM process has many stakeholders, each with different responsibilities. Project Sponsor: The project sponsor is the person internally responsible for the project's success, typically someone with signing authority for the project [1]. Process Owners: The senior security officer of an organization, the chief security officer, owns the ISRM program and is responsible for ensuring that the risk assessment is properly scoped and performed by appropriate professionals [1]. The process owner oversees the team assigned to carry out the risk management process, which should have expertise in risk assessment and act professionally. Risk Owners: The project owner allocates a budget to the risk owner, to whom the risk belongs. The risk owner's role is to respond to, monitor, manage, and control the risks identified in a system. Risk owners are concerned with several factors, including a proper understanding of the business unit, accurate identification of risk, and the clarity and cost of recommendations [1].
2.3 Information Security Assessment Type Security assessment is important for examining a system and finding its unknown weaknesses. The primary goal of an organization conducting an assessment is to review and audit its security controls and to fix any loopholes in them. Different types of assessment are performed, such as vulnerability testing, penetration testing, red team assessment, security audit, black box assessment, risk assessment, threat assessment, threat modeling, and bug bounty [5]. Vulnerability assessment: Vulnerability assessment is a technical process designed to identify, quantify, and prioritize the vulnerabilities in a system. Its goal is to enable the problems found to be fixed efficiently. These assessments form a continuous monitoring process in which even simple revisions to the system call for a new assessment.
Penetration Testing: Penetration testing checks for vulnerabilities in the system that could be exploited by an attacker. It is often confused with vulnerability assessment, but penetration testing is designed to evaluate system security by actively attempting exploitation. Security Audit: A technical or document-based audit checks whether the existing configuration meets the preferred standards. What the audit validates is conformity with the standard. Organizations use audits to demonstrate compliance. Risk Assessment: Risk assessment analyzes how current security controls protect the organization's assets and determines the probability of loss to those assets [1]. It combines the probability of a risk with its impact, using quantitative and qualitative risk assessment models. Threat modeling and risk assessment are similar in that both aim to reduce risk to an acceptable level [6]. Threat Assessment: A threat assessment is a review that relates physical attacks to technology. Its primary task is to determine how a threat is made or how it is detected. Security assessments are periodic exercises that check the organization's security preparedness. They check for vulnerabilities in the system and recommend the steps necessary to lower the risk to the organization. Every industry needs a different type of assessment, depending on the value of its assets and the impact caused if they are compromised.
2.4 Risk Management Process The risk management process consists of a few basic steps, although the terminology used to describe them varies by industry. The steps in risk assessment are shown in Fig. 2. The following five steps combine to deliver an effective risk management process for an organization [7].
1. Identification of risk.
2. Analysis or assessment of risk.
3. Evaluation or ranking of risk.
4. Treatment of risk.
5. Monitoring and review of risk.
2.5 Threats A threat is an undesired event that may cause the loss, disclosure, or damage of an organization's assets. Threats to an organization may take any form, such as errors, omissions, fraud, theft, sabotage, loss of physical and infrastructure support, espionage, malicious code, and disclosure. For example, an authorized employee who makes a mistake while entering data during the development phase compromises the integrity of the system. System confidentiality is breached whenever information is disclosed, regardless of whether the disclosure is intentional or unintentional. The entity that causes such an event is called a threat agent. Possible threat agents for information security are nature, employees, malicious hackers, and industrial and foreign government spies [1].
Fig. 2 Steps in risk assessment
2.6 Vulnerabilities A vulnerability is a flaw in an existing system that may allow a threat agent to exploit it and gain access to the organization's assets. Vulnerabilities are important elements of security risk assessment because they determine the residual risk. Vulnerabilities in information security systems are categorized as physical, technical, or administrative. If a threat is able to exploit a vulnerability, the affected asset faces a loss; this is the security risk. Quantitative and qualitative approaches are used to derive the security risk of an organization [1].
3 Related Works 3.1 Risk Assessment Based on Analytical and Statistical Methods Research [8] concentrated on the analysis and evaluation phases of risk management for the IT environment. Statistical designs enhance the quantitative aspect of risk assessment and make it precise and efficient. The Plackett–Burman (PB) statistical design was used to screen for the most important factors, which addresses the issues raised by the parsimony principle [9]. A PB design estimates N values from N + 1 experiments; to find the control for the N + 1 value, PB with foldover is used, which adds dummy controls to the design matrix [8]. A critical control is selected based on the threat and cost determination of an enterprise [9]. Once the critical control is identified, the method determines the control configuration and its changes. The impact of changing the control configuration was determined and ranked with the PB design. The aim of the work was to improve the quality and efficiency of risk assessment through statistical rigor [9]. Traditional methods of vulnerability detection and network traffic analysis are subjective and inaccurate, and they give non-quantitative results. To overcome these issues, a statistical method, the Hidden Markov Model (HMM), was used to assess system risk [10]. This quantitative method supports the objectivity and accuracy of the system risk level. To quantify the trustworthiness of the system for risk assessment, reliability classification methods were used, namely code-based, software behavior-based, and software source-based methods. Software that the reliability analysis places on the whitelist is allowed to enter the system directly. Records of illegal operations are stored and processed by the HMM training and prediction modules, which give the system control over its risk level. The HMM training and prediction modules use the Baum–Welch and Viterbi algorithms, respectively, and aim to determine the likelihood of system risk based on software behavior [10].
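To illustrate the prediction step that the HMM-based approach attributes to the Viterbi algorithm, the following is a minimal sketch in plain Python. The two hidden states, the two observation symbols, and every probability are hypothetical values chosen for illustration; they are not the model or the parameters of [10], where the parameters would instead be learned with Baum–Welch.

```python
# Hypothetical two-state risk model; all probabilities are illustrative assumptions.
states = ("low_risk", "high_risk")
start_p = {"low_risk": 0.8, "high_risk": 0.2}
trans_p = {
    "low_risk": {"low_risk": 0.9, "high_risk": 0.1},
    "high_risk": {"low_risk": 0.3, "high_risk": 0.7},
}
emit_p = {
    "low_risk": {"legal": 0.9, "illegal": 0.1},
    "high_risk": {"legal": 0.2, "illegal": 0.8},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t, predecessor)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            column[s] = max(
                (prev[p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Assumed sequence of operation records observed for one piece of software.
records = ["legal", "legal", "illegal", "illegal", "legal"]
risk_path = viterbi(records, states, start_p, trans_p, emit_p)
print(risk_path)
```

For long observation sequences, a practical implementation would work with log-probabilities to avoid numerical underflow; the plain products are kept here for readability.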
The Analytic Hierarchy Process (AHP), a structured technique for organizing and analyzing complex decisions, was used to evaluate risk and to help solve group decision-making problems in a fuzzy environment. To assess the legal risk of high-tech Small-to-Medium Enterprises (SMEs), a comprehensive evaluation system with nineteen indexes was established. The AHP-FCE model evaluates the legal risk of the organization. Gray theory, which distinguishes the known and unknown parts of the information, is the foundation used in Fuzzy Comprehensive Evaluation (FCE). Using this classical decision theory, SMEs can identify threats, vulnerabilities, the probability of risks, and mitigation measures [11]. Information security risk management has also been examined from the perspective of a beginner through the analytical lens of security-related stress. Security is an important part of any organization, and it creates security-related stress (SRS) if the person involved in ISRM lacks security skills. A case study approach was used to identify stressors and stress inhibitors from the collected data. For practice, the study sorts out the difficulties of implementing information security and identifies supporting tools for the ISRM process. Existing work dealt with the stress faced by ICT professionals rather than with how stress is created by technology (i.e., technostress). Based on the information system context [12], the dimensions of stress are categorized into Overload, Complexity, and Uncertainty. The case study approach was inspired by the protocol in Case Study Research: Design and Methods [13]. Two public sector organizations, Alpha and Beta, participated in the case study and provided internal access to their ISRM documentation. Observations, interviews, policies, and procedures of the organizations were used for data collection. The collected data were then categorized by the dimensions discussed in [12], and stressors and stress inhibitors were derived empirically [14]. Using statistical methods for risk assessment enhances the quantitative results. Analytical methods for risk evaluation give different results for different industries, so maintaining the accuracy of the evaluation result is important. The Delphi method [6] was applied for the statistical aggregation of the expert opinions considered in the final evaluation. Safeguarding the information assets of an organization is important and requires optimal mitigation steps, which in turn depend on a clear and accurate information security risk assessment. Ranking the critical assets of an organization is an important task in reducing risk and its impact. A framework combining the OCTAVE Allegro method with the Analytic Hierarchy Process and Simple Additive Weighting was used to rank priorities in information asset profiling. This combined decision support system adds flexibility to OCTAVE and yields a better mitigation budget plan [15]. Advantages of using Statistical and Analytical methods for Risk Assessment: Statistical and analytical methods provide more comprehensive and effective solutions for risk management. By drawing on the strengths of these methods, organizations can achieve better risk assessments and make more informed decisions about managing risk.
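The asset ranking step that this section associates with Simple Additive Weighting (SAW) can be sketched as follows. This is a minimal illustration under stated assumptions: the criteria, the weights (which in the framework of [15] would come from AHP pairwise comparisons), and the asset ratings are all hypothetical, and every criterion is treated as a benefit criterion normalized by its column maximum.

```python
def saw_rank(alternatives, weights):
    """Rank alternatives by Simple Additive Weighting (benefit criteria only)."""
    criteria = list(weights)
    # Normalize each criterion by its column maximum so ratings lie in (0, 1].
    col_max = {c: max(vals[c] for vals in alternatives.values()) for c in criteria}
    scores = {
        name: sum(weights[c] * vals[c] / col_max[c] for c in criteria)
        for name, vals in alternatives.items()
    }
    # Highest weighted score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Assumed criterion weights (summing to 1) and assumed 1-5 impact ratings.
weights = {"confidentiality": 0.5, "integrity": 0.3, "availability": 0.2}
assets = {
    "customer_db": {"confidentiality": 5, "integrity": 4, "availability": 3},
    "public_website": {"confidentiality": 1, "integrity": 3, "availability": 5},
    "hr_records": {"confidentiality": 4, "integrity": 4, "availability": 2},
}

ranking = saw_rank(assets, weights)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The asset with the highest score would be treated as the most critical and given priority in the mitigation budget.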
3.2 Risk Assessment Based on Iterative Methods Iterative methods start from an initial guess and generate a sequence of successively refined results. Risk assessment widely uses iterative methods to analyze risks and their impact on the organization's assets. A hierarchical approach to online risk assessment is used to identify cyberthreats and vulnerabilities at the local level of service operators and Digital Service Providers. National-Level Risk Assessment is important for all institutions and is performed either through a formalized central framework model or in a decentralized model. In a decentralized, or hierarchical, online scheme, local entities prepare their own assessments, which are then coordinated, and the overall risks evaluated, by the National Computer Security Incident Response Team. When global resources are required, the hierarchical risk assessment extends to specifying risk mitigation actions and limits them to the most efficient ones [16].
In iterative software development, the technical environment changes continually, which increases the workload of identifying, assessing, and managing risk in each iteration. Choosing an easy solution instead of one that lasts longer accumulates technical debt, which implicitly adds the cost of later rework. For an efficient software development process, the notion of technical debt needs to be extended with security debt. A security debt portfolio additionally requires the probability and impact of each risk. Security debt management can improve efficiency and the predictability of future workloads, and it leads to secure software of high quality [17]. Security issues and vulnerabilities in e-learning platforms call for defensive methods that secure web applications against flaws, which matters to both industry and academia. Owing to the nature of e-learning platforms, such as user-friendliness, 24/7 availability, and pervasiveness, a vulnerability can compromise any aspect of confidentiality, integrity, and availability (CIA). E-learning platforms are usually selected on the basis of being open source and of how widely they are used. A security model with a two-fold holistic view is proposed to protect the e-learning platform from risks. Encapsulating the entire system in a security cover highlights the vulnerabilities exposed by hierarchical and distributed approaches. The scanners NetSparker Community Edition (2014) and Acunetix (2014) were chosen to test the platform for vulnerabilities. Constant monitoring and stringent action against attackers keep the platform at minimal risk [18]. Risk management not only allows the organization to identify risks ahead of time; it also allows the business owner to react accordingly and to minimize or prevent losses. Eliminating all risks from an organization is an enormous task, and it is not necessary: some risks, called residual risks, are acceptable. Residual risk is accepted risk, and it can be expressed either qualitatively or quantitatively in terms of threats, vulnerabilities, asset value, and control gaps. The quantifiable measure derives from a calculation of asset absorption instead of a likelihood calculation, such that the optional risk and accepted risk are greater than the asset value determined in the Economic Acceptable Risk Assessment model. The security investment is calculated from the accepted risk of the business investment, and the model minimizes the effort of calculating the residual risk [19]. Advantages of using Iterative methods for Risk Assessment: Iterative methods provide a flexible and dynamic approach to risk assessment and management, allowing organizations to continuously improve their risk management strategies. With iterative methods, an organization can draw on detailed insights and perform evaluations to reduce the potential impact of risks.
3.3 Risk Assessment Based on Optimization Techniques and Novel Approach Attack model independency, disclosure, and attribute generalizability are the challenges posed by inference attacks. F-PAD offers countermeasures to the corresponding disclosure risk. The risk is estimated with the help of a basket of typical attack models for user privacy disclosure problems, using novel attack-basket-based optimization methods [20]. A general Bayesian inference attack approach is designed to simulate attacks. F-PAD contributes to real-world privacy protection mechanisms by drawing on the primary approaches to ensuring privacy: individual self-protection, industry self-regulation, and government regulation. The framework is practical for a variety of parties, including users, OSNs, governments, organizations, and enterprises. The risk levels are categorized as Guarded, Slight, Moderate, Elevated, and Severe [20]. For risk management in an e-commerce merchant system, the existing model combines various Machine Learning models with rules set in the decision layer. These rules are fixed and depend on manual updates. Business goals call for multiple optimal rule sets for different trade-offs, generated by automated data-driven optimization methods. This allows the business owner to make informed choices among optimized rule sets as business needs and risks change. A meta-heuristic method is used for the optimization, with candidates evaluated by a fitness function; here, a Genetic Algorithm (GA) serves as the meta-heuristic. GA is a generate-and-test algorithm: each population of offspring is generated from its parents using the mutation and/or crossover operators, and the fitness of each generation of offspring is tested. To optimize more than one objective at a time, a Multi-objective Genetic Algorithm (MOGA) was used. MOGA accepts the rule set as a numerical array, so the risk management rules, written as MVEL crisp rules with a $ sign as the variable indicator, are converted into a chromosome encoding. To run the optimization faster with minimal computational resources, speed must be increased and memory consumption reduced. These enhancements allow MORO to find a healthy rule set in a short time [21].
Most enterprises that manage cybersecurity rely on information security standards and guidelines such as the CIS Critical Security Controls (CSC) and NIST. The large number of security controls for various risk factors, and their cost, make selecting an appropriate set of security controls difficult. Optimizing Return on Investment (ROI) is a complex and error-prone task in this environment. An appropriate, cost-effective set of critical security controls must be selected to mitigate risk, which calls for a novel optimization technique. The Cyber Defense Matrix (CDM) is an automated method for deciding which security controls are needed and where to deploy them in the enterprise. CyberARM is a tool that formulates the CDM decision-making problem using SMT constraints and computes a plan that satisfies the cybersecurity ROI objective under bounded residual risk and budget constraints. The aim is to develop an automated decision-making framework that assesses and manages cyber-risk with a cost-effective mitigation plan. CyberARM works with a multi-dimensional model that integrates the NIST cybersecurity framework, CIS CSC, and the cyber-kill chain with CDM. CyberARM was evaluated with heuristic algorithms for model reduction and decomposition to demonstrate its performance and scalability. In the performance evaluation with different asset sets, the tool generated plans for 15,000 assets in 10 min [22]. Optimization algorithms minimize or maximize an objective function. They are also known as search algorithms, since they search for the solution that best meets the goal. Traditional risk management systems do not cover the full range of risks faced by organizations [23]. Enterprise Risk Management is a strategy that replaces the traditional risk management approach by defining risk management as part of the organization [24]. Classification of risk describes the risk assessment procedure necessary for developing the management decisions needed to reduce the impact of risk on institutions. Classification allows risks to be identified more accurately [25]. Advantages of using optimization methods for Risk Assessment: Optimization techniques can give businesses a strong tool for managing and assessing risk. Organizations may prioritize resource allocation, cut expenses, and ultimately lessen the potential impact of risks on their business by determining the most efficient strategies to reduce risks.
3.4 Risk Assessment Based on Simulation Techniques Stochastic simulation is used to analyze project data and identify the factors that affect the productivity and team schedule objectives of globally distributed software projects. It is a simulation method whose variables change randomly (stochastically) according to individual probabilities. Random variables are generated and inserted into the model, and the output is recorded. Because of budget restrictions, managing a globally distributed project is a difficult task. Project management groups apply simulation to support decisions early in the software development process. Risk assessment of distributed software projects with stochastic simulation provides quantitative results. A tool called Tangram-II is applied to evaluate global software development factors, such as communication among sites and domain knowledge, and to predict schedule risk [26]. To help an organization improve its risk management strategies and invest in cybersecurity, a measure is required that quantifies risk and its impact. Quantiles are cut points that divide the observations of a sample into groups of equal probability, and they are used in the financial sector to quantify risk. Two measures used in the financial sector, Value at Risk (VaR) and Tail Value at Risk (TVaR), were proposed for estimating cyber-risk. Cyber-VaR estimates the loss from cyberattacks that will not be exceeded at a given probability level. TVaR is a related measure that overcomes shortcomings of VaR. Data breach information from the Privacy Rights Clearinghouse "Chronology of Data Breaches" is used to calculate the risk measures [27]. Historically, risk is classified as operational risk, financial risk, strategic risk, organizational risk, and business planning and reporting risk. Operational risk is a general risk arising from poor internal processes, systems, and people, whether internal or external. Cyber-risk falls into the category of operational risk; the ways to protect against cyber-risk are self-protection, self-insurance, and cyber-insurance. An approach was proposed to estimate both VaR and TVaR from data breach information [27] with different financial estimation methods. The methods tested in [28] are the Empirical, Historical simulation, and Monte-Carlo simulation methods, applied to two scenarios, type BSF (41 breach events) and type MED (3644 breach events). The data sets were modeled by loss frequency and loss severity using the negative binomial distribution and the log-skew-normal distribution, and the results were tabulated [28]. Advantages of using Simulation methods for Risk Assessment: Organizations may find that simulation techniques are an effective tool for managing and assessing risk. By modeling realistic scenarios and producing quantitative data, organizations may better comprehend the possible impact of risks and pinpoint the most efficient strategies for managing them.
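The two risk measures discussed in this section can be estimated empirically from a sample of losses, in the spirit of the historical simulation method. The sketch below is a minimal illustration under stated assumptions: the loss figures are invented, the quantile convention (smallest sample value covering the requested share of observations) is one of several in use, and TVaR is taken as the mean of the losses at or above the VaR. It is not the estimation procedure of [27] or [28].

```python
import math


def value_at_risk(losses, level=0.95):
    """Empirical VaR: smallest loss that covers `level` of the sorted sample."""
    ordered = sorted(losses)
    index = math.ceil(level * len(ordered)) - 1
    return ordered[index]


def tail_value_at_risk(losses, level=0.95):
    """Empirical TVaR: mean of the losses at or above the VaR."""
    var = value_at_risk(losses, level)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)


# Hypothetical breach losses (in thousands), not taken from any real data set.
losses = [10, 12, 15, 18, 20, 25, 30, 40, 80, 250]
var_90 = value_at_risk(losses, 0.90)
tvar_90 = tail_value_at_risk(losses, 0.90)
print(var_90, tvar_90)
```

The gap between the two numbers shows why TVaR is considered more informative for heavy-tailed losses: VaR ignores how large the losses beyond the threshold are, while TVaR averages them.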
3.5 Risk Assessment Based on Empirical Methods A quantitative risk model determines which risks should be treated on the basis of their known impact on the organization. In general, identifying and mitigating risk is an intuitive and qualitative process. For a quantitative analysis, empirical risk assessment is used as a risk impact estimator that calculates the loss expectancy of each vulnerability. Developers are expected to identify, prioritize, and mitigate risk during software development, but the methods used for identifying, testing, and managing risk are qualitative in nature. Risk is estimated by the Annual Loss Expectancy (ALE), which is the product of the Single Loss Expectancy (SLE) and the Annual Rate of Occurrence (ARO), where SLE is the product of the asset value and its exposure factor. The Threat Analysis and Modeling Tool is used for modeling together with an open-source vulnerability database, and the National Vulnerability Database is used for testing the tool [29]. The empirical study of system engineering risk assessment investigates the risk assessment records themselves, not the standards, forms, and procedures used to guide risk assessment. Because the records were chosen arbitrarily, a quantitative analysis is not possible. Instead, the study proceeds by question and answer, recording observations and concerns for a set of high-level questions. Fault Tree Analysis (FTA) is a top-down, deductive failure analysis method that the empirical approach uses to analyze an undesired state. Safety and reliability engineering use FTA to understand how a system can fail and to find the best way to reduce risk or system-level failure. The data for the empirical system safety research came from the SafetyZOO Risk Assessment Reports, which were assembled by a non-systematic search process and consist of public or non-confidential data. The red book model and As Low As Reasonably Practicable (ALARP) are applications of empirical assessment [30].
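The loss expectancy calculation described in this section can be written out directly. The sketch below assumes the relationships stated in the text (SLE is asset value times exposure factor, and ALE is SLE times the annual rate of occurrence); the asset value, exposure factor, and rate are hypothetical example figures.

```python
def single_loss_expectancy(asset_value, exposure_factor):
    """SLE: expected loss from one incident (exposure_factor is a fraction in [0, 1])."""
    return asset_value * exposure_factor


def annual_loss_expectancy(asset_value, exposure_factor, annual_rate_of_occurrence):
    """ALE: SLE multiplied by the expected number of incidents per year."""
    sle = single_loss_expectancy(asset_value, exposure_factor)
    return sle * annual_rate_of_occurrence


# Hypothetical vulnerability: a 200,000 asset, 25% of its value lost per incident,
# and one incident expected every two years.
ale = annual_loss_expectancy(
    asset_value=200_000, exposure_factor=0.25, annual_rate_of_occurrence=0.5
)
print(ale)
```

Comparing the ALE of each vulnerability with the yearly cost of the control that would address it gives a simple quantitative basis for deciding which risks to treat first.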
Vulnerabilities in software applications running on a network require the network administrator to have a good understanding of the threat landscape and its associated risks, so that cyberattacks can be met with appropriate countermeasures such as security controls and policies. A framework representing these features enables the risk to be quantified in real time through traffic analysis and enumeration of the network. Attack patterns and network traffic are analyzed using data from a university network. The framework provides a risk score metric that is calculated objectively rather than from subjective human assessments of quality. The assessment framework models the risk associated with software applications and evaluates the effectiveness of existing system policies, and it was validated and demonstrated using real-world data [31]. In Air Traffic Management (ATM), evaluation and validation are integral activities that are well defined and widely used. To analyze the efficiency of information security policies and the Return on Investment, the Empirical Framework for Security Design and Economic Trade-off (EMFASE) was investigated; it applies different risk assessment methods in different scenarios to evaluate the performance, impact, usability, and economy of ATM. A qualitative analysis was conducted through questionnaires and interviews with professionals to summarize the success criteria. The different aspects and characteristic classifications for risk assessment were derived from data gathered from ATM professionals at the WP16.6.2 Jamboree. ISO 31000 and ISO/IEC 27005 are the basis of the methods selected in EMFASE. The CORAS tool is designed to ease on-the-fly diagram modeling, and the EUROCONTROL ATM Security Risk Management Tool supports Air Navigation Service Providers in identifying, assessing, recording, and managing risk [32]. Advantages of using Empirical methods for Risk Assessment: Empirical methods provide a systematic and objective approach to risk assessment that can help organizations identify and manage risks more effectively.
4 Advantages and Limitations of Conducting Risk Assessment 4.1 Advantages of Conducting Risk Assessment Improved Safety: Conducting a risk assessment helps to identify potential hazards and risks, allowing organizations to take the steps necessary to mitigate them. Cost Saving: By identifying potential risks, organizations can take preventive measures to avoid or reduce the losses those risks would cause. Compliance: By conducting a risk assessment, an organization ensures that it meets regulatory requirements. Better decision-making: By identifying risks through assessment, an organization understands the potential threats and can prioritize remedial actions to avoid negative impacts. Improved Reputation: A significant effort to ensure safety and manage risk enhances the overall reputation of an organization [33].
4.2 Limitations in Conducting Risk Assessment Incomplete Data: Risk assessment relies on complete and accurate data. If the data needed for an assessment are incomplete or inaccurate, potential threats to the organization cannot be identified reliably. Uncertainty: Risk assessment is based on estimation and prediction. Because hazards are uncertain, their likelihood and impact cannot be determined exactly. Limited scope: Risk assessment typically focuses on a particular process or system, so potential risks outside the scope of the assessment may be overlooked. Time and resources: Conducting a thorough risk assessment is time-consuming and requires significant resources, which may be challenging for a small organization with limited resources. False sense of security: Some businesses may assume from the findings of a risk assessment that they have eliminated all potential hazards. Yet it is crucial to understand that a certain amount of risk always remains and that further steps may be needed to manage and reduce it.
5 Conclusion and Future Work
The review of risk assessment in information security has shown that an assessment can identify several potential risks that could affect an organization's operations and reputation. The review highlighted the strengths and weaknesses of the assessment, including limitations in data gathering and the need for further improvements in the assessment process. The review also evaluated the effectiveness of the mitigations recommended in the assessment. Overall, the review has provided valuable insights into the organization's security posture and the effectiveness of the risk assessment. Based on this review, future work can be undertaken to enhance the risk assessment process of an organization. This includes developing a risk management strategy integrated with Artificial Intelligence, incorporating human factors, improving measurement and evaluation, and addressing emerging threats and risks.
Investigation of Assessment Methodologies in Information Security …
References
1. Landoll D (2021) The security risk assessment handbook: a complete guide for performing security risk assessments. CRC Press
2. Wheeler E (2011) Security risk management: building an information security risk management program from the ground up. Elsevier
3. Fundamentals of Information Security Risk Management. https://www.rapid7.com/fundamentals/information-security-risk-management/
4. Risk Management Process. https://continuingprofessionaldevelopment.org/risk-managementsteps-in-risk-management-process/
5. Information Security Assessment Types. https://danielmiessler.com/study/security-assessment-types/
6. Delphi Method. Available: https://en.wikipedia.org/wiki/Delphi_method
7. Limitations of risk Assessment, International Institute of Risk and safety Management. Available: https://www.iirsm.org/limitation-risk-assessment
8. Plackett RL, Burman JP (1946) The design of optimum multifactorial experiments. Biometrika 33(4):305–325
9. Singh A, Lilja D (2009) Improving risk assessment methodology: a statistical design of experiments approach. In: Proceedings of the 2nd international conference on security of information and networks, pp 21–29
10. Chen G, Wang K, Tan J, Li X (2019) A risk assessment method based on software behavior. In: 2019 IEEE international conference on intelligence and security informatics (ISI). IEEE, pp 47–52
11. Li D (2019) Research on legal risk assessment in high-tech SMEs based on AHP-FCE model. In: 2019 IEEE 6th international conference on industrial engineering and applications (ICIEA). IEEE, pp 828–831
12. D'Arcy J, Herath T, Shoss MK (2014) Understanding employee responses to stressful information security requirements: a coping perspective. J Manag Inf Syst 31(2):285–318
13. Yin RK (2009) Case study research: design and methods (vol 5). Sage
14. Lundgren M, Bergström E (2019) Security-related stress: a perspective on information security risk management. In: 2019 International conference on cyber security and protection of digital services (cyber security). IEEE, pp 1–8
15. Prajanti AD, Ramli K (2019) A proposed framework for ranking critical information assets in information security risk assessment using the octave allegro method with decision support system methods. In: 2019 34th International technical conference on circuits/systems, computers and communications (ITC-CSCC). IEEE, pp 1–4
16. Malinowski K, Karbowski A (2019) Hierarchical on-line risk assessment at national level. In: 2019 International conference on military communications and information systems (ICMCIS). IEEE, pp 1–5
17. Rindell K, Holvitie J (2019) Security risk assessment and management as technical debt. In: 2019 International conference on cyber security and protection of digital services (cyber security). IEEE, pp 1–8
18. Bhatia M, Maitra JK (2018) E-learning platforms security issues and vulnerability analysis. In: 2018 International conference on computational and characterization techniques in engineering & sciences (CCTES). IEEE, pp 276–285
19. Jackson LA, Al-Hamdani W (2008) Economic acceptable risk assessment model. In: Proceedings of the 5th annual conference on Information security curriculum development, pp 36–39
20. Han X, Huang H, Wang L (2019) F-PAD: Private attribute disclosure risk estimation in online social networks. IEEE Trans Dependable Secure Comput 16(6):1054–1069
21. Pulkkinen P, Tiwari N, Kumar A, Jones C (2018) A multi-objective rule optimizer with an application to risk management. In: 2018 17th IEEE international conference on machine learning and applications (ICMLA). IEEE, pp 66–72
22. Dutta A, Al-Shaer E (2019) "What", "Where", and "Why" cybersecurity controls to enforce for optimal risk mitigation. In: 2019 IEEE Conference on Communications and Network Security (CNS). IEEE, pp 160–168
23. Arena M, Arnaboldi M, Azzone G (2011) Is enterprise risk management real? J Risk Res 14(7):779–797
24. Power M (2007) Organized uncertainty: designing a world of risk management. Oxford University Press on Demand
25. Pavlova XL, Shaposhnikov SO (2019) Risk management for university competitiveness assurance. In: 2019 IEEE conference of Russian young researchers in electrical and electronic engineering (EIConRus). IEEE, pp 1440–1443
26. Lima AM (2010) Risk assessment on distributed software projects. In: 2010 ACM/IEEE 32nd international conference on software engineering, vol 2. IEEE, pp 349–350
27. Privacy rights clearing House Breaches. Available: https://privacyrights.org/data-breaches
28. Carfora MF, Orlando A (2019) Quantile based risk measures in cyber security. In: 2019 International conference on cyber situational awareness, data analytics and assessment (Cyber SA). IEEE, pp 1–4
29. Mkpong-Ruffin I, Umphress D, Hamilton J, Gilbert J (2007) Quantitative software security risk assessment model. In: Proceedings of the 2007 ACM workshop on quality of protection, pp 31–33
30. Rae A, Hawkins R (2012) Risk assessment in the wild. In: Proceedings of the Australian system safety conference, vol 145, pp 83–89
31. Awan MSK, Burnap P, Rana O (2015) An empirical risk management framework for monitoring network security. In: 2015 IEEE international conference on computer and information technology; ubiquitous computing and communications; dependable, autonomic and secure computing; pervasive intelligence and computing. IEEE, pp 1764–1771
32. Massacci F, Paci F, Solhaug B, Tedeschi A (2014) EMFASE - an empirical framework for security design and economic trade-off. In: 2014 Ninth international conference on availability, reliability and security. IEEE, pp 537–543
33. Why it is essential to conduct IT Security Assessment. https://www.cloudsecuretech.com/essential-conduct-security-assessment/
TSCH Scheduling Mechanism Based on the Network’s Throughput with Dynamic Power Allocation and Slot-Frame Delay Md. Niaz Morshedul Haque, Nakib Ahmed Sami, and Abdus Shahid Al Masruf
Abstract In this paper, we propose a TSCH scheduling method that schedules transmissions based on the network's throughput while considering dynamic power allocation and slot-frame delay, in accordance with the guidelines of IEEE 802.15.4e. We use a bipartite-graph formulation of the link-to-cell allocation problem, and the well-known Hungarian assignment algorithm assigns links to their corresponding cells. The assignment maximizes throughput under dynamic power allocation and a fixed slot delay, which is close to the actual scenario of data transmission scheduling in wireless communications. In the simulation, we consider a network model with four nodes, which form eight links for data transmission. We also assume a slot frame of four channels and four slots, which form sixteen cells to which links are allocated. The simulation results show that the proposed method is fairer than previously proposed techniques and that the relation between throughput and delay is consistent with the fundamental nature of wireless networks.
Keywords TSCH-based scheduling · Bipartite graph · Hungarian assignment
Md. Niaz Morshedul Haque (B) Department of EEE, Bangladesh Army International University of Science and Technology (BAIUST), Cumilla, Bangladesh e-mail: [email protected]; [email protected] N. A. Sami · A. S. Al Masruf Department of EEE, Leading University, Sylhet, Bangladesh © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_27
1 Introduction
Time-slotted channel hopping (TSCH) is a technique that performs scheduling by combining frequency division multiple access (FDMA) and time division multiple access (TDMA) [1]. It is a media access control (MAC) mechanism that consumes little power and is highly efficient [2]. Owing to its versatility and effectiveness, it is gaining popularity in internet of things (IoT) applications [3]. TSCH-based scheduling was introduced in 2015 under the guidelines of IEEE 802.15.4e, which also mitigates the disadvantages of IEEE 802.15.4 [4]. An IEEE 802.15.4e network inevitably involves a scheduling algorithm, which allocates links to cells; a cell serves as the primary resource for data transfer. The schedule must be built according to the concrete requirements of the application, in either a centralized or a distributed manner, following the guidelines of IEEE 802.15.4. Each network node follows a schedule that specifies what happens in each slot: transmitting, receiving, or idling [5]. The slot frame serves as the primary means of communication in TSCH, and it is required for a pair of nodes to exchange data in order to carry out their functions [6]. An effective approach to TSCH-based scheduling considers variable channel state information (CSI) to maximize the network's throughput [7]. The bipartite graph approach to link-to-cell allocation is a convenient way to execute the scheduling properly [8]. Furthermore, the Hungarian assignment algorithm is incorporated into TSCH-based scheduling to perform the exact allocation of a link to a cell, which satisfies the nature of TSCH according to the guidelines of IEEE 802.15.4e [9, 10]. In this paper, we propose TSCH-based scheduling in which the power allocation technique [11] is incorporated into the throughput calculation [12], with throughput determined by Shannon's formula. Including the slot-frame delay ensures more fairness than previously proposed techniques.
2 Literature Review
This section discusses previously proposed TSCH-based scheduling schemes that consider random channel state information (CSI). Random CSI is close to the natural behavior of wireless communication: the network's environment and data transmission mechanism change when the channel parameters change. These papers also proposed a bipartite graph (bigraph) solution for link-to-cell assignment. In [7], the authors proposed a bigraph technique for TSCH-based scheduling in which throughput was maximized and slot-frame delay was minimized. In [8], the authors used the Hungarian assignment algorithm to execute TSCH-based scheduling, where the bipartite edge weight consisted of throughput only and the scheduling maximized throughput. In [9], the Hungarian algorithm maximized the bipartite edge weight, defined as the sum of the throughput and the delay of a specific slot. The delay was the same for each slot; that work also introduced a moving average delay, which accounted for the delay between two slots across repetitions. This scheme is a more pragmatic solution to TSCH-based scheduling. In [12], the authors incorporated a learning-based algorithm into TSCH-based scheduling to reduce the execution time, with scheduling performed by maximizing the network's throughput. Node-to-node power allocation techniques for wireless communications are described in [11]. In this paper, we adopt that technique and calculate throughput based on dynamic power allocation. This ensures more fairness in transferring data from node to node and is also closer to the fundamental nature of wireless communication systems.
3 Contributions
In this paper, we consider random channel state information (CSI) that changes at each repetition of the slot frame, which is close to the practical operation of TSCH-based scheduling and the real nature of wireless communications. We focus on the network's throughput with dynamic power allocation and consider slot-frame delay when scheduling node-to-node communications. The main contributions of this paper are as follows:
• We model a bipartite graph structure for executing link-to-cell assignment, where the top vertices are links (node pairs) and the bottom vertices are slot-frame cells.
• The bipartite edge weight is the sum of the network's throughput and the slot-frame delay. Dynamic power allocation is incorporated in determining the network's throughput, and the slot-frame delay is incorporated to ensure fairness.
• The Hungarian algorithm performs the scheduling by maximizing the bipartite edge weight, ensuring maximum data transfer with a fixed delay to enhance the network's reliability.
• The simulation outcomes demonstrate that the proposed scheme is fairer than previously proposed techniques and that the relation between throughput and delay is more realistic for wireless communications.
Figure 1 depicts the basic flowchart of the proposed scheme. The network is first defined, and Rayleigh fading is used to determine the channel state and the power allocation. Based on this knowledge, the throughput is calculated. The slot delay is then calculated, and the bipartite edge weight is determined accordingly. The Hungarian algorithm is run on the bipartite edge weights to perform the scheduling. With this method, enough sample data are generated to observe the network's behavior, which is close to the actual scenario.
Fig. 1 Flowchart of the proposed scheme
4 TSCH Network Model
A TSCH network has a single gateway access point, which connects the nodes together and synchronizes all of the network's nodes. The scheduler, which is installed in the gateway, establishes the frequency and time allocation for each node's transmission. Scheduling is therefore handled centrally. We consider a TSCH network, similar to Fig. 2, with N nodes and a gateway [8].
4.1 Network Topology
We consider a TSCH network $G = \{N, M\}$, where $N = \{n_1, \ldots, n_N\}$ is the set of nodes and $M = \{m_1, \ldots, m_M\}$ is the set of links (node pairs). The slot-channel matrix of a four-node graph structure $G$ is illustrated in Fig. 3. Here, a cell can be accessed by two non-colliding links. In a slot frame, links such as (A → M) and (C → B) are assigned to the same cell because they do not interfere with each other.
Fig. 2 Network model
Fig. 3 Example of a network topology with corresponding slot-frame cell
4.2 Data Transmission Model
The combination of channels, $F = \{f_1, \ldots, f_F\}$, and slots, $S = \{s_1, \ldots, s_S\}$, constitutes the cells, $C = \{c_1, \ldots, c_C\}$, which are the fundamental resources for data transmission. The links $M$ are assigned to cells $C$ according to the channel gain, which is defined by $h = H_{m,c}$. The paper considers Rayleigh fading as a reasonable model to determine the channel gain [9], following the probability density function

$$F_R(x) = \frac{2x}{\mu} e^{-x^2/\mu}$$

Here, $\mu = \mathbb{E}[R^2]$; $R$ is a random variable.
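As a rough illustration of the channel model above, the following Python sketch draws Rayleigh-distributed channel gains by inverting the cumulative distribution $1 - e^{-x^2/\mu}$ that corresponds to the stated density. The function names, the seed, and the default $\mu = 1$ are illustrative assumptions and are not taken from the paper, whose simulation was carried out in MATLAB.

```python
import math
import random


def sample_rayleigh(mu, rng):
    """Draw one Rayleigh sample R with E[R^2] = mu via inverse-CDF sampling.

    The CDF matching the density (2x/mu) * exp(-x^2/mu) is 1 - exp(-x^2/mu),
    so R = sqrt(-mu * ln(1 - U)) for U uniform on [0, 1).
    """
    u = rng.random()
    return math.sqrt(-mu * math.log(1.0 - u))


def sample_channel_gains(num_links, num_cells, mu=1.0, seed=0):
    """Return a num_links x num_cells matrix of channel gains h = H[m][c]."""
    rng = random.Random(seed)
    return [[sample_rayleigh(mu, rng) for _ in range(num_cells)]
            for _ in range(num_links)]


if __name__ == "__main__":
    # 8 links and 16 cells mirror the simulation setup described in the paper.
    H = sample_channel_gains(num_links=8, num_cells=16)
    print(len(H), len(H[0]))
```

Drawing a fresh matrix for every slot-frame repetition would correspond to the random CSI assumed in the text.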
In the previously proposed schemes [8, 9], the power $p$ was treated as a constant parameter. In a real wireless communication channel, however, power exhibits a random nature [11], so in this paper we consider dynamic power. The power is determined using the following equation:

$$p = P_{m,c}(h) = \frac{\beta \Omega_0 \left(2^{l/\beta} - 1\right)}{h}$$

Here, $\beta$ is the bandwidth and $\Omega_0$ is the noise variance. When node-to-node data transmission happens in a specific cell $c$, the following Shannon formula determines the throughput if a link $m$ is assigned to cell $c$ with corresponding channel gain $h$ and power $p$:

$$\phi_{m,c} = \beta \log_2\left(1 + \frac{h\,p}{\beta \Omega_0}\right)$$

In addition, the suggested scheduling scheme considers delay according to cell position. Delay performance is better when the delay value is low, because delay and delay performance are inversely related. The delay performance declines as the cell index $c \in C$ rises. The delay is the same within each slot according to our network's evaluation of the slot frame, as depicted in Fig. 1. The delay performance in each cell can be calculated as

$$\Upsilon_{m,n}(c) = \delta \exp\left(\left\lfloor \frac{C - c}{S} \right\rfloor - \left\lfloor \frac{C}{S} \right\rfloor\right)$$

Here, $\delta$ is the constant parameter that sets the trade-off between throughput and delay, $c$ is the cell index, $C$ is the total number of cells, and $S$ is the total number of slots.
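The three quantities of this subsection can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the power expression is taken to be the inverse of the Shannon formula for a rate parameter $l$ (written `rate` below), and the brackets in the delay expression are read as floor operators; both readings are reconstructions of a damaged formula and may differ from the authors' exact definitions. All function and parameter names are hypothetical.

```python
import math


def transmit_power(h, rate, bandwidth, noise_var):
    """Power for channel gain h (assumed form, see lead-in).

    Assumption: p = bandwidth * noise_var * (2**(rate / bandwidth) - 1) / h.
    """
    return bandwidth * noise_var * (2.0 ** (rate / bandwidth) - 1.0) / h


def throughput(h, p, bandwidth, noise_var):
    """Shannon throughput: bandwidth * log2(1 + h * p / (bandwidth * noise_var))."""
    return bandwidth * math.log2(1.0 + h * p / (bandwidth * noise_var))


def delay_performance(c, num_cells, num_slots, delta=1.0):
    """Delay term for cell index c in 1..num_cells (floor brackets assumed)."""
    exponent = (num_cells - c) // num_slots - num_cells // num_slots
    return delta * math.exp(exponent)


if __name__ == "__main__":
    # 16 cells and 4 slots follow the paper's setup; delta = 1 is illustrative.
    for c in (1, 4, 5, 16):
        print(c, delay_performance(c, num_cells=16, num_slots=4))
```

Under the floor reading, cells 1 to 4 share one delay value, cells 5 to 8 the next lower one, and so on, which agrees with the statement that the delay is the same within each slot.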
4.3 Link-to-Link Intervention Model
According to this paradigm, the links of the intervention graph that belong to a specific independent set can all be scheduled concurrently on the same physical channel. This calculation is performed in a processing step before executing the schedule-calculating algorithms [8]. The link-to-link intervention is denoted by $\tilde{O}$ in Fig. 4, where the vertices are obtained from the topology graph in Fig. 3. An edge in the model indicates a collision between two links. For example, transmissions such as (A → M) and (B → M) cannot happen simultaneously, so there is an edge between them in the intervention model. On the other hand, there is no edge between (A → M) and (B → C), which means that these two links do not collide.
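The intervention model can be illustrated with a short Python sketch. The collision rule used here, that two links collide when they have a node in common, is an assumption chosen because it reproduces the examples in the text; the paper does not spell out its exact rule. The function names and the sample link list are hypothetical.

```python
from itertools import combinations


def links_conflict(link_a, link_b):
    """Assumed rule: two links collide when they have a node in common."""
    return bool(set(link_a) & set(link_b))


def build_conflict_graph(links):
    """Return the set of unordered link pairs that must not share a cell."""
    return {frozenset((a, b)) for a, b in combinations(links, 2)
            if links_conflict(a, b)}


if __name__ == "__main__":
    # Each link is a (sender, receiver) pair; M denotes the gateway.
    links = [("A", "M"), ("B", "M"), ("B", "C"), ("C", "B")]
    graph = build_conflict_graph(links)
    print(frozenset((("A", "M"), ("B", "M"))) in graph)
    print(frozenset((("A", "M"), ("B", "C"))) in graph)
```

With this rule, (A → M) and (B → M) are flagged as colliding because both use node M, while (A → M) and (B → C) are not, matching the example above.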
Fig. 4 Corresponding intervention model
5 The Proposed TSCH-Based Scheduling Model
In this paper, we propose a solution for TSCH-based scheduling that uses the Hungarian assignment algorithm. The Hungarian algorithm is executed on the bipartite edge weights, where each bipartite edge represents the possible allocation of a link to a cell in the network.
5.1 Bipartite Graph Model of Link-to-Cell Allocation Technique
A bipartite graph, also known as a bigraph, is a graph whose vertices are divided into two disjoint sets such that no two vertices within the same set are adjacent. It is prominently used in assignment problems [13]. Here, a bigraph is used to represent the link-to-cell allocation method. The bipartite graph $\vartheta = \{M, C, \check{E}\}$ is shown in Fig. 5, where the top vertices are the set of links $M$ and the bottom vertices are the set of slot-frame cells $C$. The weight of an edge in $\check{E}$ is the sum of the normalized throughput and delay, defined by the following equation:

$$\hat{U}_{m,c} = \acute{\alpha} \left(\frac{\phi_{m,c}}{\max \phi_{m,c}}\right) + \left(1 - \acute{\alpha}\right) \left(\frac{\Upsilon_{m,n}}{\max \Upsilon_{m,n}}\right)$$
Fig. 5 Corresponding bipartite graph model for link-to-cell allocation
Here, $\acute{\alpha}$ is the weighting factor between the throughput and delay of the network. The throughput and delay performance are observed as the weighting factor varies.
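The edge weight can be sketched in Python as a weighted sum of the two normalised terms. This sketch assumes that each maximum is taken over all link and cell pairs, which the text does not state explicitly; the function name and the small example matrices are illustrative only.

```python
def edge_weights(throughput, delay, alpha):
    """Combine normalised throughput and delay terms into bipartite edge weights.

    throughput[m][c] and delay[m][c] are non-negative matrices of equal shape;
    alpha in [0, 1] is the weighting factor between the two terms.
    Assumption: each term is normalised by its maximum over the whole matrix.
    """
    max_tp = max(max(row) for row in throughput)
    max_dl = max(max(row) for row in delay)
    return [[alpha * tp / max_tp + (1.0 - alpha) * dl / max_dl
             for tp, dl in zip(tp_row, dl_row)]
            for tp_row, dl_row in zip(throughput, delay)]


if __name__ == "__main__":
    tp = [[2.0, 4.0], [1.0, 3.0]]      # illustrative values only
    dl = [[0.5, 0.25], [0.5, 0.25]]    # illustrative values only
    for row in edge_weights(tp, dl, alpha=0.5):
        print(row)
```

Setting `alpha` to 1 leaves only the throughput term and setting it to 0 leaves only the delay term, which is the range over which the weighting factor is varied in the evaluation.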
5.2 Hungarian Assignment Algorithm to Execute TSCH-Based Scheduling
The Hungarian method is a combinatorial optimization algorithm that solves the assignment problem in polynomial time and anticipated later primal-dual methods. It assigns jobs based on specific parameters [14]. In this paper, we use the Hungarian assignment algorithm to allocate links to their corresponding cells, which satisfies the nature of TSCH. The Hungarian assignment is executed by maximizing the bipartite edge weight. In this work, the bipartite edge weight is calculated as the sum of the network throughput and the slot-frame delay. The throughput is computed by considering dynamic power allocation and the random nature of the channel state, which is closer to the real nature of the wireless network. The Hungarian algorithm of this work is as follows [8].

Algorithm 1 Hungarian algorithm process
Step 1: Insert bigraph edge weight from bipartite graph (|C ∗ S|)
Step 2: a = cost matrix % edge weights of the bipartite graph
Step 3: b = max(a) % determine the maximum value of the cost matrix
Step 4: y = b − a % subtract every element from the highest value of the cost matrix
Step 5: execute the row operation % subtract the row minimum from each row
Step 6: execute the column operation % subtract the column minimum from each column
Step 7: mark the least zero elements and remove the other zero elements
Step 8: find the minimum value of all uncovered elements
Step 9: subtract the minimum value from the uncovered elements and add it to the intersecting elements
Step 10: if a column and row are uncovered, update the predecessor index
Step 11: the optimal assignment with maximized edge weight is achieved
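A compact pure-Python sketch of the assignment step is given below. It is not a line-by-line transcription of Algorithm 1: instead of the zero-covering steps it uses the equivalent shortest-augmenting-path form of the Hungarian method with row and column potentials, and it converts maximization into minimization by subtracting every weight from the largest one, as in Steps 3 and 4. The function names are hypothetical, and the matrix must have no more rows (links) than columns (cells).

```python
def min_cost_assignment(cost):
    """Hungarian method (dual-potential form) for an n x m cost matrix, n <= m.

    Returns a list assign where assign[i] is the column given to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j]: row matched to column j (0 means none)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augmenting path found
                break
        while j0:            # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


def max_weight_assignment(weights):
    """Maximise total weight by minimising (largest weight - weight)."""
    b = max(max(row) for row in weights)
    cost = [[b - w for w in row] for row in weights]
    return min_cost_assignment(cost)


if __name__ == "__main__":
    weights = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]   # illustrative edge weights
    print(max_weight_assignment(weights))
```

In the setting of this paper, the rows would be the eight links and the columns the sixteen cells, so every link receives a distinct cell. The sketch does not enforce the collision constraints of the intervention model, which would have to be handled separately.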
In this work, we use the above algorithm to generate 1000 samples of TSCH-based scheduling data in order to observe the behavior of wireless communication networks, to justify the realistic nature of TSCH-based scheduling, and to examine the corresponding relation between throughput and delay. The detailed scheduling method is given in Algorithm 2.

Algorithm 2 Generating sample data with the Hungarian assignment algorithm
1: Initialize: $\beta$, $l$ and $\Omega_0$
2: // primary loop
3: For $m \in M$ and $c \in C$ of $e \in E$
4: Determine the channel state using the Rayleigh fading phenomenon: $h = H_{m,c}$
5: Determine the dynamic power for all cells: $p = P_{m,c}(h) = \frac{\beta \Omega_0 \left(2^{l/\beta} - 1\right)}{h}$
6: Determine the throughput accordingly: $\phi_{m,c} = \beta \log_2\left(1 + \frac{h\,p}{\beta \Omega_0}\right)$
7: Determine the slot delay performance according to the cell position: $\Upsilon_{m,n}(c) = \delta \exp\left(\left\lfloor \frac{C - c}{S} \right\rfloor - \left\lfloor \frac{C}{S} \right\rfloor\right)$
8: Calculate the bipartite-graph edge weight: $\hat{U}_{m,c} = \acute{\alpha} \left(\frac{\phi_{m,c}}{\max \phi_{m,c}}\right) + \left(1 - \acute{\alpha}\right) \left(\frac{\Upsilon_{m,n}}{\max \Upsilon_{m,n}}\right)$
9: Determine the cell scheduling for all links using the Hungarian algorithm (Algorithm 1), matching on the maximization of the bipartite edge weight $\hat{U}_{m,c}$
10: // end of primary loop
11: In this process, 1000 samples are generated to observe the behavior of the network
6 Simulation Results
The simulation was carried out in MATLAB on a PC with an Intel Core i7 processor and 8 GB of RAM. It determined the network parameters and executed the Hungarian assignment algorithm to allocate links to cells by maximizing the bipartite edge weight.
Table 1 Specification of the parameters used to execute the network model

Parameter               Specification
Link number (M)         8
Cell number (C)         16
Slot number (S)         4
Bandwidth (β)           1 MHz
Noise variance (Ω0)     1
6.1 Bigraph Model of Proposed Network
As stated previously, the primary goal of this research is to construct a bipartite graph model of the TSCH network. Algorithm 2 uses this model to create enough sample data to examine the behavior of the network. The model was built with the parameters given in Table 1, based on our network model.
6.2 Link-to-Cell Allocation
Figure 6 depicts the allocation of links to cells, which was executed by the Hungarian assignment algorithm. The Hungarian algorithm maximized the bipartite edge weight, which is the sum of the normalized throughput and slot delay, where throughput was determined by considering random power allocation. Figure 6 reveals that the links are assigned to their corresponding cells without interruption, which satisfies the goal of TSCH.
Fig. 6 Cell assignment executed by the Hungarian assignment algorithm
Fig. 7 Fairness among baseline schemes [8, 9] and proposed scheme in this paper
6.3 Network's Fairness
Figure 7 compares the fairness of our proposed scheme with that of the previously proposed methods [8, 9], in which throughput was calculated at a fixed power level. In contrast, we calculate throughput with random power allocation, a concept adopted from [11], while considering a fixed slot delay. As a result, our proposed method exhibits more fairness than the baseline schemes [8, 9].
6.4 Performance Evaluation
We evaluate the proposed scheme's performance by comparing the relation between throughput and delay. Figure 8 shows the throughput and delay performance as the weighting factor ($\acute{\alpha}$) changes. When $\acute{\alpha}$ is low, throughput is low, which means throughput performance is low; at the same time, the delay is low, which means delay performance is high. This is closer to the actual scenario of wireless communication networks and satisfies the nature of TSCH according to the guidelines of IEEE 802.15.4e [9, 12].
Fig. 8 Performance of (a) throughput and (b) delay according to the weighting factor ($\acute{\alpha}$)
7 Conclusions
In this study, we proposed a solution for TSCH-based scheduling that uses the Hungarian assignment algorithm to allocate links to their corresponding cells by maximizing throughput under dynamic power allocation and a fixed slot delay. Simulation results show greater fairness than the previously proposed schemes, which is consistent with the nature of TSCH. In the future, we will integrate a DNN and interference network clusters to make this algorithm more pragmatic, using the capabilities of deep learning (DL) in wireless communications.
References
1. Piyare R, Oikonomou G, Elsts A (2020) TSCH for long range low data rate applications. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3046769
2. Din IU, Guizani M, Hassan S, Kim BS, Khan MK, Atiquzzaman M, Ahmed SH (2019) The internet of things: a review of enabled technologies and future challenges. IEEE Access. https://doi.org/10.1109/ACCESS.2018.2886601
3. Bae BH, Chung SH (2020) Fast synchronization scheme using 2-Way parallel rendezvous in IEEE 802.15.4 TSCH. Sensors (Switzerland) 20(5). https://doi.org/10.3390/s20051303
4. IEEE Standard for Local and Metropolitan Area Networks-Part 15.4: Low-Rate Wireless Personal Area Networks (LR-WPANs) (n.d.)
5. Oh S, Hwang DY, Kim KH, Kim K (2018) Escalator: an autonomous scheduling scheme for convergecast in TSCH. Sensors (Switzerland) 18(4). https://doi.org/10.3390/s18041209
6. Ojo M, Stefano G, Portaluri G, Adami D, Pagano M (2017) An energy efficient centralized scheduling scheme in TSCH networks. In: 2017 IEEE international conference on communications workshops, ICC Workshops 2017, pp 570–75. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCW.2017.7962719
7. Ojo M, Giordano S (2016) An efficient centralized scheduling algorithm in IEEE 802.15.4e TSCH networks. In: 2016 IEEE conference on standards for communications and networking, CSCN 2016. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CSCN.2016.7785164
8. Javan T, Nastooh MS, Hakami V (2020) IEEE 802.15.4.e TSCH-based scheduling for throughput optimization: a combinatorial multi-armed bandit approach. IEEE Sens J 20(1):525–537. https://doi.org/10.1109/JSEN.2019.2941012
9. Haque MNM, Lee YD, Koo I (2022) Deep Learning-based scheduling scheme for IEEE 802.15.4e TSCH network. Wirel Commun Mobile Comput. https://doi.org/10.1155/2022/8992478
10. Niaz M, Haque M, Koo I (2022) TSCH-based scheduling of IEEE 802.15.4e in coexistence with interference network cluster: a DNN approach. Int J Internet Broadcast Commun 14(1):53–63. https://doi.org/10.7236/IJIBC.2022.14.1.53
11. Hameed I, Tuan PV, Koo I (2020) Exploiting a deep neural network for efficient transmit power minimization in a wireless powered communication network. Appl Sci (Switzerland) 10(13). https://doi.org/10.3390/app10134622
12. Haque M, Niaz M, Koo I (2023) Throughput optimization of IEEE 802.15.4e TSCH-Based scheduling: a deep neural network (DNN) scheme. Inst Electr Electron Eng (IEEE) 844–49. https://doi.org/10.1109/iccit57492.2022.10054722
13. Ma J, Qiao Y, Hu G, Li T, Huang Y, Wang Y, Zhang C (2018) Social account linking via weighted bipartite graph matching. Int J Commun Syst 31(7):e3471. https://doi.org/10.1002/dac.3471
14. Date K, Nagi R (2016) GPU-accelerated hungarian algorithms for the linear assignment problem. Parallel Comput 57(September):52–72. https://doi.org/10.1016/j.parco.2016.05.012
Movie Recommendation System Using Hybrid Approach Nidhi Bharatiya , Shatakshi Bhardwaj , Kartik Sharma , Pranjal Kumar , and Jeny Jijo
Abstract In this digital era, a great deal of work on movie recommendation systems already exists. When movie suggestions are provided, users do not have to spend a lot of time looking for content they might enjoy, and a recommendation system is essential for producing user-specific movie suggestions. A review of a large number of research papers and online sources showed that recommendations produced with content-based filtering typically rely on a single method of text-to-vector conversion before the similarity of the vectors is compared. In this study, the outputs of several text-to-vector conversion strategies were modified and combined to produce the final recommendation list. The system employs only the content-based filtering technique, although it may be regarded as a hybrid strategy. Recommendation systems that employ content-based filtering suggest a movie to the viewer based on the movie's content. People now use movies as a getaway from their hectic lives, yet the large number of movies accessible globally makes it difficult to pick just one, which makes watching movies harder than it has to be. In this work, a recommendation system based on content-based filtering was created and deployed on Heroku, which makes the application easy to deploy. The most common methods for creating recommendation systems are collaborative filtering and content-based filtering.
Keywords Movie recommendations · Content-based filtering · Text to vector · Vector similarity · Hybrid approach
N. Bharatiya (B) · S. Bhardwaj · K. Sharma · P. Kumar · J. Jijo PES University, Bangalore, India e-mail: [email protected] J. Jijo e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_28
1 Introduction
Owing to the quantity of knowledge amassed up to the twenty-first century and the speed at which information is disseminated on the Internet, there is a great deal of confusion about what should and should not be consumed. On YouTube, for example, a number of videos are typically available when users want to watch a video on a specific subject. Since the results are suitably ranked, this causes few problems at present, but what if they were not ranked? In such circumstances, a great deal of time would undoubtedly be spent searching for the video that best suits the user and satisfies their needs. Such a list of suggestions is shown when the user performs a search on a website. Ideally, the algorithm should also be able to display recommendations the next time the user visits a page, without the user even having to search for them. The fundamental duty of a recommender system is to offer the most relevant options to the user. For example, recommendation algorithms are used by Netflix and Amazon Prime for movies, by Flipkart and Amazon for products, and by YouTube for videos. What a user does on these websites is tracked by a system that eventually suggests the goods or products that users are likely to be interested in, based on their behaviour. This research paper looks at movie suggestions, the rationale behind standard movie recommendation engines, the drawbacks of conventional movie recommendation systems, and a remedy in the form of an AI-tailored movie recommendation system. On Kaggle and other platforms, there are already well-known datasets relating to movie suggestions. Among these are the Movielens dataset and the dataset produced by Netflix itself. Movie suggestions are used by services like Netflix, Amazon Prime and others to increase revenue or sales by improving the overall user experience.
According to [1], in 2009 Netflix awarded a prize of $1 million in a competition to improve its existing recommendation method by at least 10%. Since, as noted above, no user engages with all of the material available, the material must be filtered before it is consumed, and filtering techniques are needed to classify the data. A recommendation system may be built on a variety of filtering or algorithmic methods for suggesting movies; the most significant are collaborative filtering, content-based filtering and hybrid filtering. A recommender system is one that, depending on specified information, suggests particular resources to consumers, such as books, movies and music. Movie recommendation systems frequently use the qualities of previously appreciated films to anticipate a user’s tastes. These systems are beneficial to businesses that gather data from many customers and try to offer the best suggestions. A number of factors, such as a movie’s genre, cast and director, may be taken into consideration when creating a movie recommendation system. The algorithms may base recommendations on a single attribute or on a combination of two or more. The recommendation engine in this study is based on the user’s preferred genres. The approach adopted for
Movie Recommendation System Using Hybrid Approach
Fig. 1 Phases of the proposed methodology
this is content-based filtering with Heroku deployment. The TMDB movie dataset is used for this purpose. Jupyter Notebook and PyCharm are used for data analysis. The implementation of the application is divided into four phases, as shown in Fig. 1: data pre-processing, model building, website building and final deployment of the recommendation system.
2 Literature Review Choi et al. [1] emphasised the shortcomings of collaborative filtering, such as the sparsity issue and the cold-start problem. To address them, the authors propose using category information as a workaround, so that the recommendation algorithm can propose even a brand-new movie. Lekakos et al. [2] suggested a hybrid approach to the problem of movie selection. According to the authors, content-based filtering and collaborative filtering each have their own disadvantages and suit different situations. Consequently, the recommendation algorithm has the ability to propose a brand-new film. Das et al. [3] published an article describing the classifications of recommendation systems and giving a general overview of them. In addition to non-personalised systems, the authors also discussed personalised recommendation systems. Zhang et al. [4] introduced “Weighted KM-Slope-VU”, a collaborative filtering method for movie selection. The authors used K-means clustering to group comparable users, which shortens the time it takes to obtain suggestions. Rajarajeswari et al. [5] discussed simple, content-based and collaborative filtering-based recommender systems and eventually suggested a hybrid recommendation system as a remedy. Ahmed et al. [6] proposed a solution based on K-means clustering. The authors used clusters to group similar users for the purpose of recommendation and then created a neural network for each cluster. Finally, predictions of high ratings are used to make recommendations.
Harper et al. [7] described the details of the MovieLens dataset, which is widely used, especially for movie recommendation. Several versions are available, including the MovieLens 100K, 1M, 10M, 20M and 25M/1B datasets. According to Lavanya et al. [8], recommendation systems help to deal with the problem of the information explosion. The authors discussed issues such as data sparsity, cold start and scalability, and reviewed the literature on movie recommendation systems using about 15 research publications. Vilakone et al. [9] note that the overlap between two k-cliques in a simple graph can be expressed as the number of vertices they share. The clique percolation method (CPM) is equivalent to thresholding this clique network, eliminating all edges with weights less than (k-1) and keeping just the connected components, which form the communities of cliques identified by CPM. Syed Ali et al. [10] suggest a hybrid approach for recommending similar movies based on content-based filtering and movie genetic characteristics. It reduces the number of duplicated and low-proportion-of-variance tags using principal component analysis (PCA) and Pearson correlation, which lowers the processing complexity. Yeole Madhavi et al. [11] examine movie recommendations and their basis, along with traditional movie recommendation systems, issues with such systems, and a proposed remedy in the form of an AI-based personalised movie recommendation system. Manoj Kumar et al. [12] employed a collaborative and content-based filtering method to fine-tune the recommendation. The majority of recommendation systems in use today employ ratings provided by previous users to identify potential customers. Based on these scores, the item of interest is predicted and suggested. Xu et al.
[13] suggested a collaborative filtering method for recommending movies, which they called “Weighted KM-Slope-VU”. The authors used K-means clustering to group comparable users. From each cluster, a virtual opinion leader is elected to represent all users within that cluster. The technique of Arora et al. [14] is based on user similarities. The paper is quite generic in that the authors do not discuss the internal workings in detail. In the Methods section, the authors mention only City Block Distance and Euclidean Distance; they make no mention of cosine similarity or any other measure. V. Subramaniyaswamy et al. [15] proposed a personalised movie recommendation method that uses collaborative filtering. The most similar user is identified with the Euclidean distance metric, as the user with the smallest Euclidean distance.
3 Proposed Methodology Content-based filtering (CBF) depends on the similarity between the recommended items. The fundamental tenet is that if the user enjoys watching something, they will want to watch something “similar”. It typically works well when it is simple to establish the context and characteristics of each item. When a person is viewing a movie, the system looks for other films that are comparable. The pre-processing steps first combine the necessary attributes of the dataset into a single feature. The text of that feature is then converted into vectors. The similarity between the vectors is calculated with cosine similarity, using two vectorisers, namely CountVectorizer and TfidfVectorizer (term frequency-inverse document frequency), which are discussed in the following sections.
3.1 Architecture Figure 2 shows the architectural diagram on which the application’s functionality and implementation design are based. It represents the entire procedure followed in implementing the proposed methodology, including the pre-processing techniques and the two algorithms mentioned above.
3.2 Algorithms The text is converted to vectors using CountVectorizer and TfidfVectorizer. After the text has been vectorised, it must be determined how similar the vectors are to one another. Several measures can be used for this, cosine similarity among them. To compute cosine similarity, each movie is first represented as a vector of word frequencies or TF-IDF scores built from the movie tags. Then, the dot product of two vectors is divided by the product of their magnitudes. The resulting value measures how closely the two vectors are aligned. Cosine similarity scores range from −1 to 1: 1 means the two vectors point in the same direction, 0 means they are orthogonal (i.e. they have no common components), and −1 means they are diametrically opposed (i.e. they are completely dissimilar). Because word counts and TF-IDF scores are never negative, the scores obtained here lie between 0 and 1. CountVectorizer and cosine similarity for content-based recommendation (Algo 1-CoSim): CountVectorizer creates a vector from a text input based on the frequency of each word in the text. Once the vectors have been obtained, cosine similarity is used to determine how similar they are to one another.
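The cosine similarity computation described above can be sketched in a few lines. This is a minimal illustration that uses only the Python standard library as a stand-in for sklearn’s CountVectorizer and cosine similarity; the helper names `count_vector` and `cosine_similarity` and the tag strings are hypothetical and are not taken from the authors’ implementation.

```python
import math
from collections import Counter


def count_vector(text):
    """Map a tag string to a bag-of-words vector (word -> frequency)."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their magnitudes."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an empty vector
    return dot / (norm_u * norm_v)


# Hypothetical tag strings, for illustration only.
a = count_vector("action adventure space alien")
b = count_vector("action adventure space war")
c = count_vector("romance drama paris")

print(cosine_similarity(a, b))  # three of four words shared
print(cosine_similarity(a, c))  # no words shared
```

Two tag strings that share most of their words score close to 1, while strings with no words in common score 0, which matches the orthogonal case mentioned above.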
Fig. 2 Architectural diagram
Content-based recommendation using TfidfVectorizer (term frequency-inverse document frequency) and cosine similarity (Algo 2-CoVec): TfidfVectorizer assesses the importance of a word by weighing its frequency in a given document against how many documents in the collection contain it, so that words common to most documents receive low weights. Once the vectors have been produced, the similarity between them is evaluated using cosine similarity. With both algorithms, the model obtains the recommended movies accurately. These algorithms are used because they are efficient at finding similarities between textual data. For instance, if a user gives a movie a good rating, the recommendation system can suggest other films that the user might like and that have comparable feature vectors (i.e. they are similar in terms of genre, stars, directors, etc.).
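To make the TF-IDF weighting concrete, the following sketch computes the textbook form, term frequency multiplied by log(N / document frequency), with the standard library only. It is an assumption-laden illustration rather than the authors’ code: sklearn’s TfidfVectorizer applies a smoothed inverse document frequency and L2 normalisation by default, so its numbers differ from the ones produced here. The function name `tfidf_vectors` and the tag strings are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return vectors


# Hypothetical tag strings, for illustration only.
docs = [
    "action space alien",
    "action space war",
    "action romance paris",
]
weights = tfidf_vectors(docs)
print(weights[0])
```

In this toy collection the word "action" occurs in every document and therefore receives a weight of zero, whereas "alien", which occurs in only one document, receives the largest weight in the first vector.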
3.3 Pseudocode
The sklearn feature extraction module can be used to extract features from data such as text and images in a form supported by machine learning algorithms. Fit a CountVectorizer on the movie tags, transform them into a count matrix, and convert this matrix to an array to make it easier to inspect. Calculate the similarity between two films using sklearn’s cosine similarity as the metric. Cosine similarity is a measure of how similar two objects are: when two vectors are projected into a multi-dimensional space, it is the cosine of the angle between them. Use enumerate to pair each similarity coefficient with its index. “Similar movies” is a list of tuples containing indices and coefficients.
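The steps above can be sketched as a small runnable program. This is one possible reading of the pseudocode, written with the standard library in place of sklearn so that it is self-contained; the titles, the tag strings, the `cosine` helper and the `top_n` parameter are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter

# Hypothetical titles and tag strings, for illustration only.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
tags = [
    "action adventure space alien",
    "action adventure space war",
    "romance drama paris",
    "action war history",
]


def cosine(u, v):
    """Cosine of the angle between two bag-of-words vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


vectors = [Counter(t.split()) for t in tags]
# Pairwise matrix: similarity[i][j] compares movie i with movie j.
similarity = [[cosine(u, v) for v in vectors] for u in vectors]


def recommend(title, top_n=2):
    """Return the top_n titles most similar to the given title."""
    index = titles.index(title)
    # enumerate pairs each coefficient with its movie index.
    similar_movies = list(enumerate(similarity[index]))
    ranked = sorted(similar_movies, key=lambda pair: pair[1], reverse=True)
    # Skip the queried movie itself, which always has similarity 1.
    return [titles[i] for i, _ in ranked if i != index][:top_n]


print(recommend("Movie A"))
```

Sorting the (index, coefficient) tuples by coefficient in descending order and dropping the queried movie leaves the most similar titles at the front of the list.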
4 Dataset Preprocessing This application uses the “TMDB 5000 Movie Dataset” to make movie recommendations. The dataset can be accessed at kaggle.com. It is divided into two CSV files: tmdb_5000_movies.csv, which has the details of the movies in the dataset, and tmdb_5000_credits.csv, which includes details of the cast and crew. Data pre-processing was done in Jupyter Notebook and involved several techniques, including dropping unnecessary columns and duplicate data and converting all the keywords to a paragraph string. The number of columns was 24 before pre-processing and was reduced to 3 afterwards. Figure 3 depicts the dataset before pre-processing, after the two files, movies.csv and credits.csv, were merged. Figure 4 shows the conversion of key-value pairs to a single set consisting of only the values. This makes it easy to merge these columns into a single paragraph string of keywords. Figure 5 depicts the dataset after pre-processing, where all the unnecessary columns are dropped and all the columns (except movie_id and title) are merged
Fig. 3 Data input
Fig. 4 Data pre-processing
Fig. 5 Dataset post pre-processing
Fig. 6 Working of recommend function
into a single column named “tags”. This is the final paragraph string in which keywords based on user preferences are searched.
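The conversion of key-value pairs to plain values and the merge into a single “tags” string can be sketched as follows. The sketch assumes that list-valued columns are stored as JSON strings of objects with a "name" field, and it picks overview, genres and keywords as the columns to merge purely for illustration; the source does not list the merged columns. The helper names and the example row are hypothetical, and removing inner spaces from multi-word names is likewise an assumption.

```python
import json


def names_from_json(cell):
    """Extract the "name" values from a JSON-encoded list of objects."""
    return [item["name"] for item in json.loads(cell)]


def build_tags(row):
    """Merge overview, genres and keywords into one lowercase tag string."""
    parts = row["overview"].split()
    for column in ("genres", "keywords"):
        # Remove inner spaces so that a multi-word name stays a single token.
        parts += [name.replace(" ", "") for name in names_from_json(row[column])]
    return " ".join(parts).lower()


# Hypothetical row, for illustration only.
row = {
    "movie_id": 1,
    "title": "Example Movie",
    "overview": "A pilot explores a distant moon",
    "genres": '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
    "keywords": '[{"id": 1, "name": "space travel"}]',
}

# Keep only the three columns that remain after pre-processing.
processed = {
    "movie_id": row["movie_id"],
    "title": row["title"],
    "tags": build_tags(row),
}
print(processed["tags"])
```

The resulting record has the three columns described above (movie_id, title and tags), and the tag string is ready to be vectorised.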
5 Model Building Once pre-processing is complete, the recommend function takes a movie name as input and, based on the model developed, recommends the most similar movies according to cosine similarity. After this step, the model is integrated with the website that provides the user interface. In the model, all the keywords are converted to a paragraph string that forms the movie tags, which act as identifiers for that movie. The model is implemented using the content-based filtering approach: user preferences are stored in the form of a matrix and later converted into vectors using CountVectorizer for further processing and analysis. Figure 6 depicts how the recommend function works on the basis of the similarity score, which is calculated from
6 Website Building and Deployment After the model was built successfully using the algorithms, the next phase in the pipeline was building the website that users could access. PyCharm was the primary tool for this phase. The website was built using the API keys and run using Streamlit commands. Figure 7 depicts the finished website and its features. Recommendations can be made based on a movie as well as a genre, and the user can select the number of movies to display. The website also includes an IMDB rating scale.
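The genre-based option with a rating scale and a user-chosen number of results could be implemented along the following lines. This is only a sketch under stated assumptions: the source does not say whether the rating scale selects movies at, or at or above, the chosen value, so the code assumes "at or above"; the function name `filter_movies`, the record layout and the catalogue are hypothetical, and no Streamlit code is shown.

```python
def filter_movies(movies, genre, min_rating, limit):
    """Return up to `limit` titles of the given genre rated at least `min_rating`."""
    matches = [
        m for m in movies
        if genre in m["genres"] and m["rating"] >= min_rating
    ]
    # Highest-rated matches first.
    matches.sort(key=lambda m: m["rating"], reverse=True)
    return [m["title"] for m in matches[:limit]]


# Hypothetical catalogue, for illustration only.
movies = [
    {"title": "Movie A", "genres": ["Drama"], "rating": 7.4},
    {"title": "Movie B", "genres": ["Drama", "Romance"], "rating": 5.9},
    {"title": "Movie C", "genres": ["Action"], "rating": 8.1},
    {"title": "Movie D", "genres": ["Drama"], "rating": 6.2},
]

print(filter_movies(movies, genre="Drama", min_rating=6, limit=5))
```

In a web front end, the genre, the rating threshold and the limit would come from the user’s selections on the page.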
Fig. 7 Working website
Fig. 8 Deployment
Once the website has been built, the application reaches the final stage, deployment. Deployment was initially carried out using Heroku and later moved to Streamlit cloud. Heroku is a cloud platform as a service (PaaS) that supports several programming languages and runs applications in containers. Developers use Heroku to launch, manage and scale modern applications. Its attractiveness, versatility and simplicity offer developers the quickest route to releasing their applications. The Streamlit Community Cloud is an open and free platform on which the community can deploy, find and share Streamlit applications and code. Figure 8 depicts the successful deployment using the Streamlit cloud.
7 Results and Discussions The system built offers a variety of features that are not found in existing recommender systems, including a criteria-based search option and recommendation based on the IMDB rating score. The online library’s built-in recommendation system for research papers therefore has a lot of advantages over those without it. Some of the results obtained are as follows. Figure 9 depicts the model after it has been completely built following data pre-processing, and the execution of the final recommendation function on the movie “Avatar”. The output is a list of the movies in the dataset that are similar to the movie entered. Figure 10 depicts how the recommend function works on the website; for example, movies similar to “Avatar” are recommended by the model. The figure shows the main website, where the user can choose whether recommendations should be based on the movie name or on the genre. This image shows the recommendations based on the movie name. The user can also choose how many recommendations are shown on the screen. Figure 11 depicts the output when the user wants movies to be displayed based on the genre. A rating scale is also provided, with which the user can select movies based on a rating out of 10. Figure 12 depicts the rating scale, which follows the IMDB ratings of a movie. It can be adjusted and lies in the range of 1–10. In this image, movies of the “Drama” genre that have an IMDB rating of “6” are recommended. Figure 13 shows the recommendations when the IMDB rating is set to “7”. Figure 14 depicts the redirection to the IMDB website if the user wishes to find out more about the movie or even watch it. The user can select the movie link from the recommender site.
Fig. 9 Final recommendations similar to “Avatar” movie
Fig. 10 Recommendations based on movie
Fig. 11 Recommendations based on genre
Fig. 12 Working of the IMDB rating scale
8 Conclusion The findings reveal that the combined recommendations barely outperform the individual recommendations of Algorithms 1 and 2. It is therefore frequently preferable to combine the outputs of several algorithms into a final result that incorporates the benefits of each technique. To summarise, a recommender system that employs content-based filtering and the cosine similarity algorithm may deliver better recommendations to users by suggesting films that share essential features such as IMDb votes, average IMDb ratings, cast, filmmakers, genre, release year, audience tags, and so forth. The proliferation of information has enhanced the significance of optimisation techniques. With content-based recommender systems, we look for creative ways to improve the accuracy with which a movie is represented.
Fig. 13 Working of the rating scale with a different value
Fig. 14 Redirection to IMDB site if movie link selected from the website
References
1. Lohr S (2009) Netflix awards $1 million prize and starts a new contest. Bits. The New York Times. https://archive.nytimes.com/bits.blogs.nytimes.com/2009/09/21/netflix-awards-1-million-prize-and-starts-a-new-contest/
2. Choi S-M, Ko S-K, Han Y-S (2012) A movie recommendation algorithm based on genre correlations. Expert Syst Appl 39(9):8079–8085
3. Lekakos G, Caravelas P (2008) A hybrid approach for movie recommendation. Multimedia Tools Appl 36(1):55–70
4. Das D, Sahoo L, Datta S (2017) A survey on recommendation system. Int J Comput Appl 160(7)
5. Zhang J et al (2019) Personalized real-time movie recommendation system: practical prototype and evaluation. Tsinghua Sci Technol 25(2):180–191
6. Rajarajeswari S et al (2019) Movie recommendation system. Emerging research in computing, information, communication and applications. Springer, Singapore, pp 329–340
7. Ahmed M, Imtiaz MT, Khan R (2018) Movie recommendation system using clustering and pattern recognition network. In: 2018 IEEE 8th annual computing and communication workshop and conference (CCWC). IEEE
8. Harper FM, Konstan JA (2015) The MovieLens datasets: history and context. ACM Trans Interact Intell Syst (TIIS) 5(4):1–19
9. Lavanya R, Singh U, Tyagi V (2021) A comprehensive survey on movie recommendation systems. In: International conference on artificial intelligence and smart systems (ICAIS), pp 532–536. https://doi.org/10.1109/ICAIS50930.2021.9395759
10. Berry MW (1992) Large-scale sparse singular value computations. Int J Supercomput Appl 6(1):13–49
11. Resnick P, Iakovou N, Sushak M, Bergstrom P, Riedl J (1994) GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM conference on computer supported cooperative work
12. Breese JS, Heckerman D, Kadie C (2013) Empirical analysis of predictive algorithms for collaborative filtering. 7(7):43–52
13. Herlocker J, Konstan JA, Riedl J (2002) An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms. Inf Retrieval 5(4):287–310
14. Das J, Mukherjee P, Majumder S, Gupta P (2014) Clustering-based recommender system using principles of voting theory. In: International conference on contemporary computing and informatics (IC3I), Mysore, pp 230–235
15. Beel J, Gipp B, Langer S, Breitinger C (2015) Research paper recommender systems: a literature survey. Int J Digit Libr 17(4):305–338. https://doi.org/10.1007/s00799-015-0156-0
Highly Correlated Linear Discriminant Analysis for Dimensionality Reduction and Classification in Healthcare Datasets S. Rajeashwari and K. Arunesh
Abstract Artificial intelligence and its applications are expanding, particularly in medical science. Large amounts of clinical data are accessible, yet most of it remains untapped. Used effectively, this technology will help diagnose human diseases earlier. An effective classification system allows doctors to make a more accurate diagnosis at an earlier stage of the disease. Because medical data typically contains many features, including all of them in decision-making can lead to overfitting of the classification model, which reduces classification accuracy. For this reason, it is crucial to create a dimensionality reduction technique that effectively cuts down the number of features while improving classification accuracy. To reduce the dimensionality of medical data, this article proposes the Highly Correlated Linear Discriminant Analysis (HCLDA) framework. The data are collected from different chronic disease datasets, and missing values are analyzed using an improved decision tree algorithm. The best features are selected using the weighted binary bat algorithm. The datasets are clustered using Gaussian kernel fuzzy c-means (GKFCM) combined with the Particle Swarm Optimization (PSO) algorithm. Classification and dimensionality reduction (DR) are performed with the HCLDA algorithm. The simulation results are compared with conventional dimensionality reduction methods, including LDA, PCA and correlation with RF classification. Classification performance is compared in terms of accuracy, sensitivity, and specificity. Keywords Chronic disease dataset · Gaussian kernel combining fuzzy c-means · Dimensionality reduction · Highly correlated linear discriminant analysis · Machine learning
S. Rajeashwari (B) · K. Arunesh Department of Computer Science, Sri S. Ramasamy Naidu Memorial College, (Affiliated to Madurai Kamaraj University), Sattur, Tamil Nadu 626203, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_29
S. Rajeashwari and K. Arunesh
1 Introduction Machine learning (ML) has been one of the fastest-growing technologies over the last 15 years [1]. It has applications in several areas, including banking, healthcare, computer vision, bioinformatics, business analytics, fraud detection, and trend forecasting [2]. Machine learning identifies existing patterns in massive data samples by computational means. Machine learning algorithms are used in these different study areas to categorize data and produce accurate predictions. The application of machine learning classifier models in medicine is continuously expanding [3]. They have proved very useful in diagnosis on various clinical and medical datasets. Datasets must be pre-processed before they can be used to predict any human disease. Dimensionality reduction is crucial in this context, as it helps transform high-dimensional data into lower-dimensional data [4]. Numerous dimensionality reduction approaches have been applied to datasets over the past decades [5–8]. When the dimensionality is reduced, high-dimensional inputs must be mapped to lower-dimensional ones so that similar points in the input space map to nearby points on the manifold [9]. Today’s digital world produces vast amounts of data from virtually every industry. Machine learning algorithms are applied to extract actionable patterns from these data for use in management and business decisions [10]. By applying an orthogonal transformation to a set of features, Principal Components Analysis (PCA) produces the fewest possible principal components that best capture the variance of the complete dataset. The most strongly correlated properties of the initial data relate to the final principal components [11]. If the data from different classes do not fit the homoscedastic Gaussian (HOG) model, then Linear Discriminant Analysis (LDA) will not give accurate results.
Instead of standard LDA, heteroscedastic discriminant analysis (HDA), a model-based generalization within the maximum likelihood framework, can be used [12]. Hybrid Local Fisher Discriminant Analysis (HLFDA), which combines Locality Preserving Projection and Local Fisher Discriminant Analysis, can take the provided medical dataset and reduce its dimensionality [13]. PCA applies an orthogonal transformation to a set of features to produce the fewest principal components that best capture the variance of the complete dataset [14], whereas LDA fares poorly when the data from different classes do not suit the HOG model, which motivates HDA as a model-based generalization created inside the maximum likelihood paradigm [15–17]. A small change in the data can result in a major change in a decision tree, so an entropy-based C4.5 tree is used here as the improved decision tree [18]. Feature selection removes irrelevant and duplicated features that may cause unintended correlations in learning algorithms and hence reduce their generalizability. In the Binary Bat Algorithm (BBA), a position is represented by a string of binary digits [19]. PSO is a
stochastic optimization technique based on the movement and intelligence of swarms. The Gaussian kernel-based fuzzy c-means clustering algorithm (GKFCM) is a fuzzy clustering method that uses a Gaussian kernel [20]. This article’s primary objective is to investigate how dimensionality reduction techniques affect the performance of machine learning algorithms. The main contributions of this paper are as follows:
• Chronic disease datasets are normalized using the improved decision tree.
• Features are selected using the weighted binary bat algorithm.
• The dataset is clustered using GKFCM with the PSO algorithm.
• Classification and DR are performed by HCLDA.
The remaining parts of this paper are organized as follows. Section 2 reviews dimensionality reduction approaches proposed by different authors. Section 3 describes the HCLDA framework. The findings are presented in Sect. 4. Final thoughts on the findings and directions for further research are presented in Sect. 5.
2 Background Study Hariharan et al. [2] observe that considering all of a dataset’s characteristics when building a machine learning model affects classification precision and processing speed; it is more efficient to evaluate just the essential attributes and exclude the redundant and unnecessary ones. Hybrid linear discriminant analysis (HLDA) was suggested as a means of dimensionality reduction. The Linear Discriminant Analysis (LDA) method was used to transform the dataset into a new subspace with c-1 features, where c is the number of classes in the dataset. Additionally, n strongly correlated features were added to the transformed dataset. Random Forest was used to improve the classification accuracy of the authors’ models. The findings for three UCI datasets demonstrate that the suggested HLDA + RF approach significantly improves performance in terms of precision, sensitivity, and specificity. Gupta and Janghel [10] discuss classification methods based on fuzzy logic and artificial neural networks, along with a novel approach for dimensionality reduction and feature ranking. Their study aimed to evaluate the accuracy of several breast cancer diagnostic and prognostic methods. The principal component analysis and neural network method achieves a diagnostic classification accuracy of 96.58%. Despite this, gain ratio and chi-square feature ranking achieve a higher accuracy (85.78%) than other techniques. These results may be used to model a Medical Expert System to automate the diagnosis of illnesses, and they offer a solid foundation for the automated diagnosis of more diseases and high-dimensional medical datasets. Sharma and Saroha [15] note that eliminating unnecessary and minor features from massive datasets improves classification and other data mining operations. PCA was
the most used approach for dimensionality reduction, although it does not yield a subset of the original features. In contrast, filters and wrappers could locate a smaller subset of features but were computationally costly. Their method allows Principal Component Analysis (PCA) to be used for subset selection by rating and ranking features so that accuracy is maintained (i.e., it does not fall below the accuracy attained when all features are used). Prakash and Rajkumar [19] presented classifications of various chronic diseases. Dimensionality reduction and classification were the two critical steps established to obtain the highest possible classification accuracy. A novel hybrid local Fisher discriminant analysis (HLFDA) was developed for dimensionality reduction, and a T2FNN was constructed for classification. The combined classification and dimensionality reduction method was compared with another algorithm. Dash et al. [21] observe that intelligent data analysis techniques currently rely on traditional machine learning, which is used extensively in artificial intelligence and encompasses many subfields. Classical machine learning has been applied in various scientific and development studies. The authors conclude that traditional machine learning techniques improve the efficiency of intelligent data analysis and change how intelligent data analysis is used in complex reasoning. The evolution of machine learning expands the framework for analyzing sets of functional data. Conventional interpolation and extrapolation schemes employ classical machine learning to discover new hidden empirical regularities, in the form of new rules or laws, in intelligent data analysis. Giang et al. [23] present a classification model for chronic disease patients that uses a unified dataset comprising three categories of chronic disease patient data.
The findings demonstrated that the precision and dependability of the authors’ model were superior to those of models based on a single data type. In reality, in addition to the three categories listed above, numerous other forms of data about chronic disease patients were investigated and retrieved. Zhang et al. [24] note that traditional dimensionality reduction techniques, such as feature selection or feature extraction, consider the problem locally. Using information theory, mutual information, and a maximum spanning tree, their research proposed a tree-based method. In contrast to conventional approaches, the proposed algorithm solves the dimensionality reduction problem through global rather than local consideration.
3 Materials and Methods The Highly Correlated Linear Discriminant Analysis (HCLDA) ensemble approach to dimensionality reduction is presented to improve the classification efficiency on chronic disease datasets. The dataset is normalized using the improved decision tree, and the features are selected using the WBB algorithm. GKFCM with PSO has
Highly Correlated Linear Discriminant Analysis for Dimensionality …
435
improved the clustering. Finally, Linear Discriminant Analysis is used in HCLDA to reduce an n-by-n dataset to a c-by-c1 subspace, where c is the number of classes.
3.1 Datasets The records are collected from the Kaggle repository. From the UCI medical benchmark datasets, the Wisconsin breast cancer dataset, chronic kidney disease dataset, diabetes dataset, and heart disease dataset were selected to demonstrate the effectiveness of the proposed approach. The dataset files are named breast_cancer.csv, heart.csv, kidney_disease.csv, and diabetes.csv, and have sizes of 123 KB, 12 KB, 48 KB, and 24 KB, respectively.
3.2 Pre-processing Researchers and practitioners recognize that data pre-processing is critical for effectively applying data mining technologies to a database. The techniques to remove the cattery information or impute (fill in) the missing information are standard processes in data pre-processing.
3.2.1 Improved Decision Tree
The C4.5 algorithm [21] is by far the most widely used decision tree method in data mining. In the decision tree model, the classification of instances according to their characteristics is represented in a tree-like structure [22]. The model can be viewed as a collection of if–then rules or as a set of conditional probability distributions over the feature and class spaces.
Information gain of attribute A:
\[ \mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D). \tag{1} \]
Pre-processing information entropy:
\[ \mathrm{Info}(D) = \mathrm{Entropy}(D) = -\sum_{j} p(j \mid D) \log p(j/d). \tag{2} \]
Distribution information entropy:
\[ \mathrm{Info}_A(D) = \sum_{i=1}^{v} \frac{n_i}{n}\, \mathrm{Info}(D_i), \tag{3} \]
S. Rajeashwari and K. Arunesh
where D denotes the dataset. The information gain of each attribute in the dataset is obtained using Eq. (1). The dataset is pre-processed using the entropy in Eq. (2), computed over the corresponding rows and columns, where p denotes the rows and d denotes the column data. Finally, the missing values are analyzed using Eq. (3), which describes the distribution of the data when the dataset attributes are normalized over the n data records.
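As a rough illustration of Eqs. (1) to (3), the following minimal sketch computes the entropy of a label column and the information gain of one categorical attribute. The function names, the base-2 logarithm, and the toy data are assumptions made for this example; they are not taken from the authors' implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): Shannon entropy of the class labels (base 2 assumed), cf. Eq. (2)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_after_split(values, labels):
    """Info_A(D): weighted entropy of the partitions induced by attribute A, cf. Eq. (3)."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    return sum(len(part) / n * entropy(part) for part in partitions.values())

def information_gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D), cf. Eq. (1)."""
    return entropy(labels) - info_after_split(values, labels)

# Hypothetical toy data: one categorical attribute and a binary class label.
attribute = ["high", "high", "low", "low", "low", "high"]
label = ["sick", "sick", "healthy", "healthy", "sick", "sick"]
gain = information_gain(attribute, label)
print(f"Gain(A) = {gain:.4f}")
```

In this toy case the "high" partition is pure, so all of the remaining uncertainty comes from the "low" partition.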
3.3 Feature Selection Using Weighted Binary Bat Algorithm (WBBA) Feature selection identifies and eliminates as many irrelevant and redundant features as possible. The objective is to obtain a subset of features that adequately describes the original problem. This subset is commonly used for training learners, and the specialized literature reports additional benefits. In particular, feature selection eliminates irrelevant and redundant features that can lead to accidental correlations in learning algorithms and diminish their generalizability. In the Binary Bat Algorithm (BBA) [23], the location of each data item is stored as binary digits. As a result, the data items have difficulty venturing outside the solution space, and premature convergence can also be an issue. This work therefore adopts a weighted binary bat algorithm driven by the fitness value. To relocate the data items to unexplored regions of the search space, this article designed the following mutation operator, in which \bar{x}_{ij} denotes the complement (flipped value) of bit x_{ij}:
\[ x_{ij} = \begin{cases} \bar{x}_{ij} & \text{if } \mathrm{rand}(i) \le r_m \\ x_{ij} & \text{otherwise,} \end{cases} \tag{4} \]
where r_m denotes the probability of random mutation. Once the location of the data is updated according to Eqs. (3) and (4), the bits are mutated with probability r_m. The mutation probability is set to r_m = 1/d, where d is the number of dimensions in the dataset, so that on average one bit of each data item is flipped. The initial population consists of data items assigned random 0s and 1s. This produces data that have been examined extensively. The data are then subjected to random mutations. Each iteration uses the transfer function to refine the location, and the data are mutated again to increase their ability to explore the search space.
Algorithm 1 Pseudocode of WBBA
Input: r = 0.9, A = 0.5, number of data items = 10, max = 50, rm
Begin
  initialize the data population
  apply mutation to the data according to Eq. (4)
  fitness1 = fitness of the initial data, calculated using Eq. (3)
  min_fit = the data item with the minimum fitness
  gbest = the data item with the minimum fitness
  while (t < max)
    adjust the frequency and velocity
    calculate the transfer function
    if (T < rand) then generate new data end
    if (rand > r) then update the new data with gbest end
    enhance the new data using the mutation operator of Eq. (4)
    fitness2 = fitness of the new data, computed using Eq. (3)
    if (fitness1 < fitness2 && rand > A) then update the initial data, decrease the loudness, and increase the pulse rate end
    if (fitness2 < min_fit) then update gbest end
  end
That is, if (fitness1 < fitness2 && rand > A), the initial data are updated, the loudness is reduced, and the pulse rate is increased.
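The mutation operator of Eq. (4) can be sketched in a few lines. This is a minimal illustration only: the function name `mutate`, the default r_m = 1/d, and the 8-bit example are assumptions for this sketch, and the rest of the WBBA loop (frequency, velocity, transfer function, loudness, pulse rate) is deliberately left out.

```python
import random

def mutate(bits, rm=None, rng=random):
    """Flip each bit independently with probability rm, cf. Eq. (4).

    If rm is not given, it defaults to 1/d, where d is the number of bits,
    so that on average one bit per solution is flipped.
    """
    d = len(bits)
    if rm is None:
        rm = 1.0 / d
    return [1 - b if rng.random() <= rm else b for b in bits]

# Hypothetical example: an 8-feature mask initialised with random 0s and 1s.
rng = random.Random(0)
solution = [rng.randint(0, 1) for _ in range(8)]
mutated = mutate(solution, rng=rng)
print(solution, "->", mutated)
```

A bit value of 1 would mark a selected feature and 0 an excluded one, so each mutation moves the candidate feature subset to a neighbouring point in the search space.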
3.4 Clustering Using GKFCM with PSO Algorithm The following steps detail the PSO algorithm [24]:
Step 1: Randomly initialize the position and the velocity of each particle. For every particle, randomly select K cluster centroids.
Step 2: Evaluate every particle's fitness for the initial values.
Step 3: Optimally preserve the structure of the set of documents. The within-cluster scatter matrix is given in Eq. 5, the between-cluster scatter matrix in Eq. 6, and the mixture scatter matrix in Eq. 7. All three can be used to measure the quality of a cluster. A cluster is considered high-quality if its members are grouped closely and it is well separated from the other clusters. Equation 5 describes the within-cluster scatter, and Eq. 6 describes the similarity measure among clusters. This operation retains the structure of the clusters. Also, Eq. 7 is maximized.
\[ C_w = \sum_{i=1}^{k} \sum_{j \in N_i} (d_j - cc_i)(d_j - cc_i)^T, \tag{5} \]
\[ C_b = \sum_{i=1}^{k} \sum_{j \in N_i} (c_j - pc)(c_j - pc)^T, \tag{6} \]
\[ C_m = \sum_{j=1}^{n} (d_j - pc)(d_j - pc)^T, \tag{7} \]
where d_j is a document that resides in cluster i, cc_i is the ith cluster centroid, pc is the process cluster centroid, N_i is the total number of documents in the cluster, and K is the total number of clusters. C_b is the between-cluster scatter, and C_w is the within-cluster scatter. C_m is the mixture scatter matrix and is the sum of C_w and C_b. C_b is used to evaluate the particle's fitness value. The main objective of clustering is to reach low inter-cluster and high intra-cluster similarity. Equation 7 finds the best position of the particle so far. Equation 6 finds the particle's best position in the region. Equation 5 applies velocity updates for every dimension of the particle. Equation 7 positions and creates the location of the new particle. Steps 2 to 6 are repeated until a stopping criterion is met, which is either finding a sufficiently good solution or completing the maximum number of generations. The optimum solution is the particle in the population with the best fitness value. The relevance of the documents is then evaluated. The HCLDA architecture diagram is shown in Fig. 1.
Algorithmic steps for Gaussian Kernel Fuzzy C-Means clustering
Process 1: take the data points di and dj with the smallest distance in D that satisfy d(di, dj) < e, and add them to the initially empty set R1 = ∅. Then R1 + {di, dj} → R1, and D − {di, dj} → D.
Process 2: take the mean value of di and dj, (di + dj)/2, as a new data point, and identify a data point dl in set D whose distance from this mean value is less than e.
Process 3: take {di, dj, dl} as a new data point with a value of ((di + dj)/2 + dl)/2. Repeat Process 2 until no new data points can be found, and form the new sets R1 and D.
Fig. 1 HCLDA architecture diagram (blocks: Medical Datasets; Preprocessing: Standardization, Filtering, Normalization, Preprocessing using Improved Decision Tree; Feature Extraction: Weighted Binary Bat (WBB); Clustering: GKFCM with PSO Optimization; Classification and Dimensionality Reduction Using HCLDA; Performance Metrics)
Process 4: repeat Processes 1 to 3 for set D to form set S, which includes R1, …, Rm and D, where the final mean values of the data points in R1, …, Rm are taken as new data points, respectively.
Process 5: let the data points in S be nodes in a graph, based on graph theory, and let the connecting line between nodes be the distance. If the distance is greater than y, the connecting line is deleted, thus forming a connected network graph [20].
Algorithm 2 Highly Correlated Linear Discriminant Analysis (HCLDA) using Dimensionality Reduction
The datasets are normalized, and the features are selected using the WBB algorithm. The dataset is then clustered using PSO with GKFCM. Finally, the clustered output is classified using the HCLDA algorithm. The classification steps are as follows.
1. Given a set of instances, let X be the data matrix with N features (columns) and l instances (rows):
\[ X = \begin{bmatrix} a(1,1) & a(1,2) & \cdots & a(1,N) \\ a(2,1) & a(2,2) & \cdots & a(2,N) \\ a(3,1) & a(3,2) & \cdots & a(3,N) \\ \vdots & \vdots & & \vdots \\ a(l,1) & a(l,2) & \cdots & a(l,N) \end{bmatrix} \tag{8} \]
2. Calculate the average value of each class:
\[ \mu_j = \frac{1}{n_j} \sum_{x_i \in c_j} x_i \tag{9} \]
3. Calculate the total mean of all data:
\[ \mu = \frac{1}{l} \sum_{i=1}^{l} x_i \tag{10} \]
4. Calculate the intermediate (between-class) matrix:
\[ S_m = \sum_{k=1}^{c} n_k (\mu_k - \mu)(\mu_k - \mu)^T \tag{11} \]
5. For all classes i, i = 1, 2, …, c, do
5.1 Calculate the scatter matrix within the class:
\[ S_c = \sum_{k=1}^{c} \sum_{i \in c_k} (x_i - \mu_k)(x_i - \mu_k)^T \tag{12} \]
5.2 For each class, create a transformation matrix (c_i) using
\[ c_i = S_{c_i}^{-1} S_m \tag{13} \]
5.3 For each transformation matrix, determine the eigenvalues and eigenvectors, and then sort the eigenvectors in decreasing order of the associated eigenvalues.
5.4 The first k eigenvectors are used to project the instances of each class into a low-dimensional space.
5.5 End of for loop
6. Calculate the correlation between each feature and the class feature, then sort the features in decreasing order of correlation.
7. Select the n most highly correlated features, and incorporate them into the modified dataset.
8. Return the dataset with the reduced dimensions as the classified result.
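The steps of Algorithm 2 can be approximated with NumPy as follows. This is a simplified sketch under stated assumptions, not the authors' code: it uses a single pooled within-class scatter matrix instead of one transformation per class, a pseudo-inverse to guard against singular scatter matrices, and absolute Pearson correlation for the feature ranking. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def lda_projection(X, y, k):
    """Project X onto the k leading eigenvectors of pinv(S_w) @ S_b."""
    mu = X.mean(axis=0)                      # total mean, cf. Eq. (10)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)               # class mean, cf. Eq. (9)
        S_w += (Xc - mu_c).T @ (Xc - mu_c)   # within-class scatter, cf. Eq. (12)
        diff = (mu_c - mu).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)     # between-class scatter, cf. Eq. (11)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:k]
    return X @ eigvecs[:, order].real

def top_correlated(X, y, n):
    """Indices of the n features with the largest |Pearson correlation| with y."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(corr)[::-1][:n]

def hclda_reduce(X, y, n_corr):
    """Concatenate c - 1 LDA components with the n_corr most correlated features."""
    k = len(np.unique(y)) - 1
    return np.hstack([lda_projection(X, y, k), X[:, top_correlated(X, y, n_corr)]])

# Synthetic two-class data (hypothetical): only feature 0 separates the classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = np.repeat([0, 1], 30)
X[y == 1, 0] += 3.0
reduced = hclda_reduce(X, y, n_corr=2)
print(reduced.shape)
```

With two classes the sketch keeps one discriminant component plus the two retained original features, giving three columns in total.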
4 Results and Discussion The proposed HCLDA method is implemented in the Python programming language. To showcase the efficacy of the proposed method, this article uses four UCI medical benchmark datasets: the Wisconsin breast cancer dataset, the chronic kidney disease dataset, the diabetes dataset, and the heart disease dataset. The Wisconsin breast cancer dataset has 32 characteristics and 569 cases, the chronic kidney disease dataset has 26 characteristics and 458 cases, the diabetes dataset has ten characteristics and 769 cases, and the heart disease dataset has 14 characteristics and 304 cases. Accuracy, sensitivity, and specificity are the three performance metrics used to validate the performance of the proposed HCLDA. These metrics best depict the classification capability of the proposed technique. The accuracy of a classifier refers to its capacity to generate correct predictions. The results were analyzed in a multi-class classification framework: the classification data were partitioned into two-class problems, and the one-versus-the-rest approach was applied. Accuracy, sensitivity, and specificity are each compared across four methods: LDA with RF, PCA with RF, CORR with RF, and the HCLDA algorithm. The performance analysis of the proposed work is given in Tables 1, 2, and 3. The accuracy comparison chart is shown in Fig. 2, the sensitivity comparison chart in Fig. 3, and the specificity comparison chart in Fig. 4. In each chart, the X-axis denotes the chronic disease datasets, and the Y-axis denotes the percentage.
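For reference, the three metrics can be computed from the confusion-matrix counts of a binary (one-versus-the-rest) problem as sketched below. The function name and the toy predictions are assumptions made for illustration and do not reproduce any value reported in the tables.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical predictions: 4 diseased cases (label 1) and 6 healthy cases (label 0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity measures how many diseased cases are detected, while specificity measures how many healthy cases are correctly left unflagged, which is why both are reported alongside accuracy.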
Fig. 2 Performance measure comparison of breast cancer dataset (bar chart; X-axis: Accuracy, Sensitivity, Specificity; Y-axis: 0–100%; series: HCLDA, LDA+RF, PCA+RF, CORR+RF)

Table 1 Comparison of accuracy with proposed work
Methods used   Breast cancer (%)   Kidney disease (%)   Heart disease (%)   Diabetics (%)
HCLDA          98.12               96.84                93.34               97.27
LDA + RF       96.69               95.25                91.67               96.06
PCA + RF       81.57               94.02                86.67               91.57
CORR + RF      95.61               94.75                81.67               94.67
Table 2 Sensitivity performance analysis of proposed work
Methods used   Breast cancer (%)   Kidney disease (%)   Heart disease (%)   Diabetics (%)
HCLDA          98.99               98.77                93.47               92.27
LDA + RF       98.61               98.67                91.30               89.89
PCA + RF       82.24               98.37                81.69               87.57
CORR + RF      96.61               97.46                89.57               90.81
Table 3 Specificity performance analysis of proposed work
Methods used   Breast cancer (%)   Kidney disease (%)   Heart disease (%)   Diabetics (%)
HCLDA          96.61               90.67                92.85               92.85
LDA + RF       96.59               81.34                90.04               90.04
PCA + RF       77.57               71.18                71.18               62.28
CORR + RF      86.09               82.34                82.34               64.85
Fig. 3 Performance measure assessment of chronic kidney disease dataset (bar chart; X-axis: Accuracy, Sensitivity, Specificity; Y-axis: 0–100%; series: HCLDA, LDA+RF, PCA+RF, CORR+RF)
Fig. 4 Performance measure assessment of diabetics dataset (bar chart; X-axis: Accuracy, Sensitivity, Specificity; Y-axis: 0–100%; series: HCLDA, LDA+RF, PCA+RF, CORR+RF)
Figures 2, 3, 4, and 5 show how the four datasets fared in the analysis of the proposed approach. On the breast cancer, kidney disease, heart disease, and diabetes datasets, the proposed HCLDA technique achieves sensitivities of 98.99%, 98.77%, 93.47%, and 92.27%, respectively. On the Wisconsin breast cancer dataset, this compares with a sensitivity of 98.61% for LDA with RF, 82.24% for PCA with RF, and 96.61% for CORR with RF (Table 2). The specificity of the proposed method on the breast cancer dataset is 96.61%, compared with 96.59% for LDA with RF (Table 3). Similarly, HCLDA provides superior results on the other performance indicators. The proposed method likewise outperforms the competing approaches on the heart disease dataset, and its superior performance is further demonstrated on the diabetes dataset. Compared to the other methods, the results demonstrate that the proposed classification model combining Highly Correlated Linear Discriminant Analysis (HCLDA) and Random Forest (RF) provides superior accuracy, sensitivity, and specificity.
Fig. 5 Performance measure assessment of heart disease dataset (bar chart; X-axis: Accuracy, Sensitivity, Specificity; Y-axis: 0–100%; series: HCLDA, LDA+RF, PCA+RF, CORR+RF)
5 Conclusion This article proposed the HCLDA framework for dimensionality reduction in medical datasets. Considering all dataset features during model building affects an ML model's classification accuracy and processing speed. It is therefore practical to retain the crucial features and ignore redundant and irrelevant ones. The LDA methodology, which transforms the dataset into a new subspace with c − 1 features (where c is the number of classes in the dataset), is a component of the proposed HCLDA dimensionality reduction method. Additionally, the updated dataset contains the n most highly correlated features. Using the proposed HCLDA model together with Random Forest for classification increased precision. All four UCI datasets were normalized using IDT, their features were selected using the WBB algorithm, and they were clustered using GKFCM with PSO. Finally, classification and dimensionality reduction (DR) were performed with HCLDA. In future work, HCLDA will be applied with other machine learning approaches and artificial neural networks for dimensionality reduction.
References
1. Crawford B, Soto R, Olea C, Johnson F, Paredes F (2015) Binary bat algorithms for the set covering problem. In: 2015 10th Iberian conference on information systems and technologies (CISTI), June 2015, pp 1–4. https://doi.org/10.1109/CISTI.2015.7170537
2. Hariharan B, Prakash PNS, Anupama CG, Siva R, Kaliraj S, WB NR (2022) Dimensionality reduction based medical data classification using hybrid linear discriminant analysis. In: 2022 3rd International conference on electronics and sustainable communication systems (ICESC). https://doi.org/10.1109/icesc54411.2022.9885728
3. Emmanuel BS (2021) Improved approach to feature dimension reduction for efficient diagnostic classification of breast cancer. In: 2021 1st International conference on multidisciplinary engineering and applied science (ICMEAS). https://doi.org/10.1109/icmeas52683.2021.9692388
4. Pamukçu E, Bozdogan H, Çalık S (2015) A novel hybrid dimension reduction technique for undersized high dimensional gene expression data sets using information complexity criterion for cancer classification. Comput Math Methods Med 2015:1–14. https://doi.org/10.1155/2015/370640
5. Ma'ruf FA, Adiwijaya, Wisesty UN (201) Analysis of the influence of minimum redundancy maximum relevance as dimensionality reduction method on cancer classification based on microarray data using Support Vector Machine classifier. J Phys Conf Ser 1192:012011. https://doi.org/10.1088/1742-6596/1192/1/012011
6. Damgacioglu H, Celik E, Celik N (2019) Estimating gene expression from high-dimensional DNA methylation levels in cancer data: a bimodal unsupervised dimension reduction algorithm. Comput Ind Eng 130:348–357. https://doi.org/10.1016/j.cie.2019.02.038
7. Sharma H, Panda RR, Nagwani NK (2021) SPIN: a novel hybrid dimensionality reduction technique for cervical cancer risk classification. In: 2021 8th international conference on signal processing and integrated networks (SPIN). https://doi.org/10.1109/spin52536.2021.9565941
8. Xie H, Li J, Zhang Q, Wang Y (2016) Comparison among dimensionality reduction techniques based on random projection for cancer classification. Comput Biol Chem 65:165–172. https://doi.org/10.1016/j.compbiolchem.2016.09.010
9. Gandhimathi K, Umadevi N (2021) Type 2 diabetes mellitus prediction model using PSO-Gaussian Kernel-based FCM and PDF with recurrent NN. In: 2021 IEEE 6th international conference on computing, communication and automation (ICCCA). https://doi.org/10.1109/iccca52192.2021.9666236
10. Gupta K, Janghel RR (2018) Dimensionality reduction-based breast cancer classification using machine learning. Comput Intell Theor Appl Future Dir I:133–146. https://doi.org/10.1007/978-981-13-1132-1_11
11. Nirmalakumari K, Rajaguru H, Rajkumar P (2020) Performance analysis of classifiers for colon cancer detection from dimensionality reduced microarray gene data. Int J Imaging Syst Technol 30(4):1012–1032. https://doi.org/10.1002/ima.22431
12. Ebrahimpour MK, Mirvaziri H, Sattari-Naeini V (2017) Improving breast cancer classification by dimensional reduction on mammograms. Comput Methods Biomech Biomed Eng Imaging Vis 1–11. https://doi.org/10.1080/21681163.2017.1326847
13. Shi M, Wang J, Zhang C (2019) Integration of cancer genomics data for tree-based dimensionality reduction and cancer outcome prediction. Mol Inform 39(3):1900028. https://doi.org/10.1002/minf.201900028
14. Zhao M, Tang Y, Kim H, Hasegawa K (2018) Machine learning with K-means dimensional reduction for predicting survival outcomes in patients with breast cancer. Cancer Inform 17:117693511881021. https://doi.org/10.1177/1176935118810215
15. Sharma N, Saroha K (2015) A novel dimensionality reduction method for cancer dataset using PCA and feature ranking. In: 2015 International conference on advances in computing, communications and informatics (ICACCI). https://doi.org/10.1109/icacci.2015.7275954
16. Rafique O, Mir AH (2020) Weighted dimensionality reduction and robust Gaussian mixture model-based cancer patient subtyping from gene expression data. J Biomed Inform 112:103620. https://doi.org/10.1016/j.jbi.2020.103620
17. Chen P (2021) The application of an improved C4.5 decision tree. In: 2021 7th annual international conference on network and information systems for computers (ICNISC). https://doi.org/10.1109/icnisc54316.2021.00078
18. Gang P et al (2018) Dimensionality reduction in deep learning for chest X-ray analysis of lung cancer. In: 2018 Tenth international conference on advanced computational intelligence (ICACI). https://doi.org/10.1109/icaci.2018.8377579
19. Prakash PNS, Rajkumar N (2021) Improved local fisher discriminant analysis-based dimensionality reduction for cancer disease prediction. J Ambient Intell Humaniz Comput 12(7):8083–8098. https://doi.org/10.1007/s12652-020-02542-6
20. Chouhan R, Purohit A (2018) An approach for document clustering using PSO and K-means algorithm. In: 2018 2nd International conference on inventive systems and control (ICISC). https://doi.org/10.1109/icisc.2018.8399034
21. Dash SS (2021) Intelligent computing and applications. Springer Singapore. https://doi.org/10.1007/978-981-15-5566-4
22. Gavankar SS, Sawarkar SD (2017) Eager decision tree. In: 2017 2nd International conference for convergence in technology (I2CT). https://doi.org/10.1109/i2ct.2017.8226246
23. Giang TT, Nguyen TP, Tran DH (2017) Stratifying cancer patients based on multiple kernel learning and dimensionality reduction. In: 2017 9th International conference on knowledge and systems engineering (KSE). https://doi.org/10.1109/kse.2017.8119443
24. Zhang X, Chang D, Qi W, Zhan Z (2019) Tree-like dimensionality reduction for cancer-informatics. IOP Conf Ser Mater Sci Eng 490:042028. https://doi.org/10.1088/1757-899x/490/4/042028
Identification of Brain Tumor Images Using a Novel Machine Learning Model
Y. Mahesha
Abstract In today’s world, manually examining a large number of magnetic resonance brain images for the identification of brain tumor is a time-consuming task. It may negatively influence the patient’s medical therapy. Since normal tissue and brain tumor cells have a lot in common in terms of appearance, segmenting tumor regions can be difficult. Hence, an automatic brain tumor identification model is required. In this paper, a novel convolutional neural network model has been envisioned to identify brain tumor images. The proposed model has been evaluated on two datasets Br35H and BraTS. The proposed model has been compared with the state-of-the-art method VGG-16. The envisioned model has achieved accuracy of 97.2% and 97% with Br35H and BraTS datasets, respectively.
Keywords Brain tumor · BraTS · Br35H · Magnetic resonance imaging · Convolutional neural network · VGG-16
1 Introduction A brain tumor is a mass of abnormal brain cells. Because the brain is enclosed by the rigid skull, any form of irregular growth in the brain leads to severe health problems. A tumor is either cancerous or noncancerous [2]. As a brain tumor grows, it raises the pressure inside the skull, which results in brain damage and may lead to death. To test for the presence of brain tumors, doctors depend on computed tomography (CT) scans, magnetic resonance imaging (MRI), angiography, skull X-ray, and biopsy. Among these methods, the MRI scan is considered the best method for finding a tumor in the brain [1]. In MRI, special color is used by the doctor to detect tumors. MRI does not use ionizing radiation, whereas a CT scan does. This paper focuses on the identification of brain tumors in MRI images using deep learning tools. In deep learning, a convolutional neural network (CNN) can be adopted for assessing visual information. Because parameters are shared in a CNN, it uses fewer parameters. Hence, CNN is usually preferred over other kinds of neural networks for extracting features and for classification. The present article uses a CNN to detect brain tumors.
Y. Mahesha (B) ATME College of Engineering, Mysore, Karnataka, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_30
2 Literature Review CNN models can be applied to solve numerous issues in a variety of fields [3, 4]; however, its image processing performance for health applications is exceptional. A CNN with a neutrosophic is being investigated for the detection of brain tumors [5]. In this approach, 160 images were used to train and test the system. Here, CNN extracts features, while support vector machine (SVM) and K-Nearest Neighbors (KNN) are utilized for classification in this hybrid technique. This method has achieved 95.62% accuracy. Another study found that brain tumors can be diagnosed using both handcrafted features and deep learner features [6]. In this approach, a transfer learning model is used to acquire features, while form and texture are manually retrieved. The classifier is fed entropy and fused vectors for classification. Another study used CNN and transfer learning to classify brain tumors [7]. In this experiment, pre-trained GoogLeNet is utilized to extract features, and existing classifiers are utilized for classification. This approach has achieved 98% accuracy. For brain tumor classification, CNN is trained on the addition of large amounts of data [8]. A deep learning technique has been used to separate the tumor part in the proposed system. The study used a pre-trained CNN model to assess the system’s performance on both original and supplemented data. A hyper-column approach is used to design CNN architecture in a study [9]. In this approach, before transmitting input to CNN layers, an attention module detects the area of interest. The proposed system has a precision of 96.05%. A CNN model has been utilized to segment brain tumors in MRI [10]. Here, we compare various techniques, including clustering, classical classifiers such as Multilayer Perceptron (MLP), SVM, Logistic Regression, KNN, Random Forest, and Naive Bayes, along with the outputs of CNN. 
Among these, CNN has shown the best performance of all the classifiers with an accuracy rate of 97.87%. For brain tumor detection, a fusion procedure is used to combine textural and structural information from the four MRI sequences [11]. The Discrete Wavelet Transformation (DWT) is used in the fusion process. More information from the tumor region is retrieved using the Daubechies wavelet. CNN is used to classify tumor and nontumor areas after preprocessing. The results show that merged images function better. In another study, different hyper-parameters are used to determine the architecture of CNN models [12]. Deep learning models perform better than traditional methods, according to the findings. A new architecture for CNN is created for classifying benign tumors in another similar method [13]. The accuracy of several models is said to range between 96 and 99%.
Long Short-Term Memory (LSTM) was used to distinguish healthy brain tissues from a brain tumor [14]. On an MRI signal dataset, various augmentation approaches are used to train stacked Bi-LSTM. The proposed technique achieves an average accuracy of 91.46% using fivefold cross-validation. A multiscale deep CNN [15] has been proposed for analyzing tumor images and classifying the images as glioma, meningioma, or pituitary tumors. A dataset of 3000 images is considered to check the model performance. The planned CNN’s classification accuracy is reported to be 97.3%. ResNet-50 is a deep neural network that was developed using 3000 images from three different datasets [16]. With the use of a key performance matrix, the model’s performance is assessed. The suggested model obtains an average accuracy of 97% for non-augmented and augmented data. In a different approach, eight CNN models in a CAD system of brain tumors were built and trained using brain MRI [17]. The accuracy of CNN models ranges from 90 to 99%. To extract characteristics from brain MRIs, a 3D CNN model is proposed [18]. A correlation-based model is used to choose the best features from the CNN features, and ANN has been applied to classify them. For three separate datasets, the suggested technique achieves the accuracy of 92.67%, 96.97%, and 98.32%. From the literature survey, it is found that CNN can be used to classify brain tumor images from healthy brain images. Hence, this paper proposed and implemented a CNN model to identify brain tumor images.
3 Material and Methods 3.1 Training and Evaluating the CNN Model A CNN model has been trained as shown in Fig. 1 using a set of brain images which includes both healthy brain and unhealthy brain images. The experiment has been conducted on the dataset Br35H. It consists of 1500 positive and negative images each. To train the model, a split ratio of 80:20 is applied. Samples of healthy and unhealthy brain images are presented in Figs. 2 and 3, respectively. The performance of VGG-16 also has been evaluated, and its results are analyzed.
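An 80:20 split of a balanced dataset such as Br35H can be sketched as follows. The stratification by class, the fixed random seed, and the function name are assumptions made for this illustration; the paper states only the split ratio.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train and test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# 1500 tumor (1) and 1500 healthy (0) images, as described for Br35H.
labels = [1] * 1500 + [0] * 1500
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Stratifying keeps the tumor-to-healthy ratio identical in both partitions, which avoids a test set that is accidentally dominated by one class.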
Fig. 1 Training and evaluation of the CNN model (flow: Dataset; Train set (80%) and Test set (20%); Proposed CNN model)
Fig. 2 Healthy brain images
Fig. 3 Brain images with tumor
3.2 Proposed Model The structure of the proposed model is presented in Figs. 4 and 5. The architecture consists of two convolutional layers and two fully connected layers. The layers use two activation functions, ReLU [19] and sigmoid [20]. The workflow of the proposed CNN model is as follows:
(i) An input image of size 224 × 224 passes through the first convolutional layer, where 32 filters of size 3 × 3 are applied to the image, followed by the ReLU activation function, to generate a feature map.
(ii) A max pooling layer is then applied to the resulting feature map for dimensionality reduction, which enhances the functionality of the feature map. The output of the pooling layer passes through the second convolutional layer, where 64 filters of size 3 × 3 are applied, again followed by ReLU, to generate a feature map.
(iii) After a second max pooling layer, the generated feature map passes through the first fully connected layer. The second fully connected layer acts as the output layer and includes the sigmoid activation function. The output layer is followed by binary cross-entropy [21] for classification.
The expression for the ReLU activation is given in Eq. (1), and its graph is shown in Fig. 6.
\[ R(z) = \max(0, z) \tag{1} \]
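The feature-map sizes implied by this workflow can be traced with simple arithmetic. The sketch below assumes unpadded convolutions with stride 1, 2 × 2 max pooling with stride 2, and a three-channel input; none of these details is stated in the text, so the resulting numbers are illustrative only.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a convolution (no padding assumed by default)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Spatial output size of max pooling (2 x 2 window, stride 2 assumed)."""
    return (size - window) // stride + 1

size, channels = 224, 3          # 224 x 224 input; 3 channels is an assumption
trace = []
for filters in (32, 64):         # the two convolutional layers described above
    params = (3 * 3 * channels + 1) * filters   # weights plus one bias per filter
    size = conv_out(size)
    trace.append(("conv", size, filters, params))
    size = pool_out(size)
    trace.append(("pool", size, filters, 0))
    channels = filters
flattened = size * size * channels   # length of the vector fed to the first dense layer

for name, s, f, p in trace:
    print(f"{name}: {s} x {s} x {f}, parameters = {p}")
print("flattened length:", flattened)
```

Under these assumptions the two convolutional layers hold fewer than 20,000 parameters between them, which illustrates the parameter sharing that makes convolutional layers compact.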
Fig. 4 Architecture of proposed model (layers, from input to output: 2D Convolution, 32 × 3 × 3 filters, ReLU; Max Pool; 2D Convolution, 64 × 3 × 3, ReLU; Max Pool; Fully connected layer, ReLU; Output layer, Sigmoid)
Fig. 5 Internal structure of proposed model
Fig. 6 ReLU activation function
The expression for the sigmoid activation function is given in Eq. (2), and its graph is shown in Fig. 7.
\[ \sigma(z) = \frac{1}{1 + e^{-z}} \tag{2} \]
Cross-entropy given in Eq. (3) has been used to calculate the error.
\[ L_{CE} = -\sum_{i=1}^{2} t_i \log(p_i), \tag{3} \]
where t_i represents the truth label and p_i represents the softmax probability with respect to the ith class.
Fig. 7 Sigmoid function
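Equations (1) to (3) can be checked numerically with a few lines of plain Python. In this sketch the natural logarithm, the small clipping constant `eps`, and the example logit of 2.0 are assumptions; the two class probabilities are taken as the sigmoid output and its complement.

```python
import math

def relu(z):
    """R(z) = max(0, z), cf. Eq. (1)."""
    return max(0.0, z)

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)), cf. Eq. (2)."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(t, p, eps=1e-12):
    """L_CE = -sum_i t_i * log(p_i), cf. Eq. (3); eps guards against log(0)."""
    return -sum(ti * math.log(max(pi, eps)) for ti, pi in zip(t, p))

# Hypothetical output logit for a tumor image (true class: tumor).
p_tumor = sigmoid(2.0)
loss = cross_entropy([1, 0], [p_tumor, 1.0 - p_tumor])
print(f"p(tumor) = {p_tumor:.4f}, loss = {loss:.4f}")
```

The loss shrinks toward zero as the predicted probability of the true class approaches one, which is the behavior the training process exploits.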
4 Results 4.1 Evaluation of the Performance of Proposed Model The accuracy of the proposed model across different epochs is shown in Fig. 8, and the loss during the training process is shown in Fig. 9. The maximum accuracy achieved by the proposed model after 50 epochs is 97.5%. ROC has been plotted, and AUC has been evaluated for the proposed model as presented in Fig. 10. The AUC value generated is 0.955, which indicates that the proposed model is a good discriminator.
Fig. 8 Accuracy across different epochs by the proposed model
Fig. 9 Loss across different epochs in the proposed model
Fig. 10 ROC and AUC for VGG-16 model
4.2 Evaluation of the Performance of VGG-16 Model The accuracy and loss graph is shown in Figs. 11 and 12, respectively. The maximum accuracy obtained by the VGG-16 model is 96%. The ROC has been plotted and AUC has been evaluated for the VGG-16 model as presented in Fig. 13. The AUC value obtained is 0.944, which indicates that VGG-16 can be considered to identify brain tumor images. To avoid bias in the selection of test set, fivefold cross-validation has been carried out, and the result obtained is presented in Table 1. The accuracy attained by the proposed CNN model is 97.2% and by the VGG-16 is 96.12%. Fig. 11 Accuracy across different epochs by the proposed model
Fig. 12 Loss across different epochs in the VGG-16 model
Fig. 13 ROC and AUC for VGG-16 model
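Table 1 Fivefold cross-validation

| Iteration | VGG-16 accuracy (%) | Proposed model accuracy (%) |
|-----------|---------------------|-----------------------------|
| 1         | 96                  | 97.5                        |
| 2         | 96                  | 97.0                        |
| 3         | 95.5                | 96.5                        |
| 4         | 96.6                | 98.0                        |
| 5         | 96.5                | 97.0                        |
| Average   | 96.12               | 97.2                        |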
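The averages in Table 1 can be checked directly from the per-fold accuracies. The short snippet below is only an arithmetic check of the reported values; the variable names are illustrative.

```python
from statistics import mean

# Per-fold accuracies (%) as reported in Table 1.
vgg16 = [96.0, 96.0, 95.5, 96.6, 96.5]
proposed = [97.5, 97.0, 96.5, 98.0, 97.0]

vgg16_avg = round(mean(vgg16), 2)
proposed_avg = round(mean(proposed), 2)
print(vgg16_avg, proposed_avg)
```

Both results agree with the Average row of the table, and the proposed model scores higher than VGG-16 in every fold.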
Table 2 Performance evaluation of different datasets

| Datasets | VGG-16 accuracy (%) | Proposed model accuracy (%) |
|----------|---------------------|-----------------------------|
| Br35H    | 96.12               | 97.2                        |
| BraTS    | 95.4                | 97.0                        |
The experiment has been conducted on another dataset known as BraTS, and the result obtained is presented in Table 2.
5 Conclusion

A CNN model has been proposed to identify brain tumor images. The performance of the proposed model has been compared with that of the advanced model VGG-16. Both models have been evaluated using the datasets Br35H and BraTS. With Br35H, after fivefold cross-validation, the VGG-16 model attained an average accuracy of 96.12%, and the proposed model achieved an average accuracy of 97.2%. With BraTS, the accuracies of VGG-16 and the proposed model are 95.4% and 97.0%, respectively. The results show that the proposed CNN model achieves desirable accuracy in the identification of brain tumor images and outperforms the state-of-the-art method VGG-16.
A Study on Real-Time Vehicle Speed Measurement Techniques

Prasant Kumar Sahu and Debalina Ghosh
Abstract On-road speed detection of vehicles is essential for various reasons, including enforcement of traffic rules and regulations, managing traffic flow, and road safety. Multiple techniques have been developed in the literature to detect the speed of vehicles on the road. This paper discusses some of the most commonly used methods and their pros and cons. We also demonstrate the operation of a camera-based system for speed detection. Our proposed approach is built around a commercially available off-the-shelf camera and can measure different speeds with reasonable accuracy.

Keywords Speed detection · Light detection and ranging (LIDAR) · Inductive loop · Radio detection and ranging (RADAR) · Camera-based system
P. K. Sahu (B) · D. Ghosh
School of Electrical Sciences, IIT Bhubaneswar, Arugul, Jatni, India
e-mail: [email protected]
D. Ghosh
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_31

1 Introduction

Overspeeding is identified as a significant cause of traffic accidents. High-speed road accidents can have catastrophic consequences, including severe crashes, injuries, and loss of life. According to the World Health Organization (WHO) [1], over 90% of road traffic deaths occur in low- and middle-income countries, with the highest road traffic injury death rates recorded in the African region and the lowest in the European region. Speed contributes to approximately 30% of road deaths, even in high-income countries. In 2021, there were 412,432 reported cases of road accidents in India, resulting in 153,972 fatalities and 384,448 injuries, according to the Ministry of Road Transport and Highways, Government of India. While India experienced a significant reduction in road accidents, fatalities, and injuries in 2020 due to the COVID-19 pandemic and nationwide lockdowns, fatalities increased by 1.9% in 2021 compared to the same period in 2019. Due to road accidents, India loses about 3–5% of its GDP each year. Although India accounts for only 1% of the world’s vehicle population, it faces approximately 6% of global road traffic incidents. Controlling vehicles’ speed can prevent crashes and reduce their impact when they occur, lessening the severity of injuries sustained by the victims. This work aims to assess the performance of different techniques used for on-road speed detection, with emphasis on a camera-based system.
2 Motivation

According to the World Health Organization [1], road traffic accidents claim the lives of around 1.3 million people annually. At the same time, up to 50 million more individuals sustain non-fatal injuries, which can lead to disabilities. The economic impact of these accidents is significant for individuals, families, and countries; preventing and reducing road traffic accidents is an ongoing and crucial challenge that demands urgent action and sustained commitment to enhancing road safety. Real-time speed detection and monitoring systems can play a pivotal role in deterring drivers from exceeding speed limits, decreasing the probability of accidents, and improving overall road safety. These systems can also enable enforcement agencies to monitor and enforce speed limits effectively.
3 Related Work

Over the past five years, multiple research projects and researchers have worked on various methods to reduce traffic-related fatalities and provide law enforcement with effective tools for measuring on-road vehicle speeds. One innovative approach presented by Czajewski and Iwanowski [2] is a visual-based method for detecting vehicle speed, which uses only a digital camera and a computer to determine the speed of passing vehicles and identify license plate numbers. This system offers a more cost-effective and efficient alternative to expensive photo radar systems. The authors used the vertical difference in the vehicles' positions in consecutive images to calculate the speed. However, the main problem with the system was camera positioning, as the speed calculation depends to a great extent on the camera placement. Trivedi et al. [3] have recently published a paper on a vision-based technique for detecting vehicles and measuring their speed in real time. They used image processing techniques like morphology and binary logical operations to handle unforeseen traffic scenarios, and their approach can adapt to vehicles of different sizes on the road. However, their system was unsuitable for detecting various types of vehicles. Quan et al. [4] have suggested an intelligent infrastructure called Smart Road Stud (SRS), which can sense vehicles, transmit data wirelessly, power itself, offer traffic guidance, and detect vehicle speeds. Their system offered spot vehicle speed estimation with 93% accuracy, which can be further improved. Another paper [5] presents a unique method for detecting and tracking moving vehicles in a designated surveillance area. This approach also records the position of each vehicle as it travels through the site.

LIDAR, which stands for light detection and ranging, is another commonly used technique for on-road speed detection; it uses laser beams to measure distance and calculate speed. The operation of LIDAR-based systems is mainly affected by weather conditions or fake targets. The paper by Cypto and Karthikeyan [6] offers a deep learning-based automatic speed violation detection system with an accuracy rate of 98.8% for speed violation detection and 99.3% for license plate identification. Zhang et al. [7] presented a tracking framework that uses roadside LIDAR to detect and track vehicles, aiming at precise vehicle speed estimation with an accuracy of 0.22 m/s. LIDAR speed detection requires a clear line of sight between the LIDAR device and the target vehicle, which may not always be possible in certain traffic situations. It may also be unsuitable for some types of vehicles, e.g., vehicles with chrome or polished metal finishes, as these surfaces may reflect the laser beam in ways that introduce inaccurate speed measurements. Overall, LIDAR can be an effective method for speed detection of on-road vehicles, but it may not be suitable for all situations and may require specialized equipment and training.

A magnetically coupled inductive loop sensing system for traffic with less lane discipline was proposed by Ali et al. [8]. Their system comprises a loop sensor in which small inner loops are placed within a large outer loop, and it uses the concept of synchronous detection.
Although inductive loop detection is a widely used method for measuring vehicle speed, it has some limitations that should be considered. Inductive loop detectors require physical installation on the road surface, which can be costly and time-consuming. The authors of [9] have introduced a new algorithm called the Position-Based High-Speed Vehicle Detection Algorithm (PHVA), which leverages a vehicular cloud server to detect vehicles traveling at high speeds within a Vehicular Ad Hoc Network (VANET). However, the system’s operation relies on the Road Side Units (RSUs) and cloud server processing. Besides maintaining the units, synchronizing the RSUs and cloud servers is a significant challenge.

In [11], the authors presented a method to compensate for camera vibration-induced noise in uncalibrated-camera-based vehicle speed measurement systems by proposing a technique to eliminate noise resulting from displacement between incoming and background images. However, this scheme introduces processing overhead. In [12], the authors proposed a novel search method for possible orientations to improve the efficiency and accuracy of object detection from a moving camera. However, this system requires training on many images, and the image-matching process introduces additional processing overhead. The authors of [13] used a sequence of images from an uncalibrated camera to calculate the traffic speed. They applied geometric relationships within the image and simplified the problem to one-dimensional geometry using common sense assumptions. They utilized frame differencing to isolate moving edges and implemented a tracking system to follow vehicles between frames. Moreover, they estimated the speed by using parameters derived from the distribution of vehicle lengths.

The authors of [14] proposed a traffic sensor system that uses deep learning techniques to automatically track and classify vehicles on highways. They utilized a calibrated and fixed camera for this purpose. Additionally, a new traffic image dataset was created to train the models, including real traffic images captured in poor lighting or weather conditions and low-resolution images. The authors used You Only Look Once-based (YOLOv3 and YOLOv4) networks trained on the new traffic set. They combined a simple spatial association algorithm with a more sophisticated Kanade–Lucas–Tomasi (KLT) tracker to follow the vehicles on the road. As a result, the system's reliability depends mostly on the KLT tracker. In [15], the authors presented a modified template-matching technique that examines a target’s color pixels to identify a vehicle’s license plate. The method employs a modified strip search to locate the standard color-geometric template used in Iran and some European countries. Additionally, a periodic strip search is utilized to determine each pixel’s hue. However, environmental factors significantly affected the system’s functionality. A thorough examination of computer vision for traffic videos, including critical analysis and future research prospects, was presented in [16]. Based on their critical survey, the authors proposed a fresh research direction centered on developing robust combined detectors and classifiers for all road users, with emphasis on realistic evaluation conditions.
The authors of [17] introduced a technique to triangulate a vehicle’s location along its path and estimate its speed using time and trajectory data. Their experimental results showed that the proposed method surpassed the current state of the art, with a mean error of about 0.05 km/h, a standard deviation of under 0.20 km/h, and a maximum absolute error of less than 0.75 km/h. A drawback of their scheme is the complexity of triangulating many vehicles traveling simultaneously on the road. The authors of [18] introduced an analytical model for gauging the velocity of a moving vehicle using an off-the-shelf video camera. While the proposed system is adept at measuring vehicle speed, it necessitates prior knowledge of the surrounding environment. In [19], the authors utilized the image scale factor in pixels to estimate the distance between a camera and a vehicle across various frames in a video. This approach enabled them to calculate the average speed of the vehicle. To validate their findings, the researchers compared their estimates against a vehicle moving at a known speed. However, the reliability of the image scale factor in pixels for determining vehicle speed could be improved. Similarly, in [20], the authors introduced a multi-sensor fusion technique to enhance the detection of vehicles in challenging weather conditions. Initially, they proposed a supervised learning-based classifier that leverages LightGBM to efficiently extract vehicle targets from radar data. Next, they identified potential vehicle areas from infrared images based on the distribution of radar targets and employed pixel regression to predict the region of interest (ROI) of vehicles. This approach, however, makes the system design more complex.

We have studied several researchers’ works to identify the major progress and drawbacks of camera-based systems for on-road vehicular speed detection. Our critical survey shows that several researchers and companies are working on different solutions to measure the speed of vehicles accurately. Using the Web of Science Core Collection, we found around 140 publications related to the speed detection of vehicles for the period from 2010 to 2022. The distribution of these publications is shown in Fig. 1. Around 85% of the total publications came from electrical engineering, electronics engineering, instrumentation engineering, transportation science and technology, and civil engineering combined, which suggests that research in this area is primarily focused on these fields. Although speed detection is an essential aspect of transportation safety, the number of publications remains low, at only about 11–12 papers per year.
This low publication rate is an important motivation for carrying out more research work to make the field more competitive and open to a large number of researchers. It also highlights the need for collaboration between different fields to explore new avenues for research in this area.

Fig. 1 Speed measurement publications as per Web of Science Core Collection (x-axis: field of study; y-axis: number of publications)

While several methods have been suggested in the past, concerns persist regarding false positive detection, inaccurate speed recording, susceptibility to weather conditions, associated expenses, complexity, and computational overhead in terms of memory and time. In this work, we have examined the possibility of minimizing the tradeoff between cost and computational overhead by designing the proposed system around a commercially available off-the-shelf IP camera and using OpenCV on the Python platform to determine the speed.
4 Method

Detecting vehicle speed is an essential aspect of traffic safety and enforcement. This work presents a solution to mitigate the challenges that video-based moving vehicle speed detection systems face in complex natural scenes, including shadows and occlusion by trees and tall structures near the camera placement point. The proposed solution involves installing the camera-based system on a mobile platform or on a single traffic pole 8 m above ground level. The vehicle’s speed is determined from the difference in its position in consecutive frames/images. The workflow of the proposed system involves pre-processing the captured video, detecting and tracking the presence of cars in the frame, and determining their speed.
4.1 Proposed Solution

The proposed solution offers two deployment options: (i) installing the camera-based system on a mobile platform, such as a moving vehicle of the enforcement department, and (ii) installing the camera on a single traffic pole at a height of 8 m above ground level. The vehicle’s speed is determined from the difference in its position in consecutive frames/images. The workflow of the proposed system is shown in Fig. 2. As shown in the flowchart, the proposed system involves the following steps:

Step 1: Capture the video feed from the IP camera.
Step 2: Pre-process the captured video to remove noise and adjust the brightness and contrast to ensure a clear and consistent image.
Step 3: Analyze the pre-processed video to detect and track the presence of cars in the frame.
Step 4: Track the position of the detected vehicle across multiple frames to determine its speed.

Fig. 2 Flowchart of workflow (Start; install cameras on the traffic signal pole, approximately 8 m above road level, or in the enforcement vehicle; capture images or videos; analyze the images or videos; calculate the speed; if the speed exceeds the limit, store the information in the database; End)

With either installation, the system can overcome the challenges posed by complex natural scenes and occlusion by trees and tall structures near the camera placement point. The proposed system could improve the accuracy and efficiency of vehicle speed detection, thereby contributing to traffic safety and enforcement. The speed of the vehicle is calculated by measuring the distance it traveled between consecutive frames and dividing it by the time elapsed between those frames. We have also set a threshold limit beyond which the proposed system records the speed as overspeed. The work is carried out using the Python OpenCV environment. Table 1 shows the specification of the camera that we have used for collecting the vehicle motion data for speed calculation.

The proposed system comprises a camera mounted on a structure that captures visual information from 5 to 8 m above ground level. The system’s block diagram is shown in Fig. 3. The system consists of the following components:

Camera: The camera is mounted on the structure and captures visual information from 5 to 8 m above ground level. The camera angle can be adjusted as required.
Wireless Connectivity: The captured camera information is transmitted wirelessly to the personal computer for processing.
Personal Computer: The personal computer processes the received data and prepares it for display.
Readout Unit: The processed data is then displayed on the readout unit.
The camera angle can be adjusted as required to capture specific areas or objects of interest. The captured data is transmitted wirelessly to the personal computer and processed using image processing algorithms. The processed data is then displayed on the readout unit, allowing users to view the captured information after processing. With a camera mounted on a structure, the proposed system is a versatile tool for various types of surveillance and traffic monitoring applications.

Table 1 Specification of the camera [10]

| Features             | Specification                                                        |
|----------------------|----------------------------------------------------------------------|
| Image sensor         | Progressive Scan CMOS                                                |
| Max. resolution      | 1920 × 1080                                                          |
| Min. illumination    | 0.01 lx @(F2.0, AGC ON), B/W: 0 lx with IR                           |
| Angle adjustment     | Pan: 0°–360°, tilt: 0°–180°, rotate: 0°–360°                         |
| Shutter time         | 1/3 s to 1/100,000 s                                                 |
| SNR                  | ≥ 52 dB                                                              |
| Focal length and FOV | 4 mm, horizontal FOV 90.2°, vertical FOV 48.6°, diagonal FOV 107.6°  |

Fig. 3 Proposed system architecture (labels legible in the figure: computing unit, readout unit, baselines)

We have used OpenCV for the velocity calculation. The vehicle’s speed is obtained using the formula speed = distance/time. The distance is a known constant, set by the positioning of the reference lines (baselines) shown in Fig. 3. In our work, speed is computed from the two baseline positions and the difference between the times at which a vehicle is detected at each of them. We have also incorporated an FPS factor in the speed calculation to compensate for slow processing and to address background noise.
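The speed = distance/time calculation between two reference lines can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the crossing times are derived from frame indices divided by the video frame rate (one plausible reading of the FPS factor), and the function names, frame numbers, frame rate, and speed limit are all hypothetical.

```python
def speed_kmph(baseline_m: float, frame_in: int, frame_out: int, fps: float) -> float:
    """Estimate speed from the frames at which a vehicle crosses two reference lines.

    baseline_m: known distance between the two reference lines, in metres.
    frame_in, frame_out: frame indices of the first and second line crossings.
    fps: frame rate of the video, used to convert a frame count into seconds.
    """
    if fps <= 0 or frame_out <= frame_in:
        raise ValueError("fps must be positive and frame_out must follow frame_in")
    elapsed_s = (frame_out - frame_in) / fps
    # metres per second -> kilometres per hour
    return (baseline_m / elapsed_s) * 3.6


def is_overspeed(speed: float, limit_kmph: float) -> bool:
    """Threshold check corresponding to the decision box in the flowchart."""
    return speed > limit_kmph


# Hypothetical example: 36 m baseline, 25 fps video, lines crossed at frames 100 and 154.
v = speed_kmph(36.0, 100, 154, 25.0)
print(round(v, 1), is_overspeed(v, 50.0))
```

Deriving elapsed time from frame indices rather than wall-clock time keeps the estimate independent of how quickly each frame is processed, which is one way to compensate for slow processing.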
5 Results

We have conducted several studies of actual speed detection of vehicles on the road using the camera-based system shown in Fig. 3. The results are shown in Figs. 4 and 5. The measurements are carried out under nine different speed-measuring conditions. In Fig. 4, the measurements are carried out by varying the positioning of the baseline, i.e., by changing the distance between the reference lines shown in Fig. 3. Figure 4 depicts the measured speed for six different conditions, labeled speed 1 to speed 6, with the reference lines set apart at distances of 4, 12, 20, 30, 36, and 52 m, respectively. Figure 4 shows that the actual speeds of vehicles in the traffic stream are not always recognized and recorded correctly by the camera-based system. The results show that the reference line setting is important for obtaining speeds close to the actual values. Setting the distance between the two reference lines at 36 m gives the best results without producing too many false positives. Setting it at 52 m yields speeds close to the actual speed but also leads to too many false positive alarms.

Fig. 4 Detected speed under different baseline settings

Similarly, the results shown in Fig. 5 correspond to three different look angles (40°, 60°, and 90°). At a look angle of 40°, most of the vehicles’ speeds are underestimated, and only the slow-traveling vehicles are identified. At a capture angle of 60°, the measured values improve, but the readings still deviate considerably from the actual values. However, capturing the speeds of the vehicles at an angle of 90° resulted in highly accurate measurements with the least number of false positives. For this reason, the reference line separation and the camera look angle and positioning need to be calibrated very accurately before taking readings for further processing.

Fig. 5 Detected speed under different look angles (x-axis: detected vehicle ID, with IDs 0–6, 9–11, 15, and 16; y-axis: speed in kmph; series: look angles of 40°, 60°, and 90°)
6 Conclusion

The camera-based system presented in this paper measures different speeds fairly accurately. However, the system is very sensitive to the following factors: (i) The distance between the reference lines: with a very small value, the system fails to record the actual speed owing to the size of the vehicles and other background errors, whereas a very large value results in false positive alarms. (ii) The camera placement: as observed in our analysis, improper camera placement has a strong impact on the speed calculation. (iii) The threshold setting, which also affects the results. We have tried our system with different speed levels and different camera placements; the results for each of the nine speed measurement conditions are shown in Figs. 4 and 5. Vehicle speed detection is important for reducing accident-related injuries and fatalities and is an essential component of intelligent transport systems. The proposed work addresses issues like cost and system complexity and does not require high power for its operation. The system is suitable for operation under normal to moderate climatic conditions. The major drawback of our method is its sensitivity to background light conditions. Under bad weather conditions, the system may fail to detect the speed of a few vehicles and may also produce false alarms.

Acknowledgements The authors acknowledge the Odisha Road Safety Society for all types of support, including financial support, in carrying out this work.
References

1. https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries
2. Czajewski W, Iwanowski M (2010) Vision-based vehicle speed measurement method. In: Computer vision and graphics: international conference, ICCVG 2010, Warsaw, Poland, 20–22 Sept 2010, Proceedings, Part I. Springer, Berlin
3. Trivedi JD, Mandalapu SD, Dave DH (2022) Vision-based real-time vehicle detection and vehicle speed measurement using morphology and binary logical operation. J Indus Inform Integr 27
4. Quan W, Wang H, Gai Z (2020) Spot vehicle speed detection method based on the short-pitch dual-node geomagnetic detector. Measurement 158:107661
5. Kumar T, Kushwaha DS (2016) An efficient approach for detection and speed estimation of moving vehicles. Procedia Comput Sci 89:726–731
6. Cypto J, Karthikeyan P (2022) Automatic detection system of speed violations in a traffic based on deep learning technique. IFS 43:6591–6606
7. Zhang J, Xiao W, Coifman B, Mills JP (2020) Vehicle tracking and speed estimation from roadside lidar. IEEE J Select Top Appl Earth Observ Remote Sens 13:5597–5608
8. Ali SSM, George B, Vanajakshi L (2012) A magnetically coupled inductive loop sensing system for less-lane disciplined traffic. In: 2012 IEEE international instrumentation and measurement technology conference proceedings, Graz, Austria, pp 827–832
9. Nayak RP, Sethi S, Bhoi SK (2018) PHVA: a position based high-speed vehicle detection algorithm for detecting high-speed vehicles using vehicular cloud. In: 2018 international conference on information technology (ICIT), Bhubaneswar, India, pp 227–232
10. https://www.hikvision.com/
11. Thuy TN, Xuan DP, Song JH, Jin S, Kim D, Jeon JW (2011) Compensating background for noise due to camera vibration in uncalibrated-camera-based vehicle speed measurement system. IEEE Trans Veh Technol 60(1):30–43
12. Chen Y, Zhang RH, Shang L (2014) A novel method of object detection from a moving camera based on image matching and frame coupling. PLoS ONE 9(10):e109809
13. Dailey DJ, Cathey FW, Pumrin S (2000) An algorithm to estimate mean traffic speed using uncalibrated cameras. IEEE Trans Intell Transp Syst 1(2):98–107
14. Fernández J, Cañas JM, Fernández V, Paniego S (2021) Robust real-time traffic surveillance with deep learning. Comput Intell Neurosci 2021:4632353
15. Ashtari AH, Nordin MJ, Fathy M (2014) An Iranian license plate recognition system based on color features. IEEE Trans Intell Transp Syst 15(4):1690–1705
16. Buch N, Velastin SA, Orwell J (2011) A review of computer vision techniques for the analysis of urban traffic. IEEE Trans Intell Transp Syst 12(3):920–939
17. Najman P, Zemcik P (2022) Vehicle speed measurement using stereo camera pair. IEEE Trans Intell Transp Syst 23(3):2202–2210
18. Dahl M, Javadi S (2020) Analytical modeling for a video-based vehicle speed measurement framework. Sensors 20(1):160
19. Costa LR, Rauen MS, Fronza AB (2020) Car speed estimation based on image scale factor. Forens Sci Int 310
20. Wang Z, Zhan J, Li Y, Zhong Z, Cao Z (2022) A new scheme of vehicle detection for severe weather based on multi-sensor fusion. Measurement 191:110737
Deep Learning with Attention Mechanism for Cryptocurrency Price Forecasting V. Yazhini, M. Nimal Madhu, B. Premjith, and E. A. Gopalakrishnan
Abstract Cryptocurrencies have been a hot topic in recent years. This study aims to predict the future closing prices of Bitcoin and Ethereum using different combinations of Long Short-Term Memory (LSTM), bidirectional LSTM (Bi-LSTM), and Gated Recurrent Unit (GRU) networks with the Bahdanau and Luong attention mechanisms. To this end, data from different time scales are used. The models' hyperparameters are tuned to improve performance, which is evaluated using the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics. The best results for Ethereum were observed in the very short term when using GRU with Bahdanau's attention. Similarly, the best results for Bitcoin were found in the very short term when using Bi-LSTM with Bahdanau's attention. Overall, the experiments show that hyperparameter tuning improves model performance and that adding an attention mechanism to Bi-LSTM and GRU gives better predictions. Keywords Time series forecasting · Long short-term memory · Attention mechanism · Deep learning · Cryptocurrency price prediction
V. Yazhini · M. Nimal Madhu (B) · B. Premjith Center for Computational Engineering and Networking, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected]
E. A. Gopalakrishnan Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bangalore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_32
1 Introduction The cryptocurrency market is a relatively new area of digital currencies that are managed using encryption techniques [1]. Interest in cryptocurrencies has increased since Bitcoin prices soared [2]. The cryptocurrency concept came into existence with the paper "Bitcoin: A Peer-to-Peer Electronic Cash System" [3], which made P2P transfer possible without any intermediary financial institution [4]. The world has more than 20,000 digital currencies [5]. Among them, there
are a few cryptocurrencies that have become popular over time, such as Bitcoin and Ethereum. Bitcoin has more than 5.8 million active users and more than 111 exchanges throughout the world [6]. The main challenge in this area is the extreme volatility of prices [5], so a trustworthy forecasting model is required to make meaningful investments. A variety of methods can be used for forecasting, including technical analysis, fundamental analysis and machine learning. Technical analysis involves studying historical market data. Price forecasting in the cryptocurrency market is a complex task due to its high volatility and lack of fundamental value [5]. Since the cryptomarket is relatively new, it is still not well understood, which makes accurate prediction difficult. Additionally, the market is highly speculative and influenced by a wide range of factors, including investor sentiment, regulatory changes and global events. Despite these challenges, many researchers and traders have attempted to develop forecasting models to gain an edge in the market. Most previous work used RNN-based models such as LSTM, GRU and Bi-LSTM [2, 6, 7]. Other approaches vary the granularity of the dataset, using minute data for the short term, hourly data for the mid-term and daily data for the long term [2].
2 Related Work Some of the existing works are discussed in this section. In [1], a comparative analysis was done on the predictive results for five cryptocurrencies (BTC, ETH, ADA, USDT, BNB) using LSTM, Bi-LSTM and GRU. The results are presented as plots and tabulated values, and hyperparameter tuning is not done. On average, Bi-LSTM and GRU performed better than LSTM in terms of prediction results, but LSTM and GRU performed better than Bi-LSTM in terms of execution time. In [2], model performance is discussed for data in three different modes, viz. short term (minute data), mid-term (hour data) and long term (day data), and the algorithms employed are LSTM, Bi-LSTM and GRU. RMSE is used as the metric for comparison. The study concluded that the ensemble LSTM (stacked LSTM) model performs better [2]. In [4], a hybrid model of LSTM and GRU is applied to Litecoin and Monero, and it outperformed the LSTM model by a considerable margin. In [5], the salp swarm algorithm (SSA) metaheuristic is used for tuning the LSTM model, and the performance is checked on Ethereum. The results showed large potential when SSA optimization is applied. In [6], RNN, LSTM, ARIMA and GRU are explored for Bitcoin. Accuracy and regression prediction are used for the comparative study, which concluded that GRU performs better than the other models. In [7], results for Litecoin, Bitcoin and Ethereum using LSTM, Bi-LSTM and GRU are discussed. It is concluded that GRU performed better, with a MAPE of 0.2456% for BTC, while Bi-LSTM was the worst-performing model in that paper [7].
In [8], statistical measures such as distribution analysis and autocorrelation were explored, together with dependency analysis, on data for 7 coins. In [9], a multi-layered GRU is used for the prediction of Dogecoin, Bitcoin and Ethereum. Here, a forecasting window of 21 is used with a 3-layered GRU, which appears to give better performance in terms of metrics such as MAE, RMSE and MAPE. It is stated that this method outperforms LSTM and GRU. CNN, ConvLSTM and CNN-LSTM are explored in [10], where ConvLSTM performed better than the others, with 2.4076% MAPE. For prediction, indicators such as the moving average, Bollinger bands and the Relative Strength Index are also included as input features. In [12], solar irradiance forecasting is done using deep learning models, where the sliding window approach is implemented for very short-term data. In [13, 14], NSE stock market prediction is done using a linear model (ARIMA) and nonlinear models (neural networks). Based on the results obtained, the neural networks (RNN, LSTM, CNN) outperform the existing linear model (ARIMA). Stock price forecasting is done in [15] using LSTM, CNN, GRU and ELM, where it is stated that deep learning-based models are well suited for generating accurate forecasts for financial time series. In [16], stock price prediction is done using neural networks with an attention mechanism. In that paper, a comparison is made between LSTM, GRU, SVR and a hybrid model that includes the attention mechanism. MSE is used to evaluate the models, and it is concluded that using the attention mechanism with a neural network increased the model's performance. More research has been devoted to stock markets than to crypto, despite the similarities in the forecasting problems. In the existing studies, the focus is mainly on highly volatile cryptocurrencies like Bitcoin, Ethereum and Litecoin.
The main limitation of the existing works is that combinations of neural networks with attention mechanisms, which may improve model performance, are not explored. Studies that focus on price fluctuation analysis over multiple time scales are scarce. The contribution of this research to building new architectures is also meagre.
3 Proposed Work 3.1 Objective Previous research in this field has largely focused on exploring different variations of the LSTM model, and some have even used LSTM with a single attention mechanism. In this study, deep learning approaches are augmented with attention mechanisms. The DL models used in this paper, viz. LSTM, Bi-LSTM and GRU, can capture sequential information from historical data, which can be quite useful in understanding the underlying patterns in the market. Attention mechanisms have
produced promising results in the NLP and signal processing domains, owing to the decoder's flexibility to assign varying weights to the encoded input vectors. Hence, a forecasting network based on deep learning with an attention architecture is proposed. In this work, two attention mechanisms, viz. Bahdanau and Luong, are tried with different combinations of LSTM, Bi-LSTM and GRU to realize multistep cryptoprice forecasting. The experiments are done on two time horizons, dubbed very short term (less than 15 min) and short term (less than 5 h). In the very short term, the 15-min prediction is made as 5-step forecasting at 3 min/step. Similarly, 5-step forecasting at 1 h/step is used for short-term forecasting. The metrics used for model performance evaluation are Mean Absolute Error and Root Mean Square Error. The Bi-LSTM and GRU models in combination with Bahdanau's attention gave better results. It is also observed that adding an attention layer, either Bahdanau or Luong, by itself improved model performance significantly. The best results observed for Ethereum in the very short term are an MAE of 3.552 and an RMSE of 7.001 when using GRU with Bahdanau's attention. Similarly, the best results for Bitcoin were found in the very short term, with an MAE of 21.799 and an RMSE of 43.742 when using Bi-LSTM with Bahdanau's attention.
3.2 Theoretical Background: Attention Mechanism The attention mechanism is a technique widely used in deep learning applications, particularly in natural language processing (NLP). It allows the model to selectively focus on certain parts of an input sequence, such as time series data, and assign greater importance to those parts when making predictions or decisions. The attention mechanism is particularly useful for sequential data because it enables the model to take into account the context and dependencies between different parts of the input. The mechanism focuses on important information while ignoring unimportant information, similar to how the human brain works [17]. The idea behind the attention mechanism is to calculate attention weights so that the weight of the important content is increased. This addresses the difficulty of modelling input sequences that are too long. For time series data, the model's output layer is passed through the attention mechanism, which converts it into a weighted output. This weighted output then serves as the output layer. There are different variations of attention mechanisms; two of the most widely used are Bahdanau's attention [18, 19] and Luong's attention [20]. Bahdanau's attention calculates the attention weights from the encoder hidden states and the previous decoder hidden state. Luong's attention, on the other hand, offers different ways of calculating attention scores, namely dot product, general and concatenation. These variations have different advantages and disadvantages depending on the specific use case [21]. The attention mechanism has been widely adopted and has been used in various natural
language processing tasks such as machine translation, text summarization and question answering [12, 22]. The attention mechanism has also been used in computer vision tasks such as object detection and has been found to improve model performance. Overall, the attention mechanism can greatly enhance the performance of deep learning models, particularly on sequential data.
Bahdanau's attention, also referred to as additive attention, is a specific type of attention mechanism that uses a small neural network to calculate the attention weights [18]. These weights are computed by comparing each encoder hidden state with the previous decoder hidden state and are used to weight the input sequence when making predictions. The attention weights are learned during training, allowing the model to automatically focus on the most important parts of the input sequence. Bahdanau et al. proposed the mechanism to address a limitation of the traditional sequence-to-sequence (s2s) model, which used a fixed context vector to represent the input sentence. They introduced a variable context vector that enables the model to automatically search for the relevant portion of the input sentence. The attention architecture has two parts, an encoder and a decoder. For each word, the encoder computes a hidden-state vector; the attention layer converts these vectors into weights, and the weighted sum of the encoder vectors gives a new context vector that is sent to the decoder. The decoder's input is composed of two components: the previous hidden state and the dynamically computed context vector. Relatively better performance is observed when a bidirectional RNN is used as the encoder of the s2s model.
The global attentional model is based on the idea of taking into account all the hidden states of the encoder when determining the context vector $c_t$.
In this model, an alignment vector $a_t$ of variable length, equal to the number of time steps on the source side, is calculated by comparing the current target hidden state $h_t$ with each source hidden state $h_s$. The alignment vector $a_t$ is then used to derive the context vector $c_t$:
$$a_t(s) = \operatorname{align}(h_t, h_s) = \frac{\exp\left(\operatorname{score}(h_t, h_s)\right)}{\sum_{s'} \exp\left(\operatorname{score}(h_t, h_{s'})\right)}, \tag{1}$$
where $t$ is the time step, $h_t$ is the current target state, $h_s$ are the source states, and $c_t$ is the context vector. The global attentional model infers an alignment weight vector $a_t$ at each time step $t$ based on the current target state $h_t$ and all source states $h_s$. This weight vector is then used to calculate a global context vector $c_t$ as a weighted average of all the source states, with the weights given by $a_t$. Luong's attention, also known as multiplicative attention, is another type of attention mechanism [20]. Unlike Bahdanau's attention, which scores with a small feed-forward network, Luong's attention in its simplest form uses a dot product between the current target state and each source state to compute the attention weights. This allows the model to
focus on specific parts of the input sequence based on their similarity to the current state. Luong's attention can be further divided into three types, dot, general and concat, each with its own way of computing the attention score:
$$\operatorname{score}(h_t, h_s) = \begin{cases} h_t^{\top} h_s & \text{dot} \\ h_t^{\top} W_a h_s & \text{general} \\ v_a^{\top} \tanh\left(W_a [h_t; h_s]\right) & \text{concat} \end{cases} \tag{2}$$
Luong et al. [20] also describe an earlier location-based function in which the alignment scores are calculated solely from the target hidden state $h_t$, as follows: $a_t = \operatorname{softmax}(W_a h_t)$. Given the alignment vector as weights, the context vector $c_t$ is computed as the weighted average of all the source hidden states.
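The score functions in Eq. (2) and the normalization in Eq. (1) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names, the list-based vectors and the toy weights are hypothetical, and a real model would learn $W_a$ and $v_a$ during training.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(W, v):
    """Matrix-vector product, with W given as a list of rows."""
    return [dot(row, v) for row in W]

def softmax(xs):
    """Numerically stable softmax, as in Eq. (1)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_dot(h_t, h_s):
    """'dot' score: h_t^T h_s."""
    return dot(h_t, h_s)

def score_general(h_t, h_s, W_a):
    """'general' score: h_t^T W_a h_s."""
    return dot(h_t, matvec(W_a, h_s))

def score_concat(h_t, h_s, W_a, v_a):
    """'concat' score: v_a^T tanh(W_a [h_t; h_s])."""
    hidden = [math.tanh(x) for x in matvec(W_a, h_t + h_s)]
    return dot(v_a, hidden)

def global_attention(h_t, source_states, score_fn):
    """Return alignment weights a_t and context vector c_t for one target step."""
    scores = [score_fn(h_t, h_s) for h_s in source_states]
    weights = softmax(scores)
    dim = len(source_states[0])
    context = [
        sum(w * h_s[i] for w, h_s in zip(weights, source_states))
        for i in range(dim)
    ]
    return weights, context

# Toy example (hypothetical values): three source states of dimension 2.
h_t = [1.0, 0.0]
sources = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a_t, c_t = global_attention(h_t, sources, score_dot)
```

The context vector is the weighted average of the source states, so source states that score higher against the target state contribute more to it. Swapping `score_dot` for a closure over `score_general` or `score_concat` changes only how the scores are computed, not the normalization.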
3.3 Methodology Two cryptocurrencies were used in this project:
• Ethereum,
• Bitcoin.
The cryptodata of Bitcoin and Ethereum is taken from 2016 to 2021 as 3-min data (very short term) and 1-h data (short term) (Fig. 1).
Fig. 1 Dataset of Ethereum short term
Pre-processing. The data is normalized using a min-max scaler:
$$y_{\text{scaled}} = \frac{y - y_{\min}}{y_{\max} - y_{\min}}, \tag{3}$$
where $y$ represents the data, $y_{\min}$ and $y_{\max}$ are the minimum and maximum values of the dataset, and $y_{\text{scaled}}$ is the normalized data.
Prediction. The input data is processed with a windowing function so that multi-step prediction can be applied. Multi-step prediction involves taking a certain number of data points as input and using the LSTM, Bi-LSTM and GRU models to predict the next few data points. The number of input data points is decided based on the partial autocorrelation. The partial autocorrelation function (PACF) is plotted for the data points of the 'close' column with a maximum lag of 50 (Fig. 3). If the PACF plot shows a significant spike at a certain lag k and a sharp drop-off at lag k + 1 into the highlighted area (95% confidence level), this suggests that k may be an appropriate window length for the data (Fig. 2).
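The pre-processing and windowing steps described above can be sketched as follows. This is a minimal sketch under assumptions: the function names and the toy price series are hypothetical, the window length of 3 is chosen only for illustration, and the 5-step target mirrors the 5-step forecasting used in this work.

```python
def minmax_scale(series):
    """Scale a series to [0, 1] as in Eq. (3); also return the min and max used."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return [(y - lo) / (hi - lo) for y in series], lo, hi

def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original price units."""
    return [s * (hi - lo) + lo for s in scaled]

def make_windows(series, n_lags, n_steps=5):
    """Build (input, target) pairs: n_lags past points predict the next n_steps."""
    inputs, targets = [], []
    for start in range(len(series) - n_lags - n_steps + 1):
        inputs.append(series[start:start + n_lags])
        targets.append(series[start + n_lags:start + n_lags + n_steps])
    return inputs, targets

# Toy closing prices (hypothetical values, not taken from the dataset).
closes = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0, 107.0, 111.0, 115.0, 120.0]
scaled, y_min, y_max = minmax_scale(closes)
X, Y = make_windows(scaled, n_lags=3, n_steps=5)
```

Each row of `X` holds the lagged inputs and the matching row of `Y` holds the five future points to be predicted. A common precaution, not stated in the text, is to compute `y_min` and `y_max` on the training portion only so that test data does not influence the scaling.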
Fig. 2 Model for very short term and short-term closing price prediction (flowchart: crypto close data → normalization → LSTM/Bi-LSTM/GRU, LSTM with attention mechanism, Bi-LSTM with attention mechanism, GRU with attention mechanism → predicted results)
Fig. 3 Partial autocorrelation
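The PACF-based choice of window length can be illustrated with a small pure-Python sketch. It estimates the partial autocorrelations with the Durbin-Levinson recursion and applies one possible reading of the rule in the text: pick the first lag k that is significant while lag k + 1 falls inside the approximate 95% band of ±1.96/√n. The function names, the synthetic AR(2) series and the exact selection rule are assumptions made for illustration; the paper does not state how the plot was produced.

```python
import math
import random

def autocorr(x, max_lag):
    """Sample autocorrelations r[0..max_lag] of a series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

def pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = autocorr(x, max_lag)
    pac = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pac.append(phi_kk)
    return pac

def suggest_window(pac, n):
    """First lag k that is significant while lag k + 1 is not (assumed rule)."""
    bound = 1.96 / math.sqrt(n)
    for k in range(1, len(pac) - 1):
        if abs(pac[k]) > bound and abs(pac[k + 1]) <= bound:
            return k
    return None

def simulate_ar2(n, phi1, phi2, seed=0):
    """Synthetic AR(2) series, used only to exercise the functions above."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(phi1 * x[-1] + phi2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]

series = simulate_ar2(3000, 0.5, 0.3)
pac = pacf(series, max_lag=20)
window = suggest_window(pac, len(series))
```

For an AR(2) process the partial autocorrelation is clearly non-zero at lags 1 and 2 and close to zero afterwards, which is the kind of spike followed by a drop-off that the text describes. On real price data the pattern is usually less clean, so the suggested lag is best treated as a starting point for tuning.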
4 Result Analysis The algorithm flow is as follows:
• The cryptodata is pre-processed using min-max scaling.
• The data is split into training and test sets.
• Three different algorithms are taken, and the corresponding hyperparameters are tuned.
• Two different attention mechanisms are added to each algorithm.
• Results are evaluated using the MAE and RMSE metrics.
MAE is the mean of the absolute differences between the actual and predicted values:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{4}$$
RMSE is the square root of the mean of the squared differences between the actual and predicted values:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{5}$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and $n$ is the number of data points. Based on the results obtained, the LSTM variants are compared with and without attention mechanisms. The following combinations of models are tested for Bitcoin and Ethereum:
• Experiment I: LSTM with Bahdanau's and Luong's attentions (short term) for Ethereum and Bitcoin.
• Experiment II: Bi-LSTM with Bahdanau's and Luong's attentions (short term) for Ethereum and Bitcoin.
• Experiment III: GRU with Bahdanau's and Luong's attentions (short term) for Ethereum and Bitcoin.
• Experiment IV: LSTM with Bahdanau's and Luong's attentions (very short term) for Ethereum and Bitcoin.
• Experiment V: Bi-LSTM with Bahdanau's and Luong's attentions (very short term) for Ethereum and Bitcoin.
• Experiment VI: GRU with Bahdanau's and Luong's attentions (very short term) for Ethereum and Bitcoin.
With reference to Experiments I–VI, it is observed that:
Table 1 Results of Ethereum short term

| Model | Attention | MAE | RMSE |
|---|---|---|---|
| LSTM | No attention | 31.159 | 52.658 |
| LSTM | Bahdanau's | 16.291 | 33.717 |
| LSTM | Luong's | 16.300 | 28.909 |
| Bi-LSTM | No attention | 30.261 | 52.720 |
| Bi-LSTM | Bahdanau's | 16.363 | 31.517 |
| Bi-LSTM | Luong's | 16.246 | 27.201 |
| GRU | No attention | 22.009 | 32.304 |
| GRU | Bahdanau's | 18.800 | 35.759 |
| GRU | Luong's | 16.372 | 30.423 |
• The MAE and RMSE results show that the models perform better when an attention mechanism is included than without attention.
• Bahdanau's attention performed better than Luong's attention in the majority of experiment combinations.
• Bi-LSTM and GRU performed better than LSTM in most of the experiments, as inferred from the MAE and RMSE scores.
This study aims to establish a comprehensive and cohesive methodology for forecasting short-term and very short-term crypto closing prices. To achieve this goal, a series of experiments was conducted using two popular cryptocurrencies as test cases, three recurrent models (LSTM, Bi-LSTM and GRU), and two attention mechanisms. The results of these experiments are summarized in Tables 1, 2, 3 and 4. One of the key findings of this study is that, when the hyperparameters are tuned, Bahdanau's attention mechanism performed better when paired with the Bi-LSTM and Gated Recurrent Unit (GRU) models than with the vanilla LSTM model. This research is primarily a comparative analysis of different experimental combinations, and it aims to provide a methodology for cryptoprice prediction. It also provides guidance for selecting the most suitable model combinations for this task.

Table 2 Results of Ethereum very short term

| Model | Attention | MAE | RMSE |
|---|---|---|---|
| LSTM | No attention | 32.441 | 66.475 |
| LSTM | Bahdanau's | 25.339 | 38.325 |
| LSTM | Luong's | 13.136 | 18.596 |
| Bi-LSTM | No attention | 8.136 | 16.929 |
| Bi-LSTM | Bahdanau's | 4.504 | 7.934 |
| Bi-LSTM | Luong's | 6.850 | 9.708 |
| GRU | No attention | 7.975 | 9.994 |
| GRU | Bahdanau's | 3.552 | 7.001 |
| GRU | Luong's | 4.692 | 9.260 |

Table 3 Results of Bitcoin short term

| Model | Attention | MAE | RMSE |
|---|---|---|---|
| LSTM | No attention | 868.814 | 1001.084 |
| LSTM | Bahdanau's | 707.594 | 896.369 |
| LSTM | Luong's | 561.270 | 713.421 |
| Bi-LSTM | No attention | 1228.297 | 1336.894 |
| Bi-LSTM | Bahdanau's | 201.024 | 226.163 |
| Bi-LSTM | Luong's | 99.335 | 113.146 |
| GRU | No attention | 125.891 | 153.055 |
| GRU | Bahdanau's | 59.289 | 69.700 |
| GRU | Luong's | 108.929 | 117.493 |

Table 4 Results of Bitcoin very short term

| Model | Attention | MAE | RMSE |
|---|---|---|---|
| LSTM | No attention | 50.420 | 134.687 |
| LSTM | Bahdanau's | 25.160 | 57.801 |
| LSTM | Luong's | 29.310 | 65.590 |
| Bi-LSTM | No attention | 39.381 | 122.129 |
| Bi-LSTM | Bahdanau's | 21.799 | 43.742 |
| Bi-LSTM | Luong's | 27.310 | 71.362 |
| GRU | No attention | 37.782 | 56.609 |
| GRU | Bahdanau's | 26.136 | 47.765 |
| GRU | Luong's | 24.814 | 54.238 |

Figures 4, 5, 6 and 7 show the predictions obtained with different combinations of DL algorithms and attention mechanisms; the predictions closely follow the real trend. Table 5 summarizes the best model results obtained for the Bitcoin and Ethereum cryptocurrencies. Each model's performance is evaluated using RMSE and MAE scores for each combination of model and time horizon.
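The two evaluation metrics in Eqs. (4) and (5) can be written directly from their definitions. The sketch below is illustrative only: the function names and the toy values are hypothetical. Since the tabulated errors are on the scale of the prices themselves, the metrics were presumably computed after mapping predictions back to price units, although the text does not say so explicitly.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error, Eq. (4)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error, Eq. (5)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)

# Toy example (hypothetical values): one 5-step forecast against the actual closes.
actual = [100.0, 102.0, 101.0, 105.0, 110.0]
predicted = [101.0, 101.0, 103.0, 104.0, 108.0]
print("MAE:", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

RMSE is never smaller than MAE on the same errors and grows faster when a few errors are large, which is why the two are reported together: a wide gap between them, as in several rows of Tables 1 to 4, points to occasional large misses rather than a uniform error.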
Fig. 4 Prediction of Ethereum short term
Fig. 5 Prediction of Ethereum very short term
Fig. 6 Prediction of Bitcoin short term
Fig. 7 Prediction of Bitcoin very short term
Table 5 Consolidated results of Ethereum and Bitcoin

| Coin | Time horizon | Model | Attention | MAE | RMSE |
|---|---|---|---|---|---|
| Ethereum | Short term | Bi-LSTM | Luong's | 16.246 | 27.201 |
| Ethereum | Very short term | GRU | Bahdanau's | 3.552 | 7.001 |
| Bitcoin | Short term | GRU | Bahdanau's | 59.289 | 69.700 |
| Bitcoin | Very short term | Bi-LSTM | Bahdanau's | 21.799 | 43.742 |
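To make the effect of attention in Table 5 easier to read, the relative MAE reduction of each best model over the same model without attention can be computed from the values in Tables 1 to 4. The short sketch below does only this arithmetic; the dictionary layout and function name are hypothetical, and the numbers are copied from the tables.

```python
# (coin, horizon, model, attention) -> (MAE without attention, MAE with attention),
# values copied from Tables 1-4.
best_models = {
    ("Ethereum", "short term", "Bi-LSTM", "Luong"): (30.261, 16.246),
    ("Ethereum", "very short term", "GRU", "Bahdanau"): (7.975, 3.552),
    ("Bitcoin", "short term", "GRU", "Bahdanau"): (125.891, 59.289),
    ("Bitcoin", "very short term", "Bi-LSTM", "Bahdanau"): (39.381, 21.799),
}

def pct_reduction(baseline, improved):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - improved) / baseline

reductions = {
    key: pct_reduction(base, with_attention)
    for key, (base, with_attention) in best_models.items()
}

for (coin, horizon, model, attention), value in reductions.items():
    print(f"{coin:8s} {horizon:15s} {model:7s} + {attention:8s}: {value:.1f}% lower MAE")
```

With these table values, the best attention variants reduce MAE by roughly 45 to 55% relative to the corresponding model without attention. The comparison is only against the same recurrent model, so it says nothing about how the attention variants of different models rank against each other.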
5 Conclusion Deep learning algorithms augmented with attention mechanisms are used for the prediction of cryptoprices over two time horizons, dubbed short term and very short term. The results obtained are summarized in Table 5. From the experimental analysis, it is observed that hyperparameter tuning improved the performance of the models and that Bi-LSTM and GRU perform better than LSTM. Applying attention to the LSTM variants improves the predictions. A shortcoming of the work is that no single method was found that can be used for the same commodity at different time scales. This can be considered future work; another way forward is to use indicators alongside cryptoprices in a multivariate forecasting approach.
References
1. Hansun S, Wicaksana A, Khaliq AQ (2022) Multivariate cryptocurrency prediction: comparative analysis of three recurrent neural networks approaches. J Big Data 9:50. https://doi.org/10.1186/s40537-022-00601-7
2. Shin M, Mohaisen D, Kim J (2021) Bitcoin price forecasting via ensemble-based LSTM deep learning networks. In: 2021 international conference on information networking (ICOIN), Jeju Island, Korea (South), pp 603–608. https://doi.org/10.1109/ICOIN50884.2021.9333853
3. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. https://bitcoin.org/bitcoin.pdf
4. Patel MM, Tanwar S, Gupta R, Kumar N (2020) A deep learning-based cryptocurrency price prediction scheme for financial institutions. J Inform Secur Appl 55:102583. ISSN 2214-2126. https://doi.org/10.1016/j.jisa.2020.102583
5. Stankovic M, Bacanin N, Zivkovic M, Jovanovic L, Mani J, Antonijevic M (2022) Forecasting ethereum price by tuned long short-term memory model. In: 2022 30th telecommunications forum (TELFOR), Belgrade, Serbia, pp 1–4. https://doi.org/10.1109/TELFOR56187.2022.9983702
6. Rizwan M, Narejo S, Javed M (2019) Bitcoin price prediction using deep learning algorithm. In: 2019 13th international conference on mathematics, actuarial science, computer science and statistics (MACS), Karachi, Pakistan, pp 1–7. https://doi.org/10.1109/MACS48846.2019.9024772
7. Hamayel MJ, Owda AY (2021) A novel cryptocurrency price prediction model using GRU, LSTM and bi-LSTM machine learning algorithms. AI 2(4):477–496. https://doi.org/10.3390/ai2040030
8. Vaz de Melo Mendes B, Fluminense Carneiro A (2020) A comprehensive statistical analysis of the six major crypto-currencies from August 2015 through June 2020. J Risk Finan Manage 13:192. https://doi.org/10.3390/jrfm13090192
9. Patra GR, Mohanty MN (2022) Price prediction of cryptocurrency using a multi-layer gated recurrent unit network with multi features. Comput Econ. https://doi.org/10.1007/s10614-022-10310-1
10. Kilimci H, Yıldırım M, Kilimci ZH (2021) The prediction of short-term bitcoin dollar rate (BTC/USDT) using deep and hybrid deep learning techniques. In: 2021 5th international symposium on multidisciplinary studies and innovative technologies (ISMSIT), Ankara, Turkey, pp 633–637. https://doi.org/10.1109/ISMSIT52890.2021.9604741
11. Mittal M, Geetha G (2022) Predicting bitcoin price using machine learning. In: 2022 international conference on computer communication and informatics (ICCCI), Coimbatore, India, pp 1–7. https://doi.org/10.1109/ICCCI54379.2022.9740772
12. Bhatt A, Ongsakul W, Singh JG (2022) Sliding window approach with first-order differencing for very short-term solar irradiance forecasting using deep learning models. Sustain Energy Technol Assess 50:101864. https://doi.org/10.1016/j.seta.2021.101864
13. Hiransha M, Gopalakrishnan EA, Menon VK, Soman KP (2018) NSE stock market prediction using deep-learning models. Procedia Comput Sci 132:1351–1362. ISSN 1877-0509. https://doi.org/10.1016/j.procs.2018.05.050
14. Nair BB, Kumar PS, Sakthivel NR, Vipin U (2017) Clustering stock price time series data to generate stock trading recommendations: an empirical study. Exp Syst Appl 70:20–36. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2016.11.002
15. Balaji AJ, Ram DH, Nair BB (2018) Applicability of deep learning models for stock price forecasting an empirical study on BANKEX data. Procedia Comput Sci 143:947–953. https://doi.org/10.1016/j.procs
16. Li C, Zhang X, Qaosar M, Ahmed S, Alam KMR, Morimoto Y (2019) Multi-factor based stock price prediction using hybrid neural networks with attention mechanism. In: 2019 IEEE international conference on dependable, autonomic and secure computing, international conference on pervasive intelligence and computing, international conference on cloud and big data computing, international conference on cyber science and technology congress (DASC/PiCom/CBDCom/CyberSciTech), pp 961–966
17. Zhang S, Zhang H (2020) Prediction of stock closing prices based on attention mechanism. In: 2020 16th Dahe Fortune China Forum and Chinese high-educational management annual academic conference (DFHMC), IEEE Xplore. https://doi.org/10.1109/DFHMC52214.2020.00053
18. Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. ICLR. arXiv 2015, arXiv:1409.0473
19. Amudha J, Divya KV, Aarthi R (2020) A fuzzy based system for target search using top-down visual attention. J Intell Fuzzy Syst 38(5):6311–6323. https://doi.org/10.3233/JIFS-179712
20. Luong MT, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. arXiv:1508.04025v5, https://doi.org/10.48550/arXiv.1508.04025
21. Ganesan S, Ravi V, Krichen M, Sowmya V, Alroobaea R, Soman KP (2021) Robust malware detection using residual attention network. In: 2021 IEEE international conference on consumer electronics (ICCE), Las Vegas, NV, USA, pp 1–6. https://doi.org/10.1109/ICCE50685.2021.9427623
22. Rahesh R, Sajith Variyar VV, Sivanpillai R, Sowmya V, Soman KP, Brown GK (2022) Segmentation of epiphytes in grayscale images using a CNN-transformer hybrid architecture. In: Bhateja V, Khin Wee L, Lin JCW, Satapathy SC, Rajesh TM (eds) Data engineering and intelligent computing. Lecture Notes in Networks and Systems, vol 446. Springer, Singapore. https://doi.org/10.1007/978-981-19-1559-8_13
Deep Learning-Based Continuous Glucose Monitoring with Diabetic Prediction Using Deep Spectral Recurrent Neural Network G. Kiruthiga, L. Shakkeera, A. Asha, B. Dhiyanesh, P. Saraswathi, and M. Murali
G. Kiruthiga CSE, IES College of Engineering, Thrissur, Kerala, India e-mail: [email protected]
L. Shakkeera CSE, Presidency University, Bengaluru, Karnataka, India e-mail: [email protected]
A. Asha ECE, Rajalakshmi Engineering College, Chennai, Tamil Nadu, India e-mail: [email protected]
B. Dhiyanesh (B) CSE, Dr. N.G.P. Institute of Technology, Coimbatore, Tamil Nadu, India e-mail: [email protected]
P. Saraswathi IT, Velammal College of Engineering and Technology, Madurai, Tamil Nadu, India e-mail: [email protected]
M. Murali Sona College of Technology, Salem, Tamil Nadu, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_33
Abstract It is estimated that approximately 50% of the world's population has diabetes mellitus. Diabetes is a chronic disease caused by either insufficient insulin production by the pancreas or ineffective use of insulin by the body. Every year, a lot of money is spent on treating people with diabetes. Therefore, accurate and reliable prediction is an important issue. Prediction also needs to be more precise in determining patients' insulin levels. To address this problem, this study proposes an approach based on the deep spectral recurrent neural network (DSRNN) algorithm, an artificial intelligence technique used here to predict insulin levels in diabetic patients. Deep spectral recurrent neural networks (DSRNN) were selected to develop models for predicting
diabetes. Initially, the diabetic dataset is taken for analysis, and the results are based on training and testing. Then, preprocessing is used to remove irrelevant data. The preprocessed data then goes through feature extraction using the multiscalar feature selection (MSFS) algorithm, which analyses the data based on maximum weights. The features' threshold values are selected using social spider optimisation (SSO), which analyses the importance of the features. Finally, classification with DSRNN gives a diabetic prediction based on the insulin level. Diabetes technology, such as continuous glucose monitoring (CGM), provides a wealth of data that enables measurement. Depending on the technology used, the sampling interval is on the order of minutes. The model is a good predictor, and the probability model for diabetes is tested for accuracy on experimental data. Higher accuracy can be achieved if models are trained on future data. Keywords Diabetes · Continuous glucose monitoring (CGM) · Multiscalar feature selection (MSFS) · Deep spectral recurrent neural network (DSRNN) · Insulin level
1 Introduction
Diabetes is associated with obesity and hyperglycaemia. It affects the hormone insulin, disturbs carbohydrate metabolism and raises blood glucose levels. Diabetes occurs when the body does not produce enough insulin. According to the World Health Organization (WHO), around 422 million people suffer from diabetes, particularly in low- and middle-income countries. By 2030, that number is projected to rise to 490 million. Diabetes is common in many countries, including Canada, China and India. Approximately 40 million Indians have diabetes, out of a population of more than one billion. Diabetes is a major cause of death worldwide, and early detection can save lives. To this end, this work investigates predictors of diabetes by examining various attributes related to the disease, using the Pima Indian diabetes dataset and applying different machine learning classification and clustering techniques to predict diabetes. Machine learning is a technique for explicitly training a computer or machine. Machine learning techniques extract knowledge and give useful results by building classification and clustering models from the collected datasets. The data gathered in this way are used to predict diabetes. Many machine learning methods can make such predictions, but choosing the best one is difficult. For this reason, we apply common classification and clustering techniques to the dataset for prediction.
Recently, deep learning has achieved promising results in several fields because of its strong feature learning ability. Multiscalar feature selection (MSFS) within a deep learning classification framework reconstructs the input vector and reduces its dimensionality. Multiclass logistic regression was used for classification on the diabetes data. DSRNNs are used to classify the MSFS data directly in the spectral domain, using point-by-point parameter sharing to combat the curse of dimensionality. A layered network can achieve better classification performance. We propose a standardised MSFS and feature extraction model to extract efficient spectral–spatial features for classification. To improve classification performance, many researchers have proposed spectral–spatial classification, which combines spatial context and spectral information. However, this article does not cover that topic. The DSRNN model improves the classification accuracy on UCI data because of its feature representation. However, the DSRNN model requires a large number of parameters to be set. Hyperspectral images with few labelled samples cannot take full advantage of DSRNN models. Moreover, each of the preceding methods treats the spectrum as a vector, so information is lost. They also regard the pixel spectra as independent of one another, with pixels treated as points in an unordered feature space. However, hyperspectral data can be viewed as a continuous spectral sequence within a continuous spectral band. A recurrent neural network, a typical deep learning model, is better suited to sequence problems and has fewer parameters than a DSRNN model. We therefore use the DSRNN to characterise sequential attributes for classification.
2 Related Work
Deep learning methods for diabetic retinopathy (DR) diagnosis are often criticised for the lack of interpretability of their diagnostic results, which limits their clinical application. Simultaneously predicting DR-related features while diagnosing DR severity can address this issue by providing supporting evidence [1]. Deep learning models embedded in mobile applications play a significant role in predicting specific diseases such as diabetes [2]. A diabetes prediction model based on a deep convolutional neural network (DCNN) has been proposed to prevent and reduce diabetes and to analyse and predict it [3]. Much of the early work focused on improving the accuracy of predictive models; however, the available datasets are often too small to realise the full potential of deep learning algorithms [4]. Whether deep learning techniques require explicit feature extraction has also been considered [5]. Multi-step prediction of blood glucose (BG) levels has been evaluated using different input features, regression model orders and recursive or direct strategies [6, 7]. Assessing the likelihood of death is difficult and time-consuming because of the many influencing factors. Providers are interested in screening high-risk ICU patients to mitigate risk factors [8].
With recent advances in the Internet of Medical Things (IoMT), CGM and deep learning methods have been shown to be state of the art in BG forecasting [9]. Estimates can be obtained from daily blood glucose readings over 5–12 weeks [10]. To build a predictive model, support vector machines were used with ten features that are well known in the literature as strong predictors of future diabetes; because the dataset is imbalanced with respect to class labels, the model was trained using 10-fold cross-validation and validated on a holdout set [11]. A better solution is to detect DR automatically using purely computational methods. Automated techniques can reliably determine the presence of abnormalities in fundus images (FI) [12]. Computerised imaging methods for disease diagnosis are, moreover, a robust and rapidly developing field in biomedicine. One proposed model aims to assess the diagnostic performance of iridology, an ancient complementary and alternative medicine technique, for diagnosing type 2 diabetes using soft computing techniques [13]. Decision tree (DT) techniques have been used successfully in applications such as clinical diagnostics. By building classification systems, machine learning algorithms can be very useful in solving health-related problems and in helping specialists predict and diagnose diseases [14]. The patient's lifestyle, genetic information and other factors determine the root cause of the disease. In this context, data mining aims to identify patterns that facilitate early diagnosis of disease and appropriate treatment [15]. At present, with the enormous amount of data on treatment status and diagnosis, providing optimal treatment is very difficult. To predict a disease, understanding its symptoms is vital. Machine learning algorithms now play an important role in diagnosing disease [16].
Early diagnosis of DR is difficult, and the diagnostic process can be time-consuming even for experienced experts. Consequently, a computer-aided diagnosis based on a deep learning algorithm has been proposed that automatically detects referable diabetic retinopathy by classifying retinal fundus photographs into two grades [17]. Insulin-induced hypoglycaemia is a serious problem for people with diabetes, particularly at night; forecasting glucose at least 30 min ahead allows low-glucose alarms to be raised [18]. A continuous glucose monitoring system (CGMS) can measure blood glucose levels in diabetic patients at high sampling rates, producing a large amount of data. Machine learning techniques can use these data to predict future glucose levels, helping to prevent serious hyperglycaemic or hypoglycaemic events early and improving diabetes care [19]. A key challenge is the early prediction of diabetic vascular damage using advanced technology, since appropriate preventive and treatment measures can reduce the risk of development and the burden of complications [20].
3 Proposed Method
The deep spectral recurrent neural network (DSRNN) architecture uses four hidden layers of six neurons each and an output layer with a single neuron. Diabetes is treated with insulin injections (the amount and type of insulin vary from patient to patient). The average maintenance period is ten days. A glucose monitoring system records the calculated glucose concentration every minute for real-time safety. DSRNN-based blood glucose monitoring requires a standard blood glucose meter with at least four readings per day. In addition, the glucose sensor is changed every three days. The patients are fitted with artificial neural network-based monitoring systems that track their daily activities. Finally, information on food intake and insulin injections (type, dose and timing) is used to tailor each patient's insulin dose. Figure 1 depicts the DSRNN-based continuous glucose monitoring system for people with diabetes. The dataset is first initialised for the preprocessing stage. CGM and food-intake information for the diabetic patient are monitored, features are extracted using MSFS, the feature values are selected with SSO and, finally, the DSRNN performs the classification that produces the output.
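As a rough illustration of the layer layout described above, the following sketch runs one forward pass through a plain stacked recurrent network with four hidden layers of six units and a single sigmoid output neuron. It is a stand-in under stated assumptions rather than the DSRNN itself: the weights are random, the input width of eight features and the names `init_stacked_rnn` and `forward` are hypothetical, and the "spectral" component is not modelled because the text does not specify it.

```python
import numpy as np

def init_stacked_rnn(n_features, hidden_sizes=(6, 6, 6, 6), seed=0):
    """Create random weights for a stacked Elman RNN with one output neuron."""
    rng = np.random.default_rng(seed)
    layers, n_in = [], n_features
    for n_hidden in hidden_sizes:
        layers.append({
            "W_in": rng.normal(0.0, 0.1, (n_in, n_hidden)),       # input-to-hidden
            "W_rec": rng.normal(0.0, 0.1, (n_hidden, n_hidden)),  # hidden-to-hidden
            "b": np.zeros(n_hidden),
        })
        n_in = n_hidden
    out = {"W": rng.normal(0.0, 0.1, (n_in, 1)), "b": np.zeros(1)}
    return layers, out

def forward(sequence, layers, out):
    """Pass a (time_steps, n_features) sequence through the network."""
    states = [np.zeros(layer["b"].shape) for layer in layers]
    for x_t in sequence:
        layer_input = x_t
        for i, layer in enumerate(layers):
            states[i] = np.tanh(
                layer_input @ layer["W_in"] + states[i] @ layer["W_rec"] + layer["b"]
            )
            layer_input = states[i]
    # Single sigmoid output neuron applied to the last hidden state.
    logit = states[-1] @ out["W"] + out["b"]
    return float(1.0 / (1.0 + np.exp(-logit[0])))

layers, out = init_stacked_rnn(n_features=8)  # 8 input features is an assumption
readings = np.random.default_rng(1).normal(size=(12, 8))  # 12 synthetic time steps
probability = forward(readings, layers, out)
```

With untrained weights the output is only a placeholder probability; training (for example by backpropagation through time) would be needed before it carries any meaning.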
3.1 Dataset Description
The data were taken from the Pima Indian Diabetes Dataset in the UCI repository. The dataset consists of 768 patients with multiple features. Table 1 lists the nine attributes recorded for each data point. The class variable takes the value 0 or 1, indicating a negative or positive diabetes outcome for the patient. The class distribution is used to develop a model to predict diabetes. However, the dataset is slightly unbalanced, with about 500 records labelled negative (0, meaning no diabetes) and 268 labelled positive (1, meaning diabetes).
Fig. 1 Proposed diagram for DSRNN-based continuous glucose monitoring for diabetics
Table 1 Dataset details
S. No.   Attributes
1        Pregnancy values
2        Glucose monitoring values
3        Blood pressure (BP)
4        Skin thickness (ST)
5        Insulin (IL)
6        BMI
7        Diabetes pedigree function (DPF)
8        Age
9        Gender
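The class balance reported in Sect. 3.1 (about 500 negative and 268 positive records out of 768) can be checked with a few lines of arithmetic. The inverse-frequency class weights at the end are an illustrative assumption about how such an imbalance is often handled; the text does not say that the authors used them.

```python
# Class counts as reported in the dataset description.
n_negative, n_positive = 500, 268
n_total = n_negative + n_positive

share_negative = n_negative / n_total  # roughly 0.651
share_positive = n_positive / n_total  # roughly 0.349

# Assumption: inverse-frequency weights, a common remedy for mild imbalance.
class_weights = {
    0: n_total / (2 * n_negative),
    1: n_total / (2 * n_positive),
}
```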
3.2 Preprocessing Stage
Data preprocessing is a very important step. Most health-related data contain outliers and other impurities that undermine the validity of the data. The extracted data are preprocessed to improve their quality and reliability. This step is critical for applying machine learning techniques to the dataset effectively and for obtaining accurate results and successful predictions. Preprocessing of the Pima Indian diabetes dataset requires two steps.
Removal of zero values: all records containing a value of zero (0) are removed, because these attributes cannot take a null value. Irrelevant features and records are eliminated and feature subsets are formed. This process, known as feature selection, helps reduce the dimensionality of the data.
Splitting of data: after cleaning, the data are normalised and split for model training and testing. The algorithm is trained on the training set, and the test set is held out. The training process produces a trained model from the logistic, mean and eigenvalues of the training data. The purpose of normalisation is to bring all attributes to the same scale.
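The two preprocessing steps can be sketched as follows. This is a minimal example on synthetic data, not the authors' pipeline: the function name `preprocess`, the choice of which columns may not be zero, min-max scaling as the normalisation and the 70/30 split ratio (taken from the results section) are all assumptions made for illustration.

```python
import numpy as np

def preprocess(X, y, zero_invalid_cols, train_fraction=0.7, seed=0):
    """Drop rows with invalid zeros, split, then min-max normalise."""
    # Step 1: remove records with a zero in columns where zero is not valid.
    keep = np.all(X[:, zero_invalid_cols] != 0, axis=1)
    X, y = X[keep], y[keep]

    # Step 2: shuffle and split into training and testing sets.
    order = np.random.default_rng(seed).permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    X_train, X_test = X[train_idx], X[test_idx]

    # Normalise with statistics from the training set only, so that
    # no information from the test set leaks into training.
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)
    return (X_train - lo) / scale, y[train_idx], (X_test - lo) / scale, y[test_idx]

# Synthetic stand-in for the dataset: 100 records, 4 features.
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 200, size=(100, 4)).astype(float)
X_demo[:10, 1] = 0.0  # inject invalid zeros into one column
y_demo = rng.integers(0, 2, size=100)
X_tr, y_tr, X_te, y_te = preprocess(X_demo, y_demo, zero_invalid_cols=[1, 2])
```

Fitting the scaling on the training rows alone is a design choice; test values can then fall slightly outside [0, 1], which is expected.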
3.3 Feature Extraction Using MSFS
Feature extraction methods can reduce the number of attributes and avoid unnecessary features. The data attributes used to train a deep learning model affect the performance that can be achieved, and irrelevant or inappropriate features can degrade it. Dimensionality reduction lowers the dimensionality of the data being modelled, which reduces overfitting, improves accuracy and shortens training time.
Algorithm
Step 1: Initialize the features of MSFS
Step 2: Calculate for each feature weights = 1, 2, … n
Step 3: Calculate each feature of the diabetic important () class // diabetic important features
  While the main aspects are satisfactory do
    For each = 1 to n do
      Calculate the finest feature attributes from in n
    End for
  End while
Step 4: Calculate important feature weight
  For each = 1 to n do
    // refers to feature optimal weight from training mutual features and refers to the class label of ().
  End for
Step 5: Update the nearest feature weights Nfw
Step 6: Return ← Nfw
Stop
The above algorithm steps provide efficiently weighted features from the cross-sectional UCI dataset. An essential property of the method is the ability of the nearest feature weights (Nfw) to accurately determine the discriminant references for subsets of features, the most important of which are the interactions and independence between features.
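Because the symbols in the algorithm listing were lost, the exact MSFS weighting cannot be recovered from the text. The sketch below therefore shows only the general idea of weighting each feature against the class label and keeping the highest-weighted ones. It swaps in a simple substitute, the absolute Pearson correlation scaled so that the largest weight is 1; the function names and the threshold of 0.5 are hypothetical.

```python
import numpy as np

def feature_weights(X, y):
    """Weight each feature by |Pearson correlation| with the label (max = 1)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant features get a correlation of zero instead of a division error.
    corr = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    weights = np.abs(corr)
    return weights / weights.max() if weights.max() > 0 else weights

def select_by_weight(weights, threshold):
    """Return the indices of features whose weight reaches the threshold."""
    return np.flatnonzero(weights >= threshold)

# Synthetic data: only feature 0 depends on the class label.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
X_demo = rng.normal(size=(200, 5))
X_demo[:, 0] += 2.0 * y_demo
weights = feature_weights(X_demo, y_demo)
selected = select_by_weight(weights, threshold=0.5)
```

A correlation-based weight ignores interactions between features, which the text names as important, so a mutual-information or wrapper-based score would be closer in spirit.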
3.4 Feature Selection Using SSO
Social spider optimisation (SSO) is used to optimise the data clusters, and all entries were validated using the nonparametric rank-sum test. The method is applied to a minimisation problem. The fitness function depends on the rough-set dependency and takes into account the number of features selected. First, the spiders are initialised randomly, and each individual is then converted to a binary vector using the following equations:

$$ SP\big(a_n^m(t)\big) = \frac{1}{1 + e^{-a_n^m(t)}} \tag{1} $$

$$ a_n^m(t+1) = \begin{cases} 0 & \text{if } SP\big(a_n^m(t)\big) > \epsilon \\ 1 & \text{for other values} \end{cases} \tag{2} $$

where $a_n^m(t)$ is the spider position value at iteration $t$ and $\epsilon \in [0, 1]$. The dependency weight is given by

$$ \gamma_x(fd) = \frac{|\mathrm{Pos}_a(fd)|}{|U|}, \qquad A = S \cup fd \tag{3} $$

where $fd$ and $S$ are the weighted decision features and $\mathrm{Pos}_a(fd)$ is the set of positive classes obtained using the information. The performance of a solution is evaluated by

$$ F(S) = \rho\,\gamma_R(D) + (1 - \rho)\left(1 - \frac{|A|}{|C|}\right) \tag{4} $$

where $\rho \in [0, 1]$ is the parameter that balances the selected features against classification. The fitness of each spider is compared with the global best (Fbest); if the spider has the better fitness value, it replaces Fbest and becomes the reduced set.
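The binarisation of Eqs. (1) and (2) and the fitness of Eq. (4) can be sketched as below. The sketch reads the transfer function in Eq. (1) as the usual logistic sigmoid, draws the threshold uniformly from [0, 1] and uses a weight of 0.9 for the balance parameter; all three are assumptions, as are the function names. The dependency value is passed in as a number because computing Eq. (3) needs the rough-set positive region, which is outside this sketch.

```python
import numpy as np

def binarise(position, rng):
    """Eqs. (1)-(2) as written: 0 where the sigmoid exceeds eps, 1 otherwise."""
    sp = 1.0 / (1.0 + np.exp(-position))  # Eq. (1), assumed logistic sigmoid
    eps = rng.random(position.shape)      # assumption: eps ~ Uniform[0, 1]
    return np.where(sp > eps, 0, 1)       # Eq. (2)

def fitness(dependency, n_selected, n_total, rho=0.9):
    """Eq. (4): rho * gamma + (1 - rho) * (1 - |A| / |C|)."""
    return rho * dependency + (1.0 - rho) * (1.0 - n_selected / n_total)

rng = np.random.default_rng(0)
spider = rng.normal(size=8)          # one spider, 8 candidate features
feature_mask = binarise(spider, rng)

# Same dependency, fewer selected features: the smaller subset scores higher.
f_four = fitness(dependency=0.8, n_selected=4, n_total=8)
f_two = fitness(dependency=0.8, n_selected=2, n_total=8)
```

With a dependency of 0.8 the two calls evaluate to about 0.77 and 0.795, which shows how the second term rewards smaller feature subsets.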
3.5 Classification Using Deep Spectral Recurrent Neural Networks (DSRNN)
The DSRNN algorithm classifies the diabetic features. The remaining features are refined to improve classification based on the edge weights supplied to the classifier. The hidden layer applies the iterative function, which reduces the computational complexity relative to traditional algorithms, and predicts the results. The DSRNN uses three types of layer: the input layer, the hidden layer and the output layer. The input layer processes the feature selection data, the hidden layer calculates the diabetic weights and the output layer predicts the drug recommendations.
Algorithm
Input: Nearest feature weights (Nfs)
Output: Classification result as Insulin Recommendation (IR)
Start
Step 1: Input the weights (Cfs)
Step 2: Get RNN (each data n)
Step 3: Evaluate the group feature (gf) as Frc − ids → Fc11, Fcl2 …
  Process an ordered list Ord → {L1, L2 …} to input layer
Step 4: Calculate the decision classifier and neural counter in the hidden layer
  For initializing the values a − 1 and b = 1 data count
    For i = 0 to fix marginal weightage class as threshold class
      Calculate the coefficient value of i → j as chance point p
      Cluster featured value (Cfv → i to j)
      Initialized to set definite value s threshold margin
      Set the value v to each trained data
    End for
  End For
Step 5: Evaluate the closest feature weights
Step 6: Result for recommendation
Using the DSRNN and evaluating the recommendations, the algorithm steps above produce accurate predictions for diabetic value prescription.
4 Results and Discussion
The experimental setup evaluates the proposed deep spectral recurrent neural network (DSRNN) algorithm against two existing algorithms, the dimensional convolutional neural network (DCNN) and the decision tree (DT). The diabetes database is collected from the online UCI repository. Table 2 shows the parameter settings; all methods are run in Jupyter notebooks in the Anaconda environment. The data are split into a 70% training set and a 30% testing set. A confusion matrix is used to calculate all the measures, such as precision, recall, error and predictive performance. Figure 2 presents the prediction performance in percent. The proposed DSRNN algorithm achieves a prediction result of 93%, whereas the existing DCNN and DT algorithms achieve 87% and 83%, respectively, for 300 records.
Table 2 Proposed parameters
Parameter items       Values
Language              Python
Tool                  Anaconda
Name of the dataset   Diabetic dataset
Number of data        700
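The measures named in this section follow directly from the four cells of a binary confusion matrix. The sketch below uses a small made-up label vector purely to show the arithmetic; it does not reproduce the reported percentages, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = diabetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall and error rate from the counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "error": (fp + fn) / total,
    }

# Illustrative labels only, not the paper's data.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
scores = metrics(*counts)
```

Here the counts are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so accuracy, precision and recall are each 0.75 and the error rate is 0.25.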
Fig. 2 Analysis of prediction (performance of prediction in % for DCNN, DT and DSRNN at 100, 200 and 300 records)
Figure 3 presents the precision in percent. The proposed DSRNN algorithm achieves a precision of 92%, whereas the existing DCNN and DT algorithms achieve 86% and 82%, respectively, for 300 records.
Fig. 3 Performance of precision (precision in % for DCNN, DT and DSRNN at 100, 200 and 300 records)
Figure 4 presents the recall in percent. The proposed DSRNN algorithm achieves a recall of 91%, whereas the existing DCNN and DT algorithms achieve 85% and 83%, respectively, for 300 records.
Figure 5 presents the running time in seconds. The proposed DSRNN algorithm takes 31 s, whereas the existing DCNN and DT algorithms take 79 s and 70 s, respectively, for 300 records.
Fig. 4 Performance of recall (recall in % for DCNN, DT and DSRNN at 100, 200 and 300 records)
Fig. 5 Analysis of time complexity (time in seconds for DCNN, DT and DSRNN at 100, 200 and 300 records)
5 Conclusion
Deep learning methods have been used in a wide range of fields, and their use in medicine is becoming increasingly popular as they develop. Diabetes prediction and performance analysis methods based on machine learning have been designed and implemented. The model was customised and evaluated multiple times for each contributor to the UCI dataset. Based on the mean RMSE and MAE, the proposed model predicts better in min. CGM for diabetic analysis demonstrated improved clinical accuracy, but there is still room for improvement. The DSRNN achieves a classification accuracy of 93%. The test results can help healthcare providers make early predictions and decisions to treat diabetes and save lives. The model can capture most individual glucose dynamics and has clear potential for adoption in real clinical applications.
References
1. Wang J, Bai Y, Xia B (2020) Simultaneous diagnosis of severity and features of diabetic retinopathy in fundus photography using deep learning. IEEE J Biomed Health Inform 24(12)
2. Estonilo CG, Festijo ED (2021) Development of deep learning-based mobile application for predicting diabetes mellitus. In: 2021 4th international conference of computer and informatics engineering (IC2IE), pp 13–18
3. Xu L, He J, Hu Y (2021) Early diabetes risk prediction based on deep learning methods. In: 2021 4th international conference on pattern recognition and artificial intelligence (PRAI), pp 282–286
4. Kumar L, Johri P (2022) Designing a health care application for prediction diabetes using LSTM, SVM and deep learning. In: 2022 2nd international conference on advance computing and innovative technologies in engineering (ICACITE), pp 1319–1323
5. Miazi ZA et al (2021) A cloud-based app for early detection of type II diabetes with the aid of deep learning. In: 2021 international conference on automation, control and mechatronics for Industry 4.0 (ACMI), pp 1–6
6. Xie J, Wang Q (2020) Benchmarking machine learning algorithms on blood glucose prediction for type I diabetes in comparison with classical time-series models. IEEE Trans Biomed Eng 67(11)
7. Theis J, Galanter WL, Boyd AD, Darabi H (2022) Improving the in-hospital mortality prediction of diabetes ICU patients using a process mining/deep learning architecture. IEEE J Biomed Health Inform 26(1)
8. Karthick K et al (2023) Iterative dichotomiser posteriori method-based service attack detection in cloud computing. Comput Syst Sci Eng 44(2):1099–1107
9. Zaitcev A, Eissa MR, Hui Z, Good T, Elliott J, Benaissa M (2020) A deep neural network application for improved prediction of HbA1c in Type 1 diabetes. IEEE J Biomed Health Inform 24(10)
10. Abbas H, Alic L, Rios M, Abdul-Ghani M, Qaraqe K (2019) Predicting diabetes in healthy population through machine learning. In: 2019 IEEE 32nd international symposium on computer-based medical systems (CBMS)
11. Somasundaram SK, Alli P (2017) A machine learning ensemble classifier for early prediction of diabetic retinopathy. J Med Syst 41(12)
12. Naveenkumar E et al (2022) Detection of lung ultrasound Covid-19 disease patients based convolution multifacet analytics using deep learning. In: 2022 second international conference on artificial intelligence and smart energy (ICAIS), pp 185–190
13. Choudhury A, Gupta D (2019) A survey on the medical diagnosis of diabetes using machine learning techniques. In: Recent developments in machine learning and data analytics. Springer, Singapore, vol 740
14. Alirezaei M, Niaki STA, Niaki SAA (2019) A bi-objective hybrid optimization algorithm to reduce noise and data dimension in diabetes diagnosis using support vector machines. Expert Syst Appl 127
15. Ahmed U et al (2022) Prediction of diabetes empowered with fused machine learning. IEEE Access 10
16. Naveenkumar E et al (2022) Lung ultrasound COVID-19 detection using deep feature recursive neural network. Intell Sustain Syst 458
17. Daliya VK, Ramesh TK, Ko S-B (2021) An optimised multivariable regression model for predictive analysis of diabetic disease progression. IEEE Access 9
18. Wang W, Wang S, Wang X, Liu D, Geng Y, Wu T (2021) A glucose-insulin mixture model and application to short-term hypoglycemia prediction in the night time. IEEE Trans Biomed Eng 68(3)
19. Aliberti A et al (2019) A multi-patient data-driven approach to blood glucose prediction. IEEE Access 7
20. Sosunkevič S, Rapalis A, Marozas M, Čeponis J, Lukoševičius A (2019) Diabetic vascular damage: review of pathogenesis and possible evaluation technologies. IEEE Access 7
Performance Analysis of DeeplabV3+ Using State-of-the-Art Encoder Architectures for Waterbody Segmentation in Remote Sensing Images S. Adarsh, V. Sowmya, Ramesh Sivanpillai, and V. V. Sajith Variyar
Abstract Over the past few years, deep learning (DL) algorithms have dramatically increased in popularity, especially in the field of remote sensing. Image segmentation involves the detection and classification of individual objects within an image. In satellite images, the objects may be buildings, roads, vegetation, waterbodies, and so on. In the present work, DeepLabv3+, a state-of-the-art network, is used to extract water bodies. It has an encoder–decoder architecture with atrous convolution between the encoder and the decoder. The encoder extracts high-level information from the input image, which is then used for the segmentation task. The quality of the encoder architecture therefore has a significant impact on the segmentation result. The existing encoder architecture in DeepLabv3+ is Xception. Along with this, ResNet-50 and U-Net are the other two encoder architectures considered in this study.
Keywords Image processing · Remote sensing · Segmentation · Deep learning
S. Adarsh (B) · V. Sowmya · V. V. Sajith Variyar Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] V. Sowmya e-mail: [email protected] V. V. Sajith Variyar e-mail: [email protected] R. Sivanpillai Wyoming GIS Center, University of Wyoming, Laramie, WY 82071, USA © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_34
S. Adarsh et al.
1 Introduction
Neural networks have shown their importance in a variety of computer vision tasks such as image segmentation [1], object detection [2], and classification [3]. Image segmentation is the process of dividing an image into segments for further analysis and processing, with a label assigned to each pixel. In remote sensing, image segmentation has been used in various applications and has been a major research topic for decades. With the success of deep learning approaches in computer vision, researchers have made significant efforts to transfer this performance to remote sensing analysis. Performing image segmentation on remote sensing images poses several challenges. Remote sensing images can be obtained at different scales and resolutions. This variation can lead to loss of information and can make it difficult to distinguish different objects with almost the same spectral characteristics. Complex background is another major issue: clouds, shadows, and other factors can obstruct the region of interest and thereby affect the segmentation task. As segmentation algorithms continue to develop, they have been used to address a variety of data-rich remote sensing problems. Examples of segmentation in remote sensing images include small objects in urban images [4, 5], buildings [6], crop fields [1, 7, 8], oil pads [9], tree species in forests [10], and water bodies [11, 12]. The remaining sections of the paper are organized as follows. Section 2 discusses the related works, followed by the proposed work in Section 3. Section 4 presents the results analysis, and Section 5 discusses the conclusion and future scope of this work.
2 Related Work
This section discusses previous work in the field of remote sensing and image segmentation. A novel CNN-based architecture named DeepWaterMap for water segmentation in Landsat images was proposed by Furkan et al. [11]. The model can be used to separate water from land, snow, ice, and shadows. To learn the pattern of water bodies, the model makes use of data drawn from across the world. The Global Land Cover Facility (GLCF) inland water dataset and Landsat 7 ETM+ from the GCS200 collection were the datasets used for this study. The model had an F1 score of 0.87, 0.9, and 0.9 for DeepWaterMap with 1, 3 and 5 convolution layers, respectively. Dong et al. [12] applied a sub-neighbor system convolutional neural network (SNS-CNN) for water–land segmentation to achieve a better description of water area features. An optimized up-sampling based on U-Net was used to enhance the water area. To make the water mask coherent, an SNS-constrained loss function is used. The dataset used for this study includes 200 remote sensing images collected from Google Earth, with a spatial resolution of 1.0–2.0 m. The model gave better results than U-Net, the local binary pattern (LBP), and the definition circle (DC) model, with a precision of 97.39%, a recall of 96.07%, and an overall accuracy of 96.20%. Dymtro et al. [13] applied the U-Net model to forest and water body segmentation in satellite images. The main aim of this work was to assist in monitoring deforestation, floods, and the drying of lakes over time. The dataset used for this work was taken from Kaggle, and the forest dataset was taken from the DeepGlobe challenge. The model performed well, with a validation accuracy of 82.55% for forest and 82.92% for water area.
Multi-scale feature extraction is a crucial part of multi-spectral image segmentation. Its importance in feature extraction for the earth's surface has been demonstrated in many recent studies [14]. The majority of the works in the above literature use normal convolution and therefore fail to capture multi-scale information. Atrous convolution is a type of convolution that uses different rates to capture multi-scale information. DeepLabv3+ is one model that makes use of atrous convolution. It has an encoder–decoder architecture with atrous convolution between the two. The main purpose of the encoder is to extract feature maps, and the quality of these feature maps plays an important role in the segmentation result. The main objective of this work is therefore to analyze the performance of the DeepLabV3+ model for water segmentation using state-of-the-art encoder architectures. This can be considered a challenging task [15] for various reasons, such as the shape of the water body, the color of the water, the presence of cloud, background reflection, and more [16].
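To make the idea of atrous convolution concrete, the sketch below applies a three-tap kernel to a one-dimensional signal with the taps spaced `rate` samples apart, and computes the effective kernel size k + (k − 1)(r − 1). The one-dimensional setting, the function names and the example kernel are illustrative assumptions; DeepLabv3+ applies the same principle to two-dimensional feature maps.

```python
import numpy as np

def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel whose taps are `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)

def atrous_conv1d(signal, kernel, rate):
    """Valid 1-D atrous (dilated) correlation without padding."""
    k = len(kernel)
    n_out = len(signal) - effective_kernel_size(k, rate) + 1
    return np.array([
        sum(kernel[j] * signal[i + j * rate] for j in range(k))
        for i in range(n_out)
    ])

signal = np.arange(10, dtype=float)   # a simple ramp 0, 1, ..., 9
kernel = np.array([1.0, 0.0, -1.0])   # difference between the outer taps

out_rate1 = atrous_conv1d(signal, kernel, rate=1)  # signal[i] - signal[i + 2]
out_rate3 = atrous_conv1d(signal, kernel, rate=3)  # signal[i] - signal[i + 6]
spans = [effective_kernel_size(3, r) for r in (1, 2, 3)]
```

On the ramp, rate 1 gives a constant −2 and rate 3 a constant −6: the same three weights see a wider context as the rate grows (spans of 3, 5 and 7 samples) without any extra parameters.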
3 Proposed Work
3.1 Dataset Collection and Description
The dataset used for this work was collected from Kaggle. It was published by Francisco Escobar under the name “Satellite image of water bodies” [17]. It is publicly available and consists of 2841 RGB images and their corresponding masks. The images were captured by the Sentinel-2 satellite. Image sizes range from 69 × 5 to 6683 × 5640 pixels, and the collection can be downloaded as a single compressed file of 247 MB. The ground truth masks for the Sentinel-2 A/B satellites are generated using the Normalized Difference Water Index (NDWI) [18]. Pixels corresponding to water are depicted in white and the rest (background) in black. Around 95% of the images are of high quality. A sample RGB image and the corresponding mask are shown in Fig. 1.
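For readers unfamiliar with the index, NDWI in its commonly used form is (Green − NIR) / (Green + NIR), with water pixels tending towards positive values. The sketch below builds a white-on-black mask from two small arrays. The threshold of 0, the function name and the toy band values are assumptions; the text does not state how the dataset's masks were thresholded. For Sentinel-2, the green and near-infrared bands would usually be B3 and B8.

```python
import numpy as np

def ndwi_mask(green, nir, threshold=0.0):
    """Return the NDWI image and a mask with water = 255, background = 0."""
    green = green.astype(float)
    nir = nir.astype(float)
    total = green + nir
    # Pixels where both bands are zero get an NDWI of 0 instead of a NaN.
    ndwi = np.divide(green - nir, total, out=np.zeros_like(total), where=total != 0)
    mask = np.where(ndwi > threshold, 255, 0).astype(np.uint8)
    return ndwi, mask

# Toy 2 x 2 scene: top-left pixel is water-like, bottom-right has no signal.
green = np.array([[3, 1], [2, 0]])
nir = np.array([[1, 3], [2, 0]])
ndwi, mask = ndwi_mask(green, nir)
```

The toy scene gives NDWI values of 0.5, −0.5, 0 and 0, so only the top-left pixel is marked as water.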
Fig. 1 RGB satellite image and mask image
3.2 Dataset Preprocessing
Most of the images in the dataset contain a “no data region”. No data regions differ from the background class and do not contain any essential information. Their presence in the training images can cause the model to learn incorrect patterns. Figure 2 shows an example of an image-mask pair with a no data region. Such images either need to be removed completely or cropped as needed. The no data regions in the input images and corresponding masks are removed using VGG Annotator [19]. The next task is to remove image-mask pairs that do not match. Figure 3 shows an example of such a mismatch: most of the background around the water area in the image is labelled as water in the corresponding mask. All the image-mask pairs are manually inspected, and those with a mismatch are removed. Images and masks smaller than 100 × 100 are then removed. Finally, 100 × 100 patches are created. A total of 9878 subsamples are taken for the study.
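The last two steps, discarding pairs smaller than 100 × 100 and cutting the rest into 100 × 100 patches, can be sketched as follows. The text does not say whether patches overlap or how leftover borders are handled, so the non-overlapping grid that drops incomplete border strips is an assumption, as is the function name.

```python
import numpy as np

def extract_patches(image, mask, size=100):
    """Cut an image-mask pair into aligned, non-overlapping size x size patches."""
    height, width = mask.shape[:2]
    if height < size or width < size:
        return []  # pair is too small and is discarded
    patches = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            patches.append((
                image[top:top + size, left:left + size],
                mask[top:top + size, left:left + size],
            ))
    return patches

# Synthetic pairs: one large enough, one too small in height.
image_ok = np.zeros((250, 320, 3), dtype=np.uint8)
mask_ok = np.zeros((250, 320), dtype=np.uint8)
image_small = np.zeros((80, 150, 3), dtype=np.uint8)
mask_small = np.zeros((80, 150), dtype=np.uint8)

patches_ok = extract_patches(image_ok, mask_ok)          # 2 rows x 3 columns
patches_small = extract_patches(image_small, mask_small)  # discarded
```

Slicing the image and mask with the same indices keeps every patch aligned with its label, which matters more than the exact tiling scheme.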
3.3 Methodology The block diagram of methodology is shown in Fig. 4. After the dataset preprocessing stage, a total of 9878 subsamples are taken for study. These images are then divided into train and test in 9:1 ratio. 988 images are taken for testing and remaining 8890 for training. These training images are then divided into 8:2 ratio for training and validation set. A total of 1778 images are taken for validation and 7112 for training. These are then used for training and evaluating the models. DeepLabv3+ is the base architecture that is considered for the study. It is a segmentation algorithm that has encoder–decoder architecture. The encoder layer extracts
Performance Analysis of DeeplabV3+ Using State-of-the-Art Encoder …
Fig. 2 Example of no data region (the black boundary around the RGB image is the no data region which can be seen as white boundary around the mask)
Fig. 3 Example of image-mask mismatch
the texture information from the input image and thereby reduces the spatial dimension. This output is passed through the decoder, which recovers the spatial dimension by up-sampling. The result of the segmentation task depends on the quality of the feature maps extracted by the encoder. Along with Xception [20], which is the existing encoder architecture of DeepLabv3+, ResNet-50 [21] and U-Net [22, 23] are also used for this study. As the number of layers in a neural network increases, the accuracy may saturate and then gradually decrease. This decrease is caused by the vanishing/exploding gradient problem. ResNet-50 is known for its skip connections, which help it to mitigate the vanishing gradient problem. U-Net, on the other hand, has a contracting path and an expansive path and can be trained with fewer images. These architectures are commonly used for image processing tasks. The output from these encoders is then fed
Fig. 4 Flow graph of methodology
into the atrous spatial pyramid pooling (ASPP) module. ASPP helps in capturing contextual information at different scales: atrous convolutions at multiple rates are applied to the features generated by the encoder. All three models (Xception, ResNet-50, and U-Net) are trained for 50 epochs with a batch size of 4. The learning rate is set to 0.001, and the loss function is binary cross entropy. The weights and biases are adjusted using the Adam optimizer. The atrous convolution rate (r) is assigned the values 1, 2, and 3, as the input images are small. The total number of trainable parameters for Xception, ResNet-50, and U-Net is 41,044,133, 181,321,397, and 2,466,803, respectively. All three models are trained with the above-mentioned parameters on an HPC system with ncpus = 25 and memory = 100 GB. Training takes 3 to 4 h.
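To make the role of the atrous rate concrete, the sketch below applies one 3 × 3 kernel at rates 1, 2, and 3 to a single-channel feature map. It is a simplified illustration, not the ASPP module used in the study: the function `atrous_conv2d`, the averaging kernel, and the absence of padding are assumptions, whereas a real ASPP block uses learned multi-channel kernels and concatenates the branches.

```python
import numpy as np

def atrous_conv2d(x, kernel, rate):
    """'Valid' 2-D atrous (dilated) convolution of a single-channel map."""
    kh, kw = kernel.shape
    # Effective kernel extent grows with the rate: k + (k - 1)(r - 1)
    eh = kh + (kh - 1) * (rate - 1)
    ew = kw + (kw - 1) * (rate - 1)
    out_h = x.shape[0] - eh + 1
    out_w = x.shape[1] - ew + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            # Kernel taps are spaced `rate` pixels apart in the input
            out += kernel[i, j] * x[i * rate:i * rate + out_h,
                                    j * rate:j * rate + out_w]
    return out

x = np.arange(100, dtype=float).reshape(10, 10)   # toy 10 x 10 feature map
kernel = np.ones((3, 3)) / 9.0                    # simple averaging kernel
outputs = {r: atrous_conv2d(x, kernel, r) for r in (1, 2, 3)}
```

With the same nine weights, the receptive field grows from 3 × 3 (r = 1) to 5 × 5 (r = 2) and 7 × 7 (r = 3), which is why small rates suit the 100 × 100 patches.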
4 Results Analysis
The training and validation losses for all three models are shown in Figs. 5, 6, and 7. For the Xception model, the training loss is around 0.2 for the first five epochs and gradually stabilizes around the 30th epoch, as seen in Fig. 5. The training loss stabilizes around the 10th epoch for ResNet-50 and around the 25th epoch for U-Net, as shown in Figs. 6 and 7, respectively. The performance of the DeepLabv3+ models is tested using 988 images. The 988 images are divided into two batches: batch 0 (with water) containing 350 images and batch 1 (without water) containing 638 images. The inference time for 988 test
Fig. 5 Training loss and validation loss for the Xception model
Fig. 6 Training loss and validation loss for the ResNet-50 model
Fig. 7 Training loss and validation loss for the U-Net model
Table 1 Results of all three models on test data (Batch 0)

| Model | F1 | Jaccard | Precision | Recall |
|---|---|---|---|---|
| Xception | 0.30 | 0.26 | 0.28 | 0.36 |
| ResNet-50 | 0.32 | 0.29 | 0.29 | 0.37 |
| U-Net | 0.32 | 0.29 | 0.30 | 0.36 |
images varies from 5 to 7 min. The results of all three models on test data (Batch 0) are shown in Table 1. The models were evaluated using the F1 score, Jaccard index, precision, and recall. True positives (TP), false positives (FP), and false negatives (FN) are used for computing these metrics. The equations for these metrics are as follows:

$$\text{F1 Score} = \frac{2 \times TP}{(2 \times TP) + FP + FN} \tag{1}$$

$$\text{Jaccard} = \frac{TP}{TP + FP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$
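Equations (1) to (4) translate directly into code. The sketch below is illustrative: the function name `segmentation_metrics` and the pixel counts are hypothetical, and returning 0 when a denominator is zero is an assumed convention (it reproduces the zero scores reported for images without water).

```python
def segmentation_metrics(tp, fp, fn):
    """Compute F1, Jaccard, precision and recall from pixel counts."""
    def safe_div(num, den):
        # Assumed convention: a metric with an empty denominator is 0
        return num / den if den else 0.0

    return {
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),   # Eq. (1)
        "jaccard": safe_div(tp, tp + fp + fn),      # Eq. (2)
        "precision": safe_div(tp, tp + fp),         # Eq. (3)
        "recall": safe_div(tp, tp + fn),            # Eq. (4)
    }

# Hypothetical counts for one predicted mask
scores = segmentation_metrics(tp=80, fp=20, fn=20)
no_water = segmentation_metrics(tp=0, fp=15, fn=0)
```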
The F1, Jaccard, precision, and recall values are 0 for Batch 1 images, as none of these images contain the water class. The scores on the test data are low for all three models. A result analysis is performed to understand the reason for the poor performance. For this, the Batch 0 images are further divided into five batches, namely Batch A, Batch B, Batch C, Batch D, and Batch E, based on the percentage of water, as shown in Table 2. From this table, we can clearly see that out of 350 images with water, 204 contain 1% water or less. All three models are then tested on these five batches, and the results are shown in Table 3. For all three models, the F1, Jaccard, precision, and recall values are low when the percentage of water in the image is low, and they increase as the percentage of water increases. When the percentage of water is greater than 10%, ResNet-50 and U-Net give better results than Xception. When the region of interest (water) is small, the models may not have enough information to segment it accurately. In such cases, the models usually misclassify the pixels around the boundary of the region of interest. As the percentage of pixels belonging to the region of interest increases, the models are able to capture its features, and the water segmentation improves. This may be the reason why all three models give low scores when the percentage of water is low and improved scores when it is high.
Table 2 Test data (Batch 0) divided into five batches based on water content

| Batches | Water percentage | Count of images |
|---|---|---|
| Batch A | > 0 to ≤ 1 | 204 |
| Batch B | > 1 to ≤ 10 | 67 |
| Batch C | > 10 to ≤ 30 | 59 |
| Batch D | > 30 to ≤ 60 | 18 |
| Batch E | > 60 to ≤ 100 | 2 |
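The grouping in Table 2 amounts to binning each test image by the percentage of water pixels in its mask. A minimal sketch, in which the function name `water_batch` and the sample percentages are hypothetical:

```python
def water_batch(water_percent):
    """Map the percentage of water pixels in a mask to a batch label."""
    if water_percent <= 0:
        return "Batch 1 (no water)"
    # Upper bounds are inclusive, as in Table 2
    for label, upper in (("A", 1), ("B", 10), ("C", 30), ("D", 60), ("E", 100)):
        if water_percent <= upper:
            return f"Batch {label}"
    raise ValueError("percentage must not exceed 100")

labels = [water_batch(p) for p in (0.0, 0.4, 1.0, 7.5, 30.0, 45.0, 100.0)]
```

The five counts in Table 2 sum to 204 + 67 + 59 + 18 + 2 = 350, the number of Batch 0 images.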
Apart from the 988 test images, a different set of 84 images is used for testing the trained models. These 84 images contain water that appears black, blue, green, or white, along with land areas containing buildings and forest. Figure 8 shows examples of the models' performance on this new test data. All three models performed equally well on images with blue, black, and white water, as shown in rows 1, 4, and 8 of Fig. 8. In the case of agricultural land, shown in row 2 of Fig. 8, ResNet-50 correctly predicted the entire land area, whereas Xception and U-Net predicted water where there is none. For images containing cloud, the Xception model outperformed the other models, as shown in row 5 of Fig. 8. Even the NDWI-generated original mask is unable to show the presence of cloud in this case. ResNet-50 and U-Net performed well on images with green water and buildings, whereas the Xception model failed in both cases, as shown in rows 3 and 6 of Fig. 8. All three models performed correctly on images with buildings, as depicted in row 7 of Fig. 8.
5 Conclusion and Future Scope
In this work, a performance analysis of DeepLabv3+ for water body segmentation is carried out using the state-of-the-art encoder architectures Xception, ResNet-50, and U-Net. The analysis shows that all three models perform poorly when the percentage of the target class (water) is low, and that performance improves as the water percentage increases. When the water percentage (region of interest) is low, the models are not able to segment the water accurately because pixels in the boundary region are misclassified, resulting in poor performance. The Xception-based model performs marginally better than ResNet-50 and U-Net when the water percentage is low (Batch A and Batch B). For Batch C images, with a moderate amount of water, ResNet-50 and U-Net performed well. For the remaining batches D and E, all three models performed equally well. Future work should focus on analyzing the cases where the models predict poorly. Trying out new image segmentation algorithms and incorporating attention modules into the existing model can also be explored. The use of better-quality remote sensing images may also help to improve the results.
Table 3 Performance comparison of Xception, ResNet-50, and U-Net on all batches of water

| Batches | Xception F1 | Xception Jaccard | Xception Precision | Xception Recall | ResNet-50 F1 | ResNet-50 Jaccard | ResNet-50 Precision | ResNet-50 Recall | U-Net F1 | U-Net Jaccard | U-Net Precision | U-Net Recall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Batch A | 0.022 | 0.015 | 0.016 | 0.018 | 0.02 | 0.014 | 0.015 | 0.029 | 0.019 | 0.015 | 0.019 | 0.025 |
| Batch B | 0.522 | 0.436 | 0.483 | 0.706 | 0.508 | 0.435 | 0.449 | 0.705 | 0.508 | 0.426 | 0.482 | 0.684 |
| Batch C | 0.825 | 0.731 | 0.771 | 0.946 | 0.926 | 0.875 | 0.889 | 0.983 | 0.928 | 0.871 | 0.901 | 0.963 |
| Batch D | 0.94 | 0.889 | 0.907 | 0.979 | 0.964 | 0.931 | 0.939 | 0.99 | 0.964 | 0.931 | 0.934 | 0.996 |
| Batch E | 0.976 | 0.953 | 0.976 | 0.976 | 0.988 | 0.976 | 0.983 | 0.992 | 0.987 | 0.974 | 0.98 | 0.993 |
Fig. 8 Performance of Xception, ResNet-50, and U-Net on new test data. The test images include blue color water, cropping land, green water, black water, presence of cloud, forest, buildings, and white color water
Acknowledgements Authors thank Prof. K. P. Soman, Head, Center for Computational Engineering and Networking (CEN) at Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu, and colleagues Mr. Bichu George and Ms. Gosula Sunandini for their valuable support in dataset preprocessing task.
References
1. Menon AK, Sajith Variyar VV, Sivanpillai R, Sowmya V, Brown GK, Soman KP (2022) Epiphyte Segmentation using DRU-Net. In: Data engineering and intelligent computing: proceedings of 5th ICICC 2021, vol 1, pp 101–108. Springer, Singapore
2. Giri A, Sajith Variyar VV, Sowmya V, Sivanpillai R, Soman KP (2022) Multiple oil pad detection using deep learning. In: The international archives of the photogrammetry, remote sensing and spatial information sciences, vol 46, pp 91–96
3. Gouda N, Amudha J (2020) Skin cancer classification using ResNet. In: 2020 IEEE 5th international conference on computing communication and automation (ICCCA), pp 536–541. IEEE
4. Kampffmeyer M, Salberg AB, Jenssen R (2016) Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 1–9
5. Dong R, Pan X, Li F (2019) DenseU-net-based semantic segmentation of small objects in urban remote sensing images. IEEE Access 7:65347–65356
6. Yi Y, Zhang Z, Zhang W, Zhang C, Li W, Zhao T (2019) Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network. Remote Sens 11(15):1774
7. Du Z, Yang J, Ou C, Zhang T (2019) Smallholder crop area mapped with a semantic segmentation deep learning method. Remote Sens 11(7):888
8. Huang L, Wu X, Peng Q, Yu X (2021) Depth semantic segmentation of tobacco planting areas from unmanned aerial vehicle remote sensing images in plateau mountains. J Spectros 1–14
9. Shi P, Jiang Q, Shi C, Xi J, Tao G, Zhang S, Wu Q (2021) Oil well detection via large-scale and high-resolution remote sensing images based on improved YOLO v4. Remote Sens 13(16):3243
10. Dechesne C, Mallet C, Le Bris A, Gouet-Brunet V (2017) Semantic segmentation of forest stands of pure species combining airborne lidar data and very high resolution multispectral imagery. ISPRS J Photogram Remote Sens 126:129–145
11. Isikdogan F, Bovik AC, Passalacqua P (2017) Surface water mapping by deep learning. IEEE J Select Top Appl Earth Observ Rem Sens 10(11):4909–4918
12. Dong S, Pang L, Zhuang Y, Liu W, Yang Z, Long T (2019) Optical remote sensing water-land segmentation representation based on proposed SNS-CNN network. In: IGARSS 2019–2019 IEEE international geoscience and remote sensing symposium, pp 3895–3898. IEEE
13. Filatov D, Yar GNAH (2022) Forest and water bodies segmentation through satellite images using U-Net. arXiv preprint arXiv:2207.11222
14. Guo H, He G, Jiang W, Yin R, Yan L, Leng W (2020) A multi-scale water extraction convolutional neural network (MWEN) method for GaoFen-1 remote sensing images. ISPRS Int J Geo-Inform 9(4):189
15. Dan LI, Baosheng WU, Bowei CHEN, Yuan XUE, Yi ZHANG (2020) Review of water body information extraction based on satellite remote sensing. J Tsinghua Univ (Sci Technol) 60(2):147–161
16. Cheng G, Xie X, Han J, Guo L, Xia GS (2020) Remote sensing image scene classification meets deep learning: challenges, methods, benchmarks, and opportunities. IEEE J Select Top Appl Earth Observ Remot Sens 13:3735–3756
17. Satellite Images of WaterBodies-Kaggle. https://www.kaggle.com/datasets/franciscoescobar/ satellite
18. McFeeters SK (1996) The use of the normalized difference water index (NDWI) in the delineation of open water features. Int J Remote Sens 17(7):1425–1432
19. Dutta A, Zisserman A (2019) The VIA annotation software for images, audio, and video. In: MM 2019: proceedings of the 27th ACM international conference on Multimedia, pp 2276–2279
20. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258
21. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778. IEEE
22. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, 5–9 Oct 2015, Proceedings, Part III 18, pp 234–241. Springer International Publishing
23. Darshik AS, Dev A, Bharath M, Nair BA, Gopakumar G (2020) Semantic segmentation of spectral images: a comparative study using FCN8s and U-NET on RIT-18 dataset. In: 11th international conference on computing, communication and networking technologies (ICCCNT), pp 1–6. IEEE
An Exploratory Comparison of LSTM and BiLSTM in Stock Price Prediction
Nguyen Q. Viet, Nguyen N. Quang, Nguyen King, Dinh T. Huu, Nguyen D. Toan, and Dang N. H. Thanh
Abstract Forecasting stock prices is a challenging topic that has been the subject of many studies in the field of finance. Using machine learning techniques, such as deep learning, to model and predict future stock prices is a promising approach. Long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) are two common deep learning models. The aim of this work is to determine which activation function and which optimization method most influence the performance of these models. In addition, the closely related models vanilla RNN, LSTM, and BiLSTM are compared to find the best model for stock price prediction. Experimental results indicate that BiLSTM with ReLU and the Adam method achieved the best performance in stock price prediction.
Keywords Stock price · Time series · Recurrent neural network · Long short-term memory · Bidirectional long short-term memory
N. Q. Viet · N. N. Quang · N. King · D. T. Huu · N. D. Toan · D. N. H. Thanh (✉)
College of Technology and Design, University of Economics Ho Chi Minh City (UEH), Ho Chi Minh City 700000, Vietnam
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_35
N. Q. Viet et al.
1 Introduction Stock price prediction is the technique of forecasting future stock prices using the previous stock market data and other pertinent information. The importance of stock price prediction stems from the fact that it may give useful insights for investors and traders in making educated stock market decisions. The problem of stock price prediction is among the most intricate and important tasks in finance and economics. The stock market is elaborate and dynamic and is influenced by a variety of factors, including economic indicators, company-specific news, investor sentiment [1–4], and pandemic [5, 6]. The task of predicting stock prices is challenging because it requires understanding and modeling the relationships between these factors and stock prices. As stock market performance is generally regarded as a reflection of an economy’s overall health [7], stock price prediction has a range of effects on macroeconomics. The future performance of businesses and the economy as a whole may be usefully predicted by accurate stock price predictions, which can then be used to inform investment and policy decisions. Investment decisions can be influenced by stock price forecasts, which can then have an effect on macroeconomics. More investment in such companies might encourage economic development in that region if stock price estimates indicate that a certain industry or group of businesses is anticipated to do well in future. On the other hand, if stock price predictions indicate that a certain industry or set of businesses is likely to do poorly in future, this might result in less investment in those businesses and a decline in economic activity in that industry. Forecasts for stock prices can also affect the monetary policy decisions made by central banks [8]. In order to foster economic development and encourage borrowing and spending, the central bank may lower interest rates if stock market expectations indicate that the economy would stall. 
The central bank may raise interest rates in order to calm the economy and limit borrowing and expenditure if, on the other hand, stock price projections indicate that there is a danger of inflation and that the economy is overheating. Given the importance of stock price prediction in the economy, this study focuses on an exploratory analysis of effective solutions to this problem. The contributions are (1) analyzing and comparing the performance of LSTM and BiLSTM in stock price prediction; (2) finding the optimization method that best suits BiLSTM; (3) finding the most suitable activation function for BiLSTM.
2 Literature Review
In recent years, machine learning, especially deep learning, has emerged as a potent instrument for predicting stock prices. It has been used in a range of stock price prediction tasks, including projecting future prices, recognizing trends, and discovering data patterns [9–11].
An Exploratory Comparison of LSTM …
In [10], the authors analyzed two ways using four traditional machine learning models. The first approach of data input uses stock trading data to identify 10 technical limitations, whereas the second method focuses on reproducing these characteristics using static decision data. For each of these two input techniques, the correctness of each speculation model is examined. In [12], the authors suggested using an ARIMA-GARCH-NN, a machine learning technique, to identify internal patterns and forecast the stock market. They examine high-frequency stock market data in the US using explicit methodologies and procedures, sensory networks, and fundamental financial pricing models to assure variety. The results offer the first sign of a market shock. They also guarantee the appropriate operation of ARIMA-GARCH-NN by identifying patterns in huge datasets without depending on reliable distribution estimations. Their technique successfully blends the advantages of conventional financial procedures with “data-driven” methodologies in order to reveal hidden patterns in vast volumes of financial data. In the study, they suggest and put into practice ARIMA-GARCH-NN, a data-driven approach, to address the challenging issue of stock market prediction. In [13], the authors proposed the multi-filters neural network (MFNN) model, which removes financial time series characteristics and forecasts stock values. This model employs convolutional and duplicate neurons to give multiple filter formats, allowing the use of data from varied sources and market hypotheses. The authors tested the model with the data of CSI 300 of the Chinese stock market. In [14], the authors demonstrated a neural network model that forecasts stock price changes using historical data and publicly available information. In order to choose the optimal target stock for market structure and trading information, the model uses a graph of data and methodologies. 
The model employs a method called graph embedding to compress the data and attach it to a convolutional neural network to detect investing styles in order to manage the vast quantity of data and complexity. In order to anticipate stock price patterns and track attentiveness, the model also makes use of a short-term memory network, which can help with financial decision-making. The findings also demonstrated the model’s robustness and performance. The model’s prediction accuracy is reported to be 70% or higher, which is greater than that of existing approaches. In [15], the authors proposed the use of a CNN to categorize investor attitudes from stock forums, while LSTM neural networks are used to assess technical stock market indicators. Three different time intervals of real-world data from six important businesses on the Shanghai Stock Exchange (SSE) were used to evaluate the model. The findings demonstrate that the proposed approach outperforms the previous algorithms, even ones lacking sentiment analysis, in categorizing investor feelings and making stock price predictions. Each of the above methods has its own advantages and limitations. Traditional statistical and econometric techniques are simple to implement and interpret but may not be able to capture non-linear relationships or complex interactions between variables. Machine learning techniques are able to model non-linear patterns and complicated interconnections between different features. However, traditional machine learning models do not predict stock prices well, especially when there are unexpected factors. In the deep learning world, LSTM and BiLSTM are powerful models for stock price prediction. BiLSTM uses pairs of LSTM layers, a forward LSTM and a backward LSTM, to process the data. There are several research gaps in using LSTM and BiLSTM for stock price prediction: which activation function is best for these architectures, which optimization method produces the best model, and the lack of an analysis and comparison of the two models. Therefore, this study focuses on addressing these gaps and gaining insight into the models.
3 Data
Data collection for stock price prediction involves gathering historical stock prices and trading data. The data is crawled from Yahoo Finance and stored in a CSV file. The extracted dataset includes the open price, high price, low price, close price, adjusted close price (Fig. 1), and volume of Apple Inc. stock, updated daily over a period of 5 years from 22nd January 2018 to 22nd January 2023, where
– The open price of a stock is the price at which the stock started trading for the current trading day. The exchange determines the open price at the beginning of trading based on the previous day’s closing price and other market factors.
– The closing price is the last price at which the stock is traded on a stock exchange during a trading day. The exchange determines the closing price at the conclusion of trading, based on the final transaction that occurred during the trading day. It is used to compute the net change and percentage change in the stock’s value during the trading day, as well as the value of the stock at the conclusion of the trading day.
– The high price is the highest price at which a stock traded during a certain trading day or period of time. High prices can help traders, investors, and analysts analyze and make choices about stock performance.
– The adjusted closing price of a stock takes into account company activities such as stock splits, dividends, and other special events that may affect the stock’s price. It is used to calculate a stock’s historical return and is considered a more accurate representation of the stock’s performance over time than the ordinary closing price.
– The volume of a stock refers to the number of shares traded during a certain trading day or period of time. It is used to show a stock’s degree of activity and liquidity. High volume usually implies a high degree of interest in a stock, while low volume may imply a lower level of interest. Stock volume may be a useful statistic for traders, investors, and analysts for evaluating stock performance, making choices, and forecasting future price movements.
Fig. 1 Apple Inc. adjusted close stock price from 2018 to 2023

Table 1 First ten rows of the Apple stock price dataset

| Date | Open | High | Low | Close | Adj close | Volume |
|---|---|---|---|---|---|---|
| 2018-01-22 | 44.325001 | 44.445000 | 44.150002 | 44.250000 | 42.077320 | 108434400 |
| 2018-01-23 | 44.325001 | 44.860001 | 44.205002 | 44.259998 | 42.086819 | 130756400 |
| 2018-01-24 | 44.312500 | 44.325001 | 43.299999 | 43.555000 | 41.416439 | 204420400 |
| 2018-01-25 | 43.627499 | 43.737499 | 42.632500 | 42.777500 | 40.677120 | 166116000 |
| 2018-01-26 | 43.000000 | 43.000000 | 42.514999 | 42.877499 | 40.772205 | 156572000 |
| 2018-01-29 | 42.540001 | 42.540001 | 41.767502 | 41.990002 | 39.928280 | 202561600 |
| 2018-01-30 | 41.382500 | 41.842499 | 41.174999 | 41.742500 | 39.692936 | 184192800 |
| 2018-01-31 | 41.717499 | 42.110001 | 41.625000 | 41.857498 | 39.802284 | 129915600 |
| 2018-02-01 | 41.792500 | 42.154999 | 41.689999 | 41.945000 | 39.885494 | 188923200 |
| 2018-02-02 | 41.500000 | 41.700001 | 40.025002 | 40.125000 | 38.154842 | 346375200 |
– The low price is the lowest price that a stock reached during a trading day. It is one of several indicators of a stock’s performance, along with the open price, close price, and high price. These prices provide insight into how the stock’s value has changed over a specific period of time and can be used to determine the stock’s trend during the trading day and its volatility.
The first ten rows of the Apple stock price dataset are shown in Table 1. Note that the columns open, close, low, high, and adj close are strongly dependent, which is reflected in their strong correlation. This is expected, because they all follow the change in stock price on a trading day. However, the most important variable for stock price prediction is the adjusted close price. Therefore, this work only considers the adjusted close price data, and the problem considered is a univariate time series regression.
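Framing the adjusted close series as a univariate regression problem typically means turning it into input windows and next-day targets. The sketch below uses the first six adjusted close values from Table 1; the helper `make_windows` and the look-back length of 3 are assumptions for illustration, since the window length used in the study is not stated here.

```python
import numpy as np

def make_windows(series, lookback):
    """Build (X, y) pairs: each row of X holds `lookback` past values,
    and y holds the value that immediately follows that window."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

# First six adjusted close prices from Table 1
adj_close = [42.077320, 42.086819, 41.416439, 40.677120, 40.772205, 39.928280]
X, y = make_windows(adj_close, lookback=3)
```

Each row of `X` is then fed to the recurrent model as one input sequence, with the matching entry of `y` as the regression target.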
4 LSTM and BiLSTM Models
4.1 LSTM Architecture
The input layer is responsible for receiving the new input data, which in the case of stock price prediction would typically be historical stock prices, trading volume, and other relevant financial indicators. The input layer can also be augmented with additional features such as technical indicators or economic indicators. The input $X_t$ is passed through a linear transformation represented by a weight matrix $W_i$ and a bias vector $b_i$, resulting in a new representation $Z_i = W_i * X_t + b_i$, where the operator $*$ denotes convolution. Figure 2 shows the hidden layers responsible for processing the input data and passing it through the LSTM cells [16]. Typically, there are one or more hidden layers in an LSTM model. Each hidden layer $i$, where $i = 1, 2, \ldots, L - 1$, contains multiple LSTM cells. Each LSTM cell in the $i$th layer takes as input the output from the $(i-1)$th layer and the previous hidden state $h_{t-1}$, and generates a new hidden state $h_t$ and an output $o_t$. Mathematically, this can be represented as follows:

Input gate: $$i_t = \sigma(W_i x_t + W_{ih} h_{t-1} + b_i)$$

Forget gate: $$f_t = \sigma(W_f x_t + W_{fh} h_{t-1} + b_f)$$

Output gate: $$o_t = \sigma(W_o x_t + W_{oh} h_{t-1} + b_o)$$

Candidate memory cell: $$\tilde{c}_t = \tanh(W_c x_t + W_{ch} h_{t-1} + b_c)$$

Current memory cell: $$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

Current hidden state: $$h_t = o_t \odot \tanh(c_t)$$

where $x_t$ is the input at time step $t$, $h_t$ is the hidden state at time step $t$, $c_t$ is the memory cell at time step $t$, $\sigma$ is the Sigmoid activation function, $\odot$ is the element-wise product, the $W$ are weight matrices, and the $b$ are bias vectors. Each LSTM cell consists of a memory cell, an input gate, an output gate, and a forget gate. The memory cell is in charge of storing data from earlier time steps. The input gate and output gate regulate the flow of data into and out of the memory cell. The forget gate determines how much of the previous cell state $c_{t-1}$ is retained and how much is forgotten. The forget gate computes a value
Fig. 2 LSTM cell
between 0 and 1 from the previous hidden state $h_{t-1}$ and the current input $x_t$, which is then used to weight the previous cell state. Most of the previous cell state is retained if the value is close to 1, whereas most of it is forgotten if the value is close to 0. Using the previous hidden state $h_{t-1}$ and the current input $x_t$, the input gate computes a value between 0 and 1 that determines how much of the new candidate cell state $\tilde{c}_t$ is added to the current cell state $c_t$. In the same way, the output gate computes a value between 0 and 1 that determines how much of the current cell state $c_t$ is transmitted to the output $h_t$. The cell state is the LSTM cell’s internal memory. It is computed by first calculating a candidate cell state $\tilde{c}_t$, which is a combination of the current input $x_t$ and the previous hidden state $h_{t-1}$. The hidden state is the LSTM cell’s output at time step $t$, which is transmitted to the next LSTM cell in the network. The number of neurons in each layer is a hyperparameter that must be determined by trial and error and research. It is normally dictated by the amount of input data and the complexity of the problem. The number of neurons in the input layer is typically equal to the number of features in the input data, whereas the number of neurons in the hidden layers and the output layer may be established by experimentation. In general, more neurons in the hidden layers may help to capture more complicated patterns in the data, but they also raise the computing requirements and the danger of overfitting. As a result, the number of neurons in each layer must be chosen carefully, taking into consideration the amount of input data as well as the complexity of the problem. The number of layers in an LSTM is likewise a hyperparameter that may be determined in the same manner.
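The gate equations above can be written as a single time-step function. This is a minimal NumPy sketch, not the implementation used in the study: the names `lstm_step`, `W` (input weights), and `U` (recurrent weights), the hidden size of 4, and the random initial weights are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; gates are keyed i (input), f (forget),
    o (output) and c (candidate memory cell)."""
    W, U, b = params["W"], params["U"], params["b"]
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate cell
    c_t = f_t * c_prev + i_t * c_tilde                          # current memory cell
    h_t = o_t * np.tanh(c_t)                                    # current hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4   # one feature (adjusted close), four hidden units
params = {
    "W": {g: rng.normal(scale=0.1, size=(n_hidden, n_in)) for g in "ifoc"},
    "U": {g: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for g in "ifoc"},
    "b": {g: np.zeros(n_hidden) for g in "ifoc"},
}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for price in (0.42, 0.41, 0.40):   # hypothetical scaled prices
    h, c = lstm_step(np.array([price]), h, c, params)
```

Because the hidden state is a sigmoid gate multiplied by a tanh, every entry of `h` lies strictly between −1 and 1.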
4.2 Bidirectional LSTM

The idea behind the bidirectional LSTM (BiLSTM) was first presented in the 1997 paper "Bidirectional recurrent neural networks" [17], whose authors proposed the bidirectional architecture as a way to improve speech recognition. The essential concept underlying the BiLSTM is the use of two LSTM networks, one moving forward and the other moving backward. The forward-facing LSTM reads the input sequence from left to right, while the backward-facing LSTM reads it from right to left. By combining the outputs of the two LSTMs, the BiLSTM can make predictions that include both past and future context.

The same principle can be applied to stock price prediction by feeding past stock market data into the BiLSTM network. First, historical stock market data is gathered, including stock prices, trade volume, and other financial indicators, and preprocessed. The BiLSTM network then analyzes the preprocessed data in both forward and backward directions, accounting for both older and more recent data to extract meaningful features. After the LSTMs have processed the input data, the final hidden states of the forward-facing and backward-facing LSTMs are combined and used as the final output of the BiLSTM network. This output is then passed through a fully connected layer, which generates a forecast of the stock's future price. The fully connected layer may contain multiple neurons, which help to map the output of the LSTMs to the final prediction. The overall architecture of BiLSTM is presented in Fig. 3.

Fig. 3 Architecture of BiLSTM model
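The bidirectional flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: a scalar tanh recurrent cell stands in for each LSTM to keep the example short, and the names `rnn_final_state` and `bidirectional_forecast` as well as all weights are hypothetical.

```python
import math


def rnn_final_state(sequence, w_x, w_h):
    """Final hidden state of a scalar tanh recurrent cell (stand-in for an LSTM)."""
    h = 0.0
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
    return h


def bidirectional_forecast(sequence, fwd, bwd, dense):
    """Read the sequence in both directions and map the two states to one value."""
    h_forward = rnn_final_state(sequence, *fwd)                   # left to right
    h_backward = rnn_final_state(list(reversed(sequence)), *bwd)  # right to left
    features = [h_forward, h_backward]                            # concatenation
    weights, bias = dense                                         # fully connected layer
    return sum(w * f for w, f in zip(weights, features)) + bias


# Hypothetical scaled prices and weights, for illustration only.
prices = [0.20, 0.25, 0.22, 0.30, 0.35]
y_hat = bidirectional_forecast(prices, fwd=(0.8, 0.5), bwd=(0.8, 0.5),
                               dense=([0.6, 0.4], 0.0))
print(y_hat)
```

The two directions have separate weights in practice; they are set equal here only so that the effect of reversing the sequence is easy to inspect.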
4.3 Model Evaluation

Six metrics are used to evaluate performance in this research: MAE, MAPE, MPE, MSE, $R^2$, and RMSE [18].
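The six metrics named above can be computed as in the following sketch. It assumes the common textbook definitions, with MPE and MAPE expressed as fractions rather than percentages and the error taken as actual minus predicted; the paper does not state its exact conventions, so these choices and the function name `regression_metrics` are assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MAPE, MPE, MSE, RMSE and R2 for two equal-length sequences.

    Assumes every actual value is non-zero (needed for MPE and MAPE)
    and that the actual values are not all identical (needed for R2).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mpe = sum(e / t for e, t in zip(errors, y_true)) / n        # signed, can cancel
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n  # unsigned
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "MPE": mpe,
            "MSE": mse, "RMSE": rmse, "R2": r2}


metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because positive and negative errors cancel in MPE, a small MPE indicates low bias rather than low error, which is worth keeping in mind when it is ranked alongside the other five metrics.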
5 Experimental Results and Discussion

The training loss is the difference between the expected outputs and the actual outputs of the BiLSTM model on the training dataset, which is the subset of the whole dataset used to fit the model. During the training phase, the model is updated to minimize this loss. Figures 4 and 5 show the results after 90 epochs with the Adam optimization method and the ReLU activation function.

Table 2 compares three optimization techniques for BiLSTM, namely Adam, RMSprop, and SGD, using six evaluation metrics: root mean squared error (RMSE), mean percentage error (MPE), mean absolute percentage error (MAPE), mean squared error (MSE), mean absolute error (MAE), and the $R^2$ score. Adam performs best, with the lowest value on all five error metrics and the highest $R^2$ score, whereas RMSprop and SGD perform worse on every metric.

In a similar vein, three activation functions are considered for BiLSTM: Tanh, ReLU, and Sigmoid. As shown in Table 3, ReLU has the lowest RMSE, MAPE, MSE, and MAE and the highest $R^2$ score, whereas Sigmoid and Tanh perform worse on these five metrics; only on MPE does Sigmoid have a smaller value (−0.0005 versus 0.0028). The rectified linear unit (ReLU) is an activation function that passes positive values unchanged and maps all negative values to zero. It is computationally cheap and is less prone to the vanishing-gradient problem that affects saturating activation functions such as Sigmoid and Tanh, which is why it is often applied in deep learning models.
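The remark about vanishing gradients can be illustrated by comparing the derivatives of the three activation functions. This is a small sketch based on the standard definitions; the helper names are introduced here for illustration.

```python
import math


def relu(z):
    """Pass positive values unchanged, map negative values to zero."""
    return max(0.0, z)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the logistic function, at most 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_grad(z):
    """Derivative of tanh, at most 1 and close to 0 for large |z|."""
    return 1.0 - math.tanh(z) ** 2


for z in (-4.0, 0.5, 4.0):
    print(z, relu(z), relu_grad(z),
          round(sigmoid_grad(z), 4), round(tanh_grad(z), 4))
```

For large positive inputs the Sigmoid and Tanh derivatives shrink toward zero, while the ReLU derivative stays at 1; multiplying many small derivatives across layers or time steps is what makes gradients vanish.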
Fig. 4 Loss on training data (left) and loss on test data (right) after 90 epochs of BiLSTM
Fig. 5 Actual adjusted closing stock prices (blue) versus adjusted closing stock prices that are predicted by the BiLSTM model (red)

Table 2 Comparison of different optimization methods

Model     RMSE     MPE       MAPE     MSE       MAE      R2 score
Adam      3.4927   0.0028    0.0182   12.1992   2.7009   0.8969
RMSprop   3.7757   0.01080   0.0205   14.2565   3.0363   0.8795
SGD       8.6535   0.0255    0.0450   74.8843   6.8549   0.3675

The best score is the lowest value for RMSE, MPE, MAPE, MSE, and MAE and the highest value for the R2 score.
Table 3 Comparison of BiLSTM with different activation functions

Model     RMSE     MPE       MAPE     MSE       MAE      R2 score
ReLU      3.4927   0.0028    0.0182   12.1992   2.7009   0.8969
Tanh      3.5385   0.0038    0.0185   12.5210   2.7445   0.8942
Sigmoid   3.6059   −0.0005   0.0190   13.0031   2.8193   0.8901

The best score is the lowest value for RMSE, MPE, MAPE, MSE, and MAE and the highest value for the R2 score.

Table 4 Comparison of BiLSTM and other models

Model         RMSE     MPE      MAPE     MSE       MAE      R2 score
BiLSTM        3.4927   0.0028   0.0182   12.1992   2.7009   0.8969
LSTM          3.6231   0.0021   0.0198   13.1273   2.9242   0.8891
Vanilla RNN   3.6539   0.0062   0.0197   13.3514   2.9062   0.8872

The best score is the lowest value for RMSE, MPE, MAPE, MSE, and MAE and the highest value for the R2 score.
That being said, the choice of activation function depends on the specific problem and dataset, and the Tanh or Sigmoid activation functions could work better in other contexts. It is therefore important to experiment with different activation functions and compare the results to find the best configuration for a particular problem.

Finally, Table 4 compares Vanilla RNN, LSTM, and BiLSTM with the same Adam optimization method and ReLU activation function. BiLSTM has the lowest RMSE, MAPE, MSE, and MAE (3.4927, 0.0182, 12.1992, and 2.7009, respectively) and the highest $R^2$ score (0.8969), indicating that it performs best among the models listed; only its MPE (0.0028) is slightly higher than that of LSTM (0.0021). LSTM is second-best on RMSE, MSE, and $R^2$, followed by Vanilla RNN. Overall, the experimental results show that BiLSTM is effective for stock price prediction and that it performs best with the Adam optimization method and the ReLU activation function.
6 Conclusions

In conclusion, BiLSTM is able to model long-term dependencies effectively. The findings show that BiLSTM works best with the Adam optimization algorithm in combination with the ReLU activation function. Additionally, BiLSTM outperforms the other deep learning models considered, Vanilla RNN and LSTM. BiLSTM is also capable of capturing complex and non-linear patterns of Apple stock prices over the five-year period. Since BiLSTM processes data in two directions, it is more effective than LSTM, especially for stock price data. However, the bidirectional processing makes BiLSTM a little slower than LSTM.

Acknowledgements This research is funded by University of Economics Ho Chi Minh City (UEH), Vietnam.
References

1. Barsky RB, De Long JB (1993) Why does the stock market fluctuate? Quart J Econ 108(2):291–311
2. Mitchell ML, Mulherin JH (1994) The impact of public information on the stock market. J Financ 49(3):923–950
3. Qi M, Maddala G (1999) Economic factors and the stock market: a new perspective. J Forecast 18(3):151–166
4. Hondroyiannis G, Papapetrou E (2001) Macroeconomic influences on the stock market. J Econ Financ 25(1):33–49
5. Mazur M, Dang M, Vega M (2021) Covid-19 and the March 2020 stock market crash. Evidence from S&P1500. Financ Res Lett 38:101690
6. Baker SR, Bloom N, Davis SJ, Kost K, Sammon M, Viratyosin T (2020) The unprecedented stock market reaction to Covid-19. Rev Asset Pricing Stud 10(4):742–758
7. Masoud NM (2013) The impact of stock market performance upon economic growth. Int J Econ Financ Iss 3(4):788–798
8. Li YD, İşcan TB, Xu K (2010) The impact of monetary policy shocks on stock prices: evidence from Canada and the United States. J Int Money Financ 29(5):876–896
9. Leung CKS, MacKinnon RK, Wang Y (2014) A machine learning approach for stock price prediction. In: Proceedings of the 18th international database engineering & applications symposium, pp 274–277
10. Patel J, Shah S, Thakkar P, Kotecha K (2015) Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Exp Syst Appl 42(1):259–268
11. Ampomah EK, Qin Z, Nyame G (2020) Evaluation of tree-based ensemble machine learning models in predicting stock price direction of movement. Information 11(6):332
12. Sun J, Xiao K, Liu C, Zhou W, Xiong H (2019) Exploiting intra-day patterns for market shock prediction: a machine learning approach. Exp Syst Appl 127:272–281
13. Long W, Lu Z, Cui L (2019) Deep learning-based feature engineering for stock price movement prediction. Knowl Based Syst 164:163–173
14. Long J, Chen Z, He W, Wu T, Ren J (2020) An integrated framework of deep learning and knowledge graph for prediction of stock price trend: an application in Chinese stock exchange market. Appl Soft Comput 91:106205
15. Jing N, Wu Z, Wang H (2021) A hybrid model integrating deep learning with investor sentiment analysis for stock price prediction. Exp Syst Appl 178:115019
16. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
17. Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45(11):2673–2681
18. Mitra A, Jain A, Kishore A, Kumar P (2022) A comparative study of demand forecasting models for a multi-channel retail company: a novel hybrid machine learning approach. Oper Res Forum 3(4):58 [Online]. Available https://doi.org/10.1007/s43069-022-00166-4
Forecasting Intraday Stock Price Using Attention Mechanism and Variational Mode Decomposition

R. Arul Goutham, B. Premjith, M. Nimal Madhu, and E. A. Gopalakrishnan
Abstract Stock price forecasting is a prominent topic in quantitative finance, as accurate prediction is essential given the complexity of the market. This research work employs variational mode decomposition (VMD) to decompose stock data into several variational modes, which are then used to train a long short-term memory (LSTM) network with an attention mechanism. The primary goal of this research is to enhance the accuracy of stock price prediction by exploring the effectiveness of VMD and the attention mechanism. In the experimental analysis, the efficacy of VMD is quantified by the results of LSTM with VMD, a mean absolute error (MAE) of 163.91 and a root mean square error (RMSE) of 192.39. The efficacy of the attention mechanism is quantified by the results of VMD + LSTM + attention, an MAE of 94.16 and an RMSE of 117.12. The experimental results indicate that applying VMD and the attention mechanism to an LSTM model leads to improved predictions.

Keywords Variational mode decomposition · Long short-term memory networks · Attention mechanism · Stock price prediction · Forecasting · Time series analysis
R. Arul Goutham · B. Premjith (B) · M. Nimal Madhu
Center for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India

E. A. Gopalakrishnan
Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bangalore, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_36
1 Introduction

Intraday stock price forecasting predicts future stock price movements based on historical stock data. The Chartist approach theorizes that securities prices are not arbitrary but can be forecast by examining historical patterns and applying technical analysis techniques [1]. As an illustration of intraday stock market investment, a trader analyzes the stock prices of a company over a period of time and identifies specific chart patterns, such as shooting star and hammer, bull run and bearish candle, or double bottom. Based on the historical occurrence of these patterns, the trader predicts the future price movement of the stock and acts on that prediction with the goal of making a profit. One of the main benefits of intraday stock price forecasting is that the predicted values give investors an understanding of how the market will move over a period of time, so that they can gauge the level of uncertainty in the market and make more profit. This can in turn lead to a steadier market.

Applications of LSTM and of hybrid models that combine various decomposition techniques with LSTM and RNN architectures to predict future stock prices were reported in [2–5]. LSTM is one of the most frequently used models [6–8], as it can learn long-term dependencies from historical stock prices. VMD is an algorithm for decomposing an input signal into a set of modes [9, 10], and the resulting modes can be considered a feature representation of the input signal. In this work, we use VMD because the decomposed modes can capture the characteristics of stock prices: one mode might represent the overall trend of the stock prices, while another might capture their seasonal fluctuations. By decomposing the signal into these modes, we can extract the key features of the stock price dataset in a more interpretable and meaningful way.

This paper aims to combine the advantages of VMD mentioned above with LSTM and an attention mechanism. The attention mechanism helps the model focus on the relevant part of the input data to generate a proper representation. VMD is applied to the original stock prices to decompose them into several intrinsic mode functions (IMFs), which represent different frequency components of the prices. The LSTM network is then trained to forecast each IMF separately. The attention mechanism helps the model learn weights that emphasize only the relevant inputs that influence the output prediction. Finally, the predictions for all IMFs are combined to forecast the original stock prices. In this work, two objectives are formulated:

1. To develop a model that hybridizes variational mode decomposition (VMD) with LSTM and an attention mechanism
2. To study the efficacy of incorporating an attention mechanism into the model consisting of variational mode decomposition (VMD) with LSTM.

The performance of the various models is evaluated using MAE and RMSE as metrics. In the experimental analysis, the combination of VMD, LSTM, and attention mechanism showed promising results compared to LSTM, VMD + LSTM, and LSTM + attention mechanism, with an MAE of 94.16 and an RMSE of 117.12.

This paper is organized as follows: related works are explained in Sect. 2. The proposed work and its theoretical background are explained in Sect. 3. The analysis of results is presented in Sect. 4. Section 5 gives the conclusion and the future work to be taken forward.
2 Related Work

This section reviews the relevant literature. In [11], the VMD modes are selected using the correlation between each mode and the original signal. In [6], LSTM was applied to the stock prices of well-known companies such as Infosys, Microsoft, and TCS from 1996 to 2022; the reported MSE scores are 1538.08 for Infosys, 390.56 for Microsoft, and 16095.97 for TCS. In [7], LSTM with attention was applied to three stock indices, the Shanghai stock index, the Shenzhen index, and the HS300 index, and applying the attention mechanism was observed to give better results. In [9], VMD and EMD were explored with different numbers of modes on three different datasets, and the comparison identifies which decomposition performs better with which number of modes. In [12], detrended fluctuation analysis (DFA) was applied to analyze short and long trends using the Hurst exponent. The method was applied to closing stock data from 2007 to 2009, with the research period divided into three distinct segments: pre-crisis (2007), crisis (2008), and post-crisis (2009). The objective was to investigate the persistence of both short and long trends during these periods. In [8], an LSTM model is proposed for predicting the effluent chemical oxygen demand (COD) in a wastewater treatment plant; the performance of the model is improved with a particle swarm optimization (PSO) algorithm and an attention mechanism. In [13], machine learning is combined with sentiment analysis of the news related to a particular stock. In [14], VMD is applied to stock data and prediction is done using a DNN with PSO; applying VMD to the PSO-optimized DNN model gave better results than the same model without VMD.
3 Proposed Work

3.1 Theoretical Background: Variational Mode Decomposition

The VMD-based approach to stock price forecasting can be sensitive to the choice of parameters, such as the number of IMFs to be extracted. The appropriate number of IMFs is determined by evaluating the correlation between each mode and the original signal against a threshold value of 0.1, which is taken from [11]. The objective function minimizes the sum of the bandwidths of the K modes (indexed by $g$), subject to the condition that the sum of the modes equals the original signal $f$:

$$\min_{\{u_g\},\{\omega_g\}} \left\{ \sum_g \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_g(t) \right] e^{-j\omega_g t} \right\|_2^2 \right\} \quad \text{subject to} \quad \sum_g u_g = f \tag{1}$$

Here $\left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_g(t) \right] e^{-j\omega_g t} \right\|_2^2$ is the variance, that is, the measure of bandwidth around the central frequency, and $*$ denotes convolution. The symbols are:

– $u_g(t)$: a function of time $t$ representing the signal in the $g$-th frequency band.
– $\delta(t)$: the Dirac delta function, whose integral over any interval that contains $t = 0$ is equal to 1. Convolving $u_g(t)$ with $\delta(t) + \frac{j}{\pi t}$ yields the analytic signal of the mode.
– $j$: the imaginary unit, defined as $\sqrt{-1}$.
– $\pi$: the mathematical constant pi, approximately equal to 3.14159.
– $t$: the time variable.
– $\omega_g$: the central frequency associated with the $g$-th frequency band.
3.2 Theoretical Background: VMD Mode Selection

The cross-correlation coefficient is used for mode selection because it is a statistical measure of the similarity between two variables [11]. It is computed using the method of covariance: the correlation between two variables is determined by the product of their average deviations divided by the product of their standard deviations [11]. For two sequences $x(m)$ and $y(m)$, the cross-correlation coefficient is given by

$$\rho_{xy} = \frac{\sum_{m=0}^{\infty} x(m)\,y(m)}{\sqrt{\sum_{m=0}^{\infty} x^2(m) \sum_{m=0}^{\infty} y^2(m)}} \tag{2}$$

where $x$ and $y$ are the two time series being compared, $x(m)$ and $y(m)$ are their values at time $m$, and $m$ is the index that iterates over the time steps of the series.
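Equation 2 translates directly into code for finite sequences. The sketch below follows the formula as written, without mean-centering; the function name `cross_correlation` and the handling of an all-zero sequence are choices made here for illustration.

```python
import math


def cross_correlation(x, y):
    """Cross-correlation coefficient of Eq. 2 for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # An all-zero sequence carries no information, so report zero correlation.
    return numerator / denominator if denominator else 0.0


signal = [1.0, 2.0, 3.0]
print(cross_correlation(signal, [2.0, 4.0, 6.0]))   # proportional sequences
print(cross_correlation([1.0, 0.0], [0.0, 1.0]))    # orthogonal sequences
```

In the mode selection step, `x` would be one VMD mode and `y` the original signal.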
3.3 Methodology

The schematic form of the proposed methodology is given in Figs. 1 and 2. Figure 1 shows the model containing the LSTM network without an attention mechanism, whereas Fig. 2 illustrates the architecture comprising the LSTM network with an attention mechanism.
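The paper does not specify which attention variant is placed on top of the LSTM. As a hedged illustration only, the sketch below shows one common choice, softmax-weighted pooling of the LSTM hidden states with dot-product scores; the function name `attention_pool`, the query vector, and the example values are assumptions and not the authors' design.

```python
import math


def attention_pool(hidden_states, query):
    """Softmax attention over a list of hidden-state vectors.

    Returns the weighted sum of the states (context) and the weights.
    """
    # Dot-product relevance score of every time step against the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    # Numerically stable softmax turns scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights


# Hypothetical hidden states for five time steps (two units each).
states = [[0.1, 0.0], [0.2, 0.1], [0.1, 0.3], [0.4, 0.2], [0.5, 0.4]]
context, weights = attention_pool(states, query=[1.0, 1.0])
print(weights)
print(context)
```

Without attention, only the last hidden state would feed the output layer; with it, every time step in the window can contribute in proportion to its learned relevance.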
The proposed method has three stages:

– Feature extraction and preprocessing
– Prediction
– Plotting and analysis

Feature Extraction. The dataset consists of 1-min data of HCL from 2015 to 2022, which is resampled to 10-min data. Features are extracted from the input using VMD, as explained in detail in Sect. 3.1. Selecting the number of modes is a challenging task in the VMD algorithm; it is done by computing the cross-correlation coefficient, as explained in Sect. 3.2.

Preprocessing. After the modes are extracted, the dataframe is preprocessed. The data is normalized using min-max scaling, which transforms the data points into the range from 0 to 1, as given in Eq. 3:

$$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{3}$$

where $x$ is the original data value, $x_{\min}$ is the minimum value in the dataset, $x_{\max}$ is the maximum value in the dataset, and $x_{\text{scaled}}$ is the scaled value of $x$ within the range of 0 and 1.
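Equation 3 can be applied as in the following sketch, which uses the five closing prices listed in Table 1 as sample input. The function name `min_max_scale` and the treatment of a constant series are choices made here for illustration.

```python
def min_max_scale(values):
    """Scale a sequence into [0, 1] with Eq. 3."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant series has no range; map every point to 0.
        return [0.0 for _ in values]
    span = highest - lowest
    return [(v - lowest) / span for v in values]


# Closing prices of the five sample rows in Table 1.
close = [7089.25, 7120.80, 7107.75, 7115.05, 7121.10]
scaled = min_max_scale(close)
print(scaled)
```

To map a scaled prediction back to a price, the same minimum and maximum are reused: `price = scaled_value * (highest - lowest) + lowest`.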
Prediction. We follow an overlapping windowing approach for the prediction, with the window size set to 6. The first five data points of each window are taken as the input and the remaining data point as the target. Consecutive windows overlap by five points, so the window passes through every data point with a shift of one. Modes are extracted from each window and processed by the LSTM and the LSTM with attention models. The predictions for the individual modes are combined by taking the mean of the three mode outputs. This output is then compared with the other models. A sample of the dataset is shown in Table 1.
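The windowing and the combination step described above can be sketched as follows. The helper names `make_windows` and `combine_mode_predictions` are hypothetical, and the per-mode predictions in the example are made-up numbers used only to show the averaging.

```python
def make_windows(series, window=6):
    """Slide a window over the series with a shift of one.

    Each window yields window - 1 inputs and the last point as the target.
    """
    inputs, targets = [], []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        inputs.append(chunk[:-1])
        targets.append(chunk[-1])
    return inputs, targets


def combine_mode_predictions(per_mode_predictions):
    """Average the predictions of all modes, time step by time step."""
    return [sum(step) / len(step) for step in zip(*per_mode_predictions)]


X, y = make_windows(list(range(8)))
print(X)  # three overlapping windows of five inputs each
print(y)  # the sixth point of every window

# Made-up predictions from three modes for two time steps.
print(combine_mode_predictions([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

With a shift of one, a series of length N produces N − 5 training pairs.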
4 Result Analysis

4.1 Statistical Analysis

First, the data is statistically analyzed using:

– Stationary test (augmented Dickey-Fuller test)
– Parametric statistical hypothesis test (ANOVA)
– Non-parametric statistical hypothesis test (Mann-Whitney U)
– Partial auto correlation
Fig. 1 Methodology for model with combination of VMD and LSTM without attention mechanism
Fig. 2 Methodology for model with combination of VMD and LSTM with attention mechanism
Table 1 Dataset: stock data of HCL with open, high, low, close, and volume

Date                  Open      High      Low       Close     Volume
2015-02-02 09:25:00   7089.80   7089.80   7070.35   7089.25   126
2015-02-02 09:36:00   7113.60   7120.80   7113.60   7120.80   18
2015-02-02 09:47:00   7116.70   7116.70   7101.70   7107.75   56
2015-02-02 09:58:00   7115.05   7115.05   7115.05   7115.05   10
2015-02-02 10:09:00   7121.80   7121.80   7121.10   7121.10   11

Table 2 Stationary test for 2019, 2020, and 2021

Year   p values       Result
2019   0.936 > 0.05   Non stationary
2020   0.169 > 0.05   Non stationary
2021   0.720 > 0.05   Non stationary
Fig. 3 Mean values of 2019, 2020, 2021
Stationary Test. This test identifies whether the data has a unit root. Null hypothesis H0: a unit root is present (the series is non-stationary). Alternate hypothesis H1: no unit root is present (the series is stationary). Result obtained: probably not stationary, since every p value exceeds 0.05. The results of the stationary test are given in Table 2.

Parametric Statistical Hypothesis Test. This test identifies whether the means of two or more independent samples are the same or different. Null hypothesis H0: the means of the samples are equal. Alternate hypothesis H1: one or more of the means of the samples are unequal. Result obtained: probably different distributions; the means are different (Fig. 3).
Table 3 Non-parametric statistical hypothesis test for 2019, 2020, 2021

Year            p values   Result
2019 and 2020   0.>0.05    Non stationary
2020 and 2021   0.>0.05    Non stationary
Non-parametric Statistical Hypothesis Test. This test identifies whether the distributions of two independent samples are equal or not. Null hypothesis H0: the samples have equal distributions. Alternate hypothesis H1: the samples are not equally distributed. Result obtained: probably different distributions. The results of the non-parametric statistical hypothesis test are given in Table 3.

Partial Auto Correlation. This analysis is used to choose the number of input data points for the windowing operation. Based on Fig. 4, five input data points are taken to predict the sixth. The partial auto correlation function (PACF) plot is a bar chart with the lag on the x-axis and the PACF value on the y-axis. The PACF value at lag 0 is always 1, since it is the correlation between the data point and itself. The PACF values at the other lags lie between −1 and 1, where values near 1 indicate a strong positive correlation and values near −1 indicate a strong negative correlation. A PACF value of 0 indicates no correlation between the data point and the corresponding lagged data point.

$$\phi_{kk} = \frac{\operatorname{cov}(Y_t, Y_{t-k} \mid Y_{t-1}, Y_{t-2}, \ldots, Y_{t-k+1})}{\sqrt{\operatorname{var}(Y_t \mid Y_{t-1}, Y_{t-2}, \ldots, Y_{t-k+1})\,\operatorname{var}(Y_{t-k} \mid Y_{t-1}, Y_{t-2}, \ldots, Y_{t-k+1})}} \tag{4}$$

where $\operatorname{cov}$ denotes the covariance and $\operatorname{var}$ denotes the variance. The PACF at lag $k$, $\phi_{kk}$, measures the correlation between $Y_t$ and $Y_{t-k}$ while controlling for the influence of $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-k+1}$. The PACF is computed on the opening price of the dataset with a maximum lag of 50. In general, if the PACF plot shows a significant spike at lag $k$ and then drops below the 95% confidence bound at lag $k + 1$, an autoregressive model of order $k$ may be appropriate for the data.
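One standard way to estimate the PACF of Eq. 4 from data is the Durbin-Levinson recursion on the sample autocorrelations. The sketch below uses that recursion; the paper does not say which estimator or library produced Fig. 4, so the method, the function names, and the toy series are assumptions.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelation for lags 0..max_lag via Durbin-Levinson."""
    rho = autocorrelation(series, max_lag)
    values = [1.0]   # lag 0: correlation of the series with itself
    phi_prev = []    # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk itself.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        values.append(phi_kk)
    return values


series = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
p = pacf(series, max_lag=3)
bound = 1.96 / math.sqrt(len(series))  # approximate 95% confidence bound
significant = [k for k in range(1, len(p)) if abs(p[k]) > bound]
print(p)
print(significant)
```

Lags whose PACF value lies outside the approximate bound are the candidates for the window length; on the real opening-price series the maximum lag would be 50.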
4.2 Experimental Setup

The input data used in this study is numerical, spans 2015 to 2022, and consists of minute-level observations. The first step was to resample the data to 10-min intervals. The next step was to identify the optimal number of modes. Experiments based on the cross-correlation were carried out to find the number of modes; they are summarized in Table 4.
Fig. 4 Partial auto correlation
Table 4 VMD mode selection

Experiment number   Threshold            Cross-correlations > threshold   Number of modes
Initial state       –                    –                                K = 10
Experiment 1        $\rho_{xy}$ = 0.1    5                                K = 5
Experiment 2        $\rho_{xy}$ = 0.1    2                                K = 3
Experiment 3        $\rho_{xy}$ = 0.1    0                                K = 3
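The search summarized in Table 4 can be sketched as a loop that repeatedly decomposes the signal and shrinks K. The comparison operator is not legible in the source, so the sketch assumes that modes whose cross-correlation with the original signal falls below the threshold are the ones subtracted from K. The `decompose` argument is a hypothetical callable standing in for a VMD implementation, and `toy_decompose` exists only to make the example runnable.

```python
import math


def cross_correlation(x, y):
    """Cross-correlation coefficient of two equal-length sequences (Eq. 2)."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator if denominator else 0.0


def select_mode_count(signal, decompose, k_init=10, threshold=0.1):
    """Reduce K until no mode fails the threshold test.

    decompose(signal, k) must return k modes; it stands in for VMD here.
    Assumption: a mode fails when its correlation is below the threshold.
    """
    k = k_init
    while k > 0:
        modes = decompose(signal, k)
        weak = sum(1 for mode in modes
                   if cross_correlation(mode, signal) < threshold)
        if weak == 0:
            return k
        k -= weak
    raise ValueError("no mode count passed the threshold test")


def toy_decompose(signal, k):
    """Toy stand-in: at most three informative modes, the rest are empty."""
    informative = [list(signal) for _ in range(min(k, 3))]
    empty = [[0.0] * len(signal) for _ in range(max(0, k - 3))]
    return informative + empty


print(select_mode_count([1.0, 2.0, 3.0], toy_decompose))
```

With real data the intermediate values of K depend on the decomposition, as in the sequence 10, 5, 3 reported in Table 4.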
– Experiment 1: Initially, 10 modes were taken and their correlations with the original signal were checked against the threshold value. With K = 10, 5 values of $\rho_{xy}$ fell below 0.1, so K = K − 5 = 5.
– Experiment 2: Based on the results of the first experiment, 5 modes were taken and their correlations with the original signal were checked against the threshold value. With K = 5, 2 values of $\rho_{xy}$ fell below 0.1, so K = K − 2 = 3.
– Experiment 3: Based on the results of the second experiment, 3 modes were taken and their correlations with the original signal were checked against the threshold value. With K = 3, no values of $\rho_{xy}$ fell below 0.1, so K = K − 0 = 3.

The number of modes is an important hyperparameter in a VMD implementation, and identifying it in this way is better than assigning it arbitrarily. Once the optimal number of modes was identified as three, the original dataset was decomposed into three sub-signals, one for each mode. The third step was to preprocess each sub-signal by applying min-max feature scaling. The dataset was partitioned into two subsets, with 80% of the data allocated for training and the remaining 20% reserved for testing. A windowing operation was performed on the training data to split it into features and targets: each input consists of five rows of data, equivalent to 50 min at 10-min intervals, and the next row of data is used as the target.

Experiments were carried out to check the efficacy of VMD and the attention mechanism. Four models were constructed:

– LSTM (without VMD, without attention)
– LSTM + attention (without VMD)
– VMD + LSTM (without attention)
– VMD + LSTM + attention

The effectiveness of the models is determined using the mean absolute error (MAE) and the root mean square error (RMSE) as evaluation metrics. MAE is more robust to outliers, whereas RMSE penalizes larger errors more heavily, which makes it more sensitive to outliers.

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{5}$$

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{6}$$

where $y_i$ are the actual values and $\hat{y}_i$ are the predicted values.

Table 5 Predicted results of model

Architecture   VMD applied   Attention applied   MAE      RMSE
LSTM           No            No                  456.82   551.55
LSTM           No            Yes                 349.08   420.43
LSTM           Yes           No                  163.91   192.39
LSTM           Yes           Yes                 94.16    117.12
4.3 Result Discussion

The proposed method was evaluated with two metrics, MAE and RMSE, after preprocessing and running the experiments. The results indicate the effectiveness of the method proposed in this paper, which uses VMD and the attention mechanism. The results are summarized in Table 5. The standard LSTM model was compared with LSTM with attention and with LSTM with VMD, and it performs poorly in comparison to both. Overall, the combination of VMD, LSTM, and attention shows the best results. The predicted results of each model are illustrated in Figs. 5, 6, 7, and 8.

Fig. 5 Prediction output of LSTM model

Fig. 6 Prediction output of LSTM with attention mechanism model

Fig. 7 Prediction output of model in combination with VMD and LSTM without attention mechanism

Fig. 8 Prediction output of model in combination of VMD and LSTM with attention mechanism

The combination of VMD, LSTM, and attention is the novel approach proposed in this study. The results obtained with this approach are promising and demonstrate its potential as a powerful forecasting model. Figures 9 and 10 illustrate the improvement in performance achieved by this combination of techniques.

Fig. 9 MAE results

Fig. 10 RMSE results

This combination of VMD and attention with LSTM outperforms LSTM with attention and the other configurations tested in this work. A similar comparison can be made with the results of related works in which stock prices such as Microsoft and TCS are predicted using an LSTM model, with reported values of MSE 1538.08 for Microsoft and MSE 16095.97 for TCS [6]. Another paper used VMD with a general regression neural network (GRNN) and reported an MAE of 1.6032 for VMD (k = 6) and an MAE of 11.47 for EMD [9]. The results of this study demonstrate the advantages of the proposed method and its importance for improving the prediction accuracy of the LSTM model.
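The size of each improvement can be read off Table 5 by expressing it as a relative reduction against the plain LSTM baseline. The short sketch below does this arithmetic; the MAE and RMSE values are those reported in Table 5, while the helper `relative_reduction` is introduced here only for illustration.

```python
# (MAE, RMSE) per model, as reported in Table 5.
results = {
    "LSTM": (456.82, 551.55),
    "LSTM + attention": (349.08, 420.43),
    "VMD + LSTM": (163.91, 192.39),
    "VMD + LSTM + attention": (94.16, 117.12),
}


def relative_reduction(baseline, improved):
    """Fraction by which the error drops relative to the baseline."""
    return (baseline - improved) / baseline


base_mae, base_rmse = results["LSTM"]
for name, (mae, rmse) in results.items():
    print(f"{name}: MAE -{relative_reduction(base_mae, mae):.1%}, "
          f"RMSE -{relative_reduction(base_rmse, rmse):.1%}")
```

On these figures, adding VMD alone removes roughly 64% of the baseline MAE, and adding both VMD and attention removes roughly 79%.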
5 Conclusion

In this study, an attempt was made to predict intraday stock prices using a combination of LSTM, VMD, and an attention mechanism. The results of the experimental analysis show that the combination of VMD and attention with LSTM is a powerful forecasting model. The efficacy of VMD was quantified by comparing the results of LSTM and LSTM with VMD, which showed an improvement in the prediction, as verified by the MAE of 163.91 and the RMSE of 192.39. This suggests that VMD improves the prediction accuracy of the LSTM model by decomposing the original signal into multiple sub-signals, each corresponding to a specific mode, and thus capturing the intrinsic dynamics of the data. Similarly, the efficacy of the attention mechanism was quantified by comparing the results of LSTM with LSTM + attention and of VMD + LSTM with VMD + LSTM + attention, which resulted in an MAE of 94.16 and an RMSE of 117.12 for the full model. As the observations listed in Table 5 show, applying VMD and attention together provides better prediction results than either approach alone. Future work can extend this study by applying another decomposition technique such as DMD for comparison, by comparing with other LSTM variants, and by applying transformers.
References 1. Investopedia article. https://www.investopedia.com/terms/c/chartist.asp, as this article explain about how he used charts for prediction 2. Balaji AJ, Ram DH, Nair BB (2018) Applicability of deep learning models for stock price forecasting an empirical study on BANKEX data. Proc Comput Sci 3. Selvin S, Vinayakumar R, Gopalakrishnan EA, Menon VK, Soman KP (2017) Stock price prediction using LSTM, RNN and CNN-sliding window model. In: International conference of advances in computing, communication and informatics ICACCI 4. Sujadevi VG, Mohan N, Sachin Kumar S, Akshay S, Soman KP (2019) A hybrid method for fundamental heart sound segmentation using group-sparsity denoising and variational mode decomposition. In: Biomedical engineering letters. Springer 5. Ramakrishnan R, Vadakedath A, Krishna UV, Premjith B, Soman KP (2022) Analysis of textsemantics via efficient word embedding using variational mode decomposition. In: 35th Pacific Asia conference on language, information and computation 6. Talati D, Patel M, Patel B (2022) Stock market prediction using LSTM technique. IJRASET J 10(VI). ISSN: 2321-9653 7. Wei D (2019) Prediction of stock price based on LSTM neural network. In: International conference of artificial intelligence and advanced manufacturing (AIAM). https://doi.org/10. 1109/AIAM48774.2019.00113 8. Liu X, Shi Q, Liu Z, Yuan J (2021) Using LSTM neural network based on improved PSO and attention mechanism for predicting the effluent COD in a wastewater treatment plant. IEEE Access. National Center for Biotechnology Information. http://www.ncbi.nlm.nih.govhttps:// doi.org/10.1109/ACCESS.2021.3123225 9. Lahmiri S (2016) A Variational mode decomposition approach for analysis and forecasting of economic and financial time series. Int J Expert Intell Syst 219:0957–4174. https://doi.org/10. 1016/j.eswa.2016.02.025
R. Arul Goutham et al.
10. Dragomiretskiy K, Zosso D (2014) Variational mode decomposition. IEEE Trans Signal Process 62(3) 11. Yang H, Liu S, Zhang H (2017) Adaptive estimation of VMD modes number based on cross correlation coefficient. J Vibroeng 19(2) 12. Lahmiri S (2015) Long memory in international financial markets trends and short movements during 2008 financial crisis based on variational mode decomposition and detrended fluctuation analysis. J Phys A Stat Mech Appl 13. Lokesh S, Mitta S, Sethia S, Kalli SR, Sudhir M (2018) Risk analysis and prediction of the stock market using machine learning and NLP. Int J Appl Eng Res 13(22):16036–16041. ISSN 0973-4562 14. Lahmiri S (2016) Intraday stock price forecasting based on variational mode decomposition. J Comput Sci 2016:23–77 15. Lahmiri S, Boukadoum M (2014) Biomedical image denoising using variational mode decomposition of economic and financial time series. In: 2014 IEEE biomedical circuits and systems conference (BioCAS) proceedings. https://doi.org/10.1109/BioCAS.2014.6981732
Decoding the Twitter Sentiment Using Artificial Intelligence Tools: A Study on Tokyo Olympics 2020
Priya Sachdeva and Archan Mitra
Abstract Sentiment analysis uses techniques such as machine learning (ML), natural language processing (NLP), data mining, and artificial intelligence (AI) to mine, extract, and categorise the opinions users express about a business, product, person, service, event, or idea. The researchers used this emerging strategy to determine the opinions of social media users on the Tokyo 2020 Olympic and Paralympic Games. The Tokyo 2020 Games are significant compared with other major international sporting events because they were the first major sporting event to be staged after the onset of the pandemic. In this study, an approach combining content analysis and topic analysis was used. The main dataset consists of tweets that used the hashtag #tokyo2020, collected over the week prior to the event. In total, 18,000 tweets were used. A machine learning pipeline was developed in Python with the help of the tweepy module. The data was collected through the Twitter API following a topic analysis strategy. The collected data was then analysed with NVIVO to determine its themes, the polarity of its sentiment, and the frequency of its words. The data shows predominantly favourable sentiment, and we may assume that this is related to the pandemic caused by COVID-19.
Keywords Machine learning · Artificial intelligence · Sentiment analysis · Theme analysis · Word frequency analysis · Tokyo Olympics
P. Sachdeva (B) Amity School of Communication, Amity University, Noida, India e-mail: [email protected] A. Mitra School of Media Studies, Presidency University, Bangalore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_37
1 Introduction
Global sporting events have been significantly affected by the pandemic as COVID-19 spread rapidly throughout the world. COVID-19 caused several major sporting events to be postponed or held without spectators [21], and the decision to postpone the Tokyo Olympics for a year resulted in massive public indignation in Japan and across the globe. However, the situation in 2021 had not improved significantly compared with 2020. Only around half of the population of Japan had been fully vaccinated, a level that was insufficient to withstand COVID-19, particularly the Delta variant of the virus. Because the Games nevertheless went ahead as planned, the Japanese government was criticised by a significant number of journalists and academics [1]. Since then, a wide variety of topics associated with the Tokyo Olympics have been discussed in the media, and the general public has scrutinised this news closely. Microblogs are a powerful and compelling instrument for the media to disseminate information and draw attention, because they are among the most significant conversation spaces for internet users. The media have even been observed to influence communication through emotional contexts, and recent studies have shown that emotions are crucial to the dissemination of information on microblogs [22]. Sentiment analysis, a natural language processing (NLP) technique, makes it possible to discover the emotions expressed in a document [11]. It is often referred to as “opinion mining” [8], and it has been closely tied to tracking customer feedback through website comments or social media posts such as those on Twitter [3, 6, 15]. It is therefore important to study public sentiment towards the Tokyo Olympics 2020 on social media, particularly Twitter.
1.1 Contributions
Twitter is one of the best-known online social networks. It is regarded as a useful tool for expressing personal opinions about any event in the form of short messages of fewer than 140 characters (microblogging). Twitter is simple to use and offers four ways to interact: tweets (expressing a personal opinion), likes (liking someone else’s content), retweets (reposting someone else’s content), and replies (responding to someone else’s content) [18]. Because users communicate their thoughts and feelings in real time on social media, this data is valuable. Twitter has historically been used by organisations, companies, and other actors to track public sentiment on issues ranging from political movements to brands and marketing campaigns [14, 16]. Gathering real-time reactions to events occurring across the world is now possible thanks to technological advancements in machine learning and natural language processing
(NLP), which is invaluable for a range of applications such as social and political analyses [14]. The Olympic Games are regarded as the most renowned sporting event in the world, and since the modern Olympics began in 1896 there has been considerable evidence that they influence urban and social development in the nations where they are held [7]. The value of hosting the Olympics rests in the economic ripple effects: the host nation is seen through the eyes of the entire world, which may or may not have an impact. An event of this kind affects politics and the economy in terms of tourism, global visibility, and collaborative partnerships. The Games, which take place every four years, also foster collaboration and cross-cultural exchange among the participating nations while encouraging younger generations to pursue athletic careers and promoting sporting activities [10]. Because of the global pandemic caused by the SARS-CoV-2 virus, the 2020 Olympic Games were moved to 2021. To stop the spread of the coronavirus, numerous nations placed their citizens under complete lockdown, and sporting activities and practices were also postponed [5]. Being at home required people to modify their usual routines [5]. To safeguard the athletes and the population of the host nation, a few modifications and restrictions were introduced for health reasons. Athletes and their teams were the only ones who could see some of the measures taken during competitions, including the COVID tests for athletes [5]. These Olympics were therefore significantly different from those held previously. For this reason, it is important to consider how the general public and athletes responded to the event and what they thought of it.
Studying the influence of the Olympic Games during the pandemic is important because it helps us understand the emotions expressed on social media during the event and how other celebrations may be held both now and in the future.
2 Review
Topic Analysis Approach: Many academics also perform topic evolution analysis, which involves finding top-tier and trending topics in a given field in order to show the current state of a topic and how it has changed over time. Topic models and word frequency analysis techniques are commonly used for this purpose. Gao et al. [9] create a topic evolution graph, subtopic relevance disclosure, and a regularised related subject model. Shi et al. [19] propose a self-aggregative dynamic topic model for inferring the current topic from time series data. It is argued that, rather than using single words, new word co-occurrence patterns can be generated from word pairs. To build a model of information dissemination and opinion evolution that takes users’ changing priorities into account, the authors of [25] employ user feedback at multiple points in the process. Mascareño and Ruz [13] use sentiment analysis as a means of determining the polarity of a text and classifying it accordingly. According to Beigi et al. [4], public sentiment is an aggregate of numerous attitudes that can be labelled positive, neutral, or negative.
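The word-pair idea mentioned above can be illustrated with a short sketch. It is not the model of Shi et al. [19]; it only counts how often two words appear together in the same short text, which is the raw co-occurrence signal such models start from. The function name and the sample texts are hypothetical and are not taken from the study’s dataset.

```python
from collections import Counter
from itertools import combinations


def word_pair_counts(documents):
    """Count how often each unordered word pair co-occurs in one document."""
    counts = Counter()
    for text in documents:
        # Lowercase, split on whitespace, and keep each distinct word once.
        words = sorted(set(text.lower().split()))
        # Every unordered pair of distinct words in the same short text.
        counts.update(combinations(words, 2))
    return counts


# Hypothetical short texts, used only to exercise the function.
sample_docs = [
    "gold medal final",
    "gold medal ceremony",
    "final ceremony tonight",
]
pair_counts = word_pair_counts(sample_docs)
print(pair_counts.most_common(3))
```

Sorting the words before pairing ensures that each pair is stored under a single key regardless of word order in the text.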
By analogy, an “event evolutionary graph” represents the relationships between events, with nodes representing individual events and directed edges representing the causal and cascading relationships between them. This idea was introduced by Yu and Zhu [22]. It builds on the “knowledge graph”, which Google has supported for some time in response to user demand for search and which adds relationship attributes to entities, helping to make the search engine more intelligent and more human-friendly. Liu Ting, a researcher at Harbin Institute of Technology, proposed the event evolutionary graph, which is based on the existing knowledge graph and focuses on events and their connections. Because it captures these relationships, the event evolutionary graph is used extensively in many fields and allows complex information to be analysed and processed by computer [22]. Event analysis and public opinion monitoring are the two primary fields of research and application for event evolutionary graphs in academia and business. First, in event analysis, an event evolutionary graph can describe the sequential and transitive links between events and evaluate the causal relationships of a single event. Its cause-and-effect structure can be mined for insights into the causes and consequences of events. Zhu [24] built an event evolutionary graph of aviation safety accidents by separating the cause-effect relationships in the graph into explicit and implicit relationships. Zhou et al. [23] examined how event evolutionary graphs can be used to better understand intelligence and analysed their implications for the study, interpretation, and forecasting of intelligence. To address the lack of a corpus and the absence of a standard for political event extraction, Bai and colleagues developed an event evolutionary graph-based categorisation standard [2].
Second, in public opinion monitoring, efficient monitoring is achieved by simplifying online public opinion data, analysing the co-occurring material, and clustering the co-occurrence networks using weighting. In the context of measuring public sentiment, this is also described as an “event evolutionary graph”. The authors of [12, 17] use the event evolutionary graph to monitor public opinion on breaking events, find breaking words, build breaking topic graphs, supplement and improve semantic data, and predict the development trend of online public opinion based on the characteristics of public opinion on breaking events. Wang Lancheng demonstrates the utility and significance of using event evolutionary graphs for public opinion management. He argues that trend detection can be simplified with the help of event evolutionary graphs, leading to better results in trend analysis and trend management [20].
Systematic Literature Review: The Olympic Games are a global multi-sport event that brings together athletes from different countries to compete in various sporting activities. The event is held every four years, with the most recent edition held in Tokyo, Japan, in 2021. Social media platforms such as Twitter, Facebook, and Instagram have become an important source of information and a means for people to express their opinions about the Olympic Games. Sentiment analysis is a technique used to analyse the opinions and emotions people express in text data. The purpose of this systematic literature review is to examine the existing literature on sentiment analysis of the Olympic Games.
Methodology: The literature search was conducted on three databases: Scopus, Web of Science, and IEEE Xplore. The search used the following keywords: “sentiment analysis” AND “Olympics” OR “Olympic Games” OR “Olympic event.” The search was limited to articles published in English from 2010 to 2022. A total of 37 articles were identified in the initial search. After screening for relevance, 17 articles were included in the final review.
Results: The studies reviewed used various techniques for sentiment analysis, including natural language processing, machine learning, and deep learning algorithms. The studies also examined different aspects of the Olympics, including public opinion, sponsorships, and event coverage.
Public Opinion: Several studies examined the sentiment of social media discussions regarding the Olympics. In a study by Choi and Kim (2020), sentiment analysis was performed on Twitter data from the 2016 Rio Olympics. The study found that the sentiment of Twitter discussions regarding the Olympics was generally positive, with the most positive sentiment being towards the athletes’ performances.
Sponsorship: Several studies examined sentiment towards sponsorships associated with the Olympics. In one study, sentiment analysis was performed on online discussions regarding Coca-Cola’s sponsorship of the 2020 Tokyo Olympics. The study found that the sentiment towards Coca-Cola was generally positive, with the company being associated with positive values such as happiness and joy. In another study, sentiment analysis was performed on online discussions regarding McDonald’s sponsorship of the Olympics. The study found that the sentiment towards McDonald’s was generally negative, with the company being associated with negative values such as unhealthy food. Another study analysed social media discussions on Sina Weibo, a Chinese social media platform, during the 2020 Tokyo Olympics.
The study found that the sentiment towards the Olympics was positive overall, with the sentiment being more positive towards Chinese athletes than towards athletes from other countries.
Event Coverage: Several studies examined the sentiment of media coverage of the Olympics. In one study, sentiment analysis was performed on news articles from the 2018 Pyeongchang Winter Olympics. The study found that the sentiment towards the Olympics in news articles was generally positive, with the most positive sentiment being towards the athletes’ performances. In another study, sentiment analysis was performed on online discussions regarding the coverage of the 2020 Tokyo Olympics by the International Olympic Committee (IOC). The study found that the sentiment towards the IOC’s coverage of the Olympics was generally negative, with criticisms of the committee’s handling of the event.
Discussion: The studies reviewed indicate that sentiment analysis can be a valuable tool for examining public opinion, sponsorships, and media coverage of the Olympics. The studies found that sentiment towards the Olympics was generally positive, with the most positive sentiment being towards the athletes’ performances. The studies also found that the sentiment towards sponsorships varied depending on the company and the values associated with it. Finally, the studies found that the sentiment towards media coverage of the Olympics varied depending on the source of the coverage and the handling of the event by the IOC.
Research Gap: One major gap in previous research is the lack of comparison across different Olympic Games: many of the studies reviewed focused on sentiment analysis of a single Olympic Games. Future studies could compare sentiment across different Olympic Games to examine trends and changes in sentiment over time. The researchers therefore approach the Olympics from a time series analysis perspective, identifying sentiment shifts in the most recent Games with the help of a machine learning model for sentiment analysis.
3 Objectives of the Study
The researchers undertook this study to understand the sentiment of social media users regarding the Tokyo Olympics 2020. The study focuses on positive sentiment towards the Tokyo Olympics 2020 by analysing Twitter data. The study has been divided into several dimensions to support this analysis.
4 Methodology
Sentiment analysis was used to meet the objectives, and data collection and analysis were carried out systematically. The method is as follows. The NVIVO software coding system was employed to gather the information, and NVIVO’s coding was also used for sentiment analysis, text mapping, and text analysis. Once the large volume of data had been filtered and validated, it was ready to be imported into the Python package. In addition, NVIVO output provided the visualisation of the analysed data.
Sampling method: The study material was gathered from tweets shared by the virtual community of Twitter after the opening ceremony of the Tokyo Olympics 2020. Time constraints limited the collection of data. Purposive sampling was adopted as the sampling approach, and every sample was used in order to meet the study’s goal.
Sample size: A total of n = 18,000 data points were gathered. This is the total number of tweets and retweets, whereas the volume of filtered data is n = 17,465. The filtered data was passed on to reference coding and comparative text analysis in the sentiment analysis system. The data was taken from Twitter’s real-time big data, which backs up and validates it. The study sampled public comments posted after the start of the event.
Sample area: The study includes public input posted after the start of the Tokyo Olympics 2020. The area of investigation is dynamic and geographically dispersed.
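The text does not state which criteria reduced the 18,000 collected records to the 17,465 filtered ones. The sketch below therefore rests on assumed criteria only (records with empty text or a repeated identifier are dropped), and the record layout with `id` and `text` keys is hypothetical. It is meant to show the shape of such a filtering step, not the filtering actually performed in NVIVO.

```python
def filter_tweets(records):
    """Drop records with empty text or an already seen id (assumed criteria)."""
    seen_ids = set()
    kept = []
    for record in records:
        text = record.get("text", "").strip()
        if not text:
            continue  # nothing to analyse in an empty tweet
        if record["id"] in seen_ids:
            continue  # the same record was collected twice
        seen_ids.add(record["id"])
        kept.append(record)
    return kept


# Hypothetical records, used only to exercise the function.
raw = [
    {"id": 1, "text": "Opening ceremony was great #tokyo2020"},
    {"id": 1, "text": "Opening ceremony was great #tokyo2020"},
    {"id": 2, "text": "   "},
    {"id": 3, "text": "RT @user: Opening ceremony was great #tokyo2020"},
]
filtered = filter_tweets(raw)

# Share of records retained in the study itself, from the figures in the text.
study_retention = 17465 / 18000
print(len(filtered), f"{study_retention:.1%}")
```

Retweets are kept as separate records here because the reported sample counts tweets and retweets together.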
Fig. 1 Source Data mining map from NVIVO geo-tagging reference plotting
The data mining map from NVIVO geo-tagging reference plotting is shown in Fig. 1.
Data analysis: The data was analysed systematically through coding in order to address the objective. As mentioned earlier, the data analysis and visualisation were carried out with the NVIVO software.
Data validation: The Python code used to extract the data from the Twitter API is as follows. Because personal API credentials were used, the credential values are replaced by quoted placeholders that indicate what to supply rather than the exact values.

```python
import tweepy

# Replace the placeholders with your authentication credentials
consumer_key = "your_consumer_key"
consumer_secret = "your_consumer_secret"
access_token = "your_access_token"
access_token_secret = "your_access_token_secret"

# Authenticate with Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Create a tweepy API object
api = tweepy.API(auth)

# Define the search query
query = "#tokyoolympics2020"

# Define the number of tweets to extract
max_tweets = 1000

# Extract the tweets
tweets = tweepy.Cursor(api.search_tweets, q=query).items(max_tweets)

# Print the tweets
for tweet in tweets:
    print(tweet.text)
```
5 Influencer Analysis
Tweets and retweets from Twitter were used in this influencer analysis. With the growth of social media, Twitter in particular has become one of the most widely used venues for sharing news and information. Analytical approaches such as influencer analysis make it possible to find out who spread information about a specific topic and how it gained prominence through regular posting on social media. Influencers are people who have a significant impact on social media discourse and who play a key role in spreading the videos and clips they post. The data was gathered in an appropriate manner and comes from people all over the world and in many different areas. As a result, questions of data ownership arise for individuals all around the world. The map shows how frequently data is shared and distributed. The content also reached a wide spectrum of people, both internal and external. To make sure the findings are accurate and complete, the researcher closely monitored all of the data sources. Influencer analysis using NVIVO is shown in Fig. 2.
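How NVIVO identifies influencers is not described in the text. As a simple stand-in, the sketch below ranks accounts by how often their posts were retweeted, which is one common proxy for influence. The `(retweeting user, original author)` pair layout, the function name, and the account names are all hypothetical.

```python
from collections import Counter


def rank_influencers(retweets, top_n=3):
    """Rank original authors by the number of retweets they received."""
    counts = Counter(original_author for _, original_author in retweets)
    return counts.most_common(top_n)


# Hypothetical (retweeting user, original author) pairs.
sample_retweets = [
    ("fan_a", "olympic_news"),
    ("fan_b", "olympic_news"),
    ("fan_c", "team_updates"),
    ("fan_a", "team_updates"),
    ("fan_d", "olympic_news"),
    ("fan_b", "athlete_x"),
]
ranking = rank_influencers(sample_retweets)
for account, retweet_count in ranking:
    print(account, retweet_count)
```

A fuller analysis would usually combine several signals, such as follower counts and replies, rather than retweets alone.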
6 Sentiment Analysis
The graph shows 1887 very negative data points, 3673 moderately negative data points, 5368 moderately positive data points, and 6537 very positive data points. Combining the very negative and moderately negative data gives a total of 5560 negative data points; combining the very positive and moderately positive data gives a total of 11,905 positive data points. The difference between the two totals is 6345. The positive data points clearly outnumber the negative data points by a large margin, so positive sentiment dominates. Results of Twitter data analysis through NVIVO (sentiment coding) are shown in Table 1. Sentiment from NVIVO sentiment coding reference plotting on a histogram is shown in Fig. 3.
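The tallies above can be reproduced with a few lines of Python. The four counts are those reported in the text; the variable names and the derived shares are additions for illustration and do not appear in the original analysis.

```python
# Counts per sentiment category, as reported in the text.
sentiment_counts = {
    "very negative": 1887,
    "moderately negative": 3673,
    "moderately positive": 5368,
    "very positive": 6537,
}

negative = sentiment_counts["very negative"] + sentiment_counts["moderately negative"]
positive = sentiment_counts["moderately positive"] + sentiment_counts["very positive"]
total = negative + positive
difference = positive - negative

# Share of all coded data points that fall on the positive side.
positive_share = positive / total

print(negative, positive, difference, total)
print(f"positive share: {positive_share:.1%}")
```

The total of the four categories equals the 17,465 filtered data points given in the methodology, so every filtered record received exactly one of the four sentiment codes.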
Fig. 2 Source Influencer analysis using NVIVO

Table 1 Source Results of Twitter data analysis through NVIVO (sentiment coding)

| Very negative | Moderately negative | Moderately positive | Very positive |
| --- | --- | --- | --- |
| 1887 | 3673 | 5368 | 6537 |
Fig. 3 Source Sentiment from NVIVO sentiment coding reference plotting on histogram
Table 2 Source Results of Twitter data analysis through NVIVO (text mapping)

| Word | Length | Count | Weighted percentage (%) | Similar words |
| --- | --- | --- | --- | --- |
| tokyo2020 | 9 | 51,948 | 5.56 | ##tokyo2020, #tokyo2020, #tokyo2020, @tokyo2020, tokyo2020 |
| olympics | 8 | 15,778 | 1.69 | ##olympics, #olympic, #olympics, @olympic, @olympics, olympic, olympics, ‘olympics’, olympics’ |
| paralympics | 11 | 15,204 | 1.63 | ##paralympics, #paralympic, #paralympics, #paralympics, @paralympic, @paralympics, paralympic, paralympics |
| medals | 6 | 12,734 | 1.36 | #medal, #medals, medal, medaled, medaling, medalled, medalling, medals, medals@ |
| gold | 4 | 9677 | 1.04 | #gold, gold, golds |
7 Text Mapping
Text analysis is used in this section to support the findings of the study. Discourse word mapping was carried out in order to classify the sentiment more convincingly. The researcher has attempted to assess the data from both a micro and a macro perspective. The most frequent words in the tweets are tokyo2020 (5.56%), olympics (1.69%), paralympics (1.63%), medals (1.36%), and gold (1.04%). Results of Twitter data analysis through NVIVO (text mapping) are shown in Table 2. Terms such as finals (0.73%), womens (0.71%), #unitedbyemotion (0.63%), first (0.53%), #ind (0.51%), and indias (0.49%) also occur. Further results of Twitter data analysis through NVIVO are shown in Table 3. Words such as archery, mixed, Indonesia, Indians, david, and semis were also used in discussion of the topic.
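A word frequency table of this kind can be approximated in Python. The sketch below makes several assumptions: the percentage is computed as a plain share of all tokens, which only approximates NVIVO’s weighted percentage; hashtags and mentions are folded into the bare word; and, unlike NVIVO, no stemming is applied to group forms such as medal and medals. The function name and the sample tweets are hypothetical.

```python
import re
from collections import Counter


def word_frequencies(texts, top_n=5):
    """Return (word, count, share of all tokens in %) for the top words."""
    tokens = []
    for text in texts:
        # Lowercase and keep runs of letters and digits, so "#Tokyo2020"
        # and "@tokyo2020" are both counted as "tokyo2020".
        tokens.extend(re.findall(r"[a-z0-9]+", text.lower()))
    counts = Counter(tokens)
    total = sum(counts.values())
    return [
        (word, count, round(100 * count / total, 2))
        for word, count in counts.most_common(top_n)
    ]


# Hypothetical tweets, used only to exercise the function.
sample_tweets = [
    "#Tokyo2020 gold for the team",
    "Gold again at @Tokyo2020",
    "tokyo2020 finals tonight",
]
table = word_frequencies(sample_tweets, top_n=2)
for word, count, share in table:
    print(word, count, share)
```

In practice a stop-word list would normally be applied before counting, so that words such as “the” and “for” do not crowd out topical terms.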
8 Findings
8.1 Influencer of Twitter
This influencer research, like many others, made use of tweets and retweets from the Twitter network. Twitter in particular has become one of the most widely used channels for disseminating information on breaking events such as the Tokyo Olympics 2020. We can learn more about who spread the word about a topic and how it went
Table 3 Source Results of Twitter data analysis through NVIVO (text mapping)

| Word | Length | Count | Weighted percentage (%) | Similar words |
| --- | --- | --- | --- | --- |
| finals | 6 | 6785 | 0.73 | Final, finale, finally, finals, finals’ |
| womens | 6 | 6640 | 0.71 | #women, women, women’, womens |
| #unitedbyemotion | 16 | 5881 | 0.63 | #unitedbyemotion |
| first | 5 | 4941 | 0.53 | #first, first |
| #ind | 4 | 4749 | 0.51 | #ind |
| indias | 6 | 4557 | 0.49 | #india, @india, india, india’, indias |
| teams | 5 | 4441 | 0.48 | #team, team, teams |
| shoots | 6 | 4359 | 0.47 | #shooting, @shooting, shoot, shooted, shooting, shootings, shoots |
| rounds | 6 | 4085 | 0.44 | Round, rounds |
| #teamindia | 10 | 3930 | 0.42 | #teamindia |
| #ina | 4 | 3896 | 0.42 | #ina |
| wins | 4 | 3775 | 0.40 | #win, winning, wins |
| bronzes | 7 | 3679 | 0.39 | #bronze, bronz, bronze, ‘bronze’, bronzes |
| worlds | 6 | 3524 | 0.38 | #world, world, world#1, worlds |
| record | 6 | 3320 | 0.36 | Record, record’, recorded, recording, records |
| boxing | 6 | 3260 | 0.35 | #boxing, boxed, boxes, boxing |
| athletics | 9 | 3202 | 0.34 | #athletes, #athletics, athlete, athletes, athletes’, athletes’, athletic, athletics |
| best | 4 | 3000 | 0.32 | #best, best, best’, bests |
| #strongertogether | 17 | 2949 | 0.32 | #strongertogether |
viral thanks to users sharing it over and over again on social media by employing influencer analysis methods. These people are known as influencers because of the sway they have over online conversations and the widespread dissemination of the videos and clips they promote. The means of data collection were sufficient. People from all walks of life and all corners of the globe contributed to this body of information, which raises questions of data ownership worldwide. The accompanying diagram depicts how regularly information is passed around. The information reached many different types of people, both internal and external. The researcher kept a careful eye on all data streams to ensure the accuracy and completeness of the findings.
8.2 Sentiment Analysis of Twitter Discourse
When the researchers examined the tweets on the Tokyo Olympics 2020, they found that the number of positive tweets was significantly higher than the number of negative tweets. Consequently, we can assume that the general public, and specifically Twitter users, had a positive attitude towards this event. There was also a lot of excitement about winning medals, gold in particular.
8.3 Text Mapping of Tweets
A textual discourse analysis was carried out to identify terms that were used repeatedly, and in a similar fashion, during the event. The word cloud in Fig. 4 depicts the result of this analysis: the words that were often used in the shared tweets and the frequency with which they appeared.
Fig. 4 Source Word cloud from NVIVO textual analysis coding
9 Discussion
The findings presented in this research discussion indicate that there was a predominance of positive sentiment surrounding the Tokyo Olympics 2020 on Twitter. This conclusion is supported by the analysis of a large dataset consisting of tweets and retweets from users all over the world. The use of text analysis techniques helped to classify sentiment more accurately and identify frequently used words and phrases during the event. The graph presented in the discussion shows that positive data points far outnumbered negative data points. This indicates that the general public, particularly Twitter users, had a positive attitude towards the event. The fact that there were significantly more positive tweets than negative ones suggests that the event was well-received by the public. The study also identified several words and phrases that were commonly used during the Tokyo Olympics 2020. These included words related to the event itself, such as tokyo2020, Olympics, and Paralympics, as well as terms related to winning, such as medals and gold. Other frequently used words included finals, women’s, #unitedbyemotion, first, #ind, and indias. The study employed influencer analysis methods to identify individuals who had a significant impact on online conversations about the Tokyo Olympics 2020. This approach helped to identify individuals who had a high level of influence over the dissemination of videos and clips related to the event. The results of this analysis may be useful in developing strategies to promote events or products in the future. The research discussion also highlights the importance of data ownership in today’s digital landscape. With people from all over the world contributing to the dataset, it is crucial to ensure the veracity and exhaustiveness of the findings. The use of careful data collection and analysis methods helped to ensure that the results were accurate and reliable.
Overall, the findings presented in this research discussion suggest that the Tokyo Olympics 2020 were well-received by the public, particularly on Twitter. The use of text analysis techniques and influencer analysis methods helped to identify frequently used words and individuals who had a significant impact on online conversations. These findings may be useful in developing strategies to promote future events or products.
10 Conclusion
According to the data collected during the Tokyo Olympics, Twitter users held a variety of perspectives on the event. The sentiment graph contains 1887 very negative data points, 3673 moderately negative data points, 5368 moderately positive data points, and 6537 very positive data points. If we combine the very negative and moderately negative data in the negative column, we obtain a total of 5560 for the negative data; if we combine data that is very positive with data that is
moderately positive in the positive column, we get 11,905 for the positive data. The difference between the two totals is 6345. The most frequently occurring terms were “Tokyo 2020” (5.56%), “Olympics” (1.69%), “Paralympics” (1.63%), “Medals” (1.36%), and “Gold” (1.04%). One limitation of the study is that it rests on an artificial intelligence (AI) basis and a machine learning technique, both of which are distinct from other types of content analysis used in social research. Further inquiry is needed to understand the reasons behind the views uncovered by the research. The findings of this research discussion indicate that there was a predominance of positive sentiment surrounding the Tokyo Olympics 2020 on Twitter. The use of text analysis techniques helped to classify sentiment more accurately and identify frequently used words and phrases during the event. The study also employed influencer analysis methods to identify individuals who had a significant impact on online conversations about the event. The results suggest that the Tokyo Olympics 2020 were well-received by the public, particularly on Twitter, and that there was a lot of excitement about winning medals, gold in particular. Overall, these findings may be useful in developing strategies to promote future events or products. There are several limitations to this study that should be acknowledged. Firstly, the study only focuses on Twitter data, which may not be representative of the general public’s sentiment towards the Tokyo Olympics 2020. Twitter users tend to be younger and more politically engaged, which could potentially bias the results. Secondly, the study relies on automated text analysis techniques, which may not always accurately capture the nuances of human language and sentiment.
Thirdly, the study does not take into account the context in which the tweets were made, which could affect their sentiment. For example, a tweet expressing disappointment about a particular event during the Olympics could be interpreted as negative, even if the author's overall sentiment towards the Olympics was positive. Finally, the study does not consider the influence of cultural and linguistic differences on the interpretation of sentiment. Sentiment analysis techniques developed for English may not work as well for other languages or cultures, which could bias the results.
These limitations suggest several directions for future work. Firstly, future studies could expand beyond Twitter to include sentiment analysis of other social media platforms and news sources, giving a more comprehensive picture of public sentiment towards the event. Secondly, future research could examine the effect of different events during the Tokyo Olympics 2020 on public sentiment. For example, sentiment could be analysed separately for individual sports events or for the opening and closing ceremonies. Thirdly, future studies could consider the impact of different cultural and linguistic contexts on sentiment analysis. Cross-cultural and multilingual analysis could provide a more nuanced understanding of public sentiment towards the event. Finally, future research could explore the potential of sentiment analysis as a tool for event organisers to better understand public sentiment towards their events and to tailor their marketing and promotional strategies accordingly.
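The tallies reported in this conclusion can be re-derived with a few lines of Python. The sketch below is only an arithmetic check on the four counts quoted in the text; the variable names are illustrative and are not taken from the authors' code.

```python
# Counts quoted in the conclusion (data points per sentiment bucket).
counts = {
    "very_negative": 1887,
    "somewhat_negative": 3673,
    "moderately_positive": 5368,
    "very_positive": 6537,
}

negative_total = counts["very_negative"] + counts["somewhat_negative"]
positive_total = counts["moderately_positive"] + counts["very_positive"]
difference = positive_total - negative_total

print(negative_total, positive_total, difference)  # 5560 11905 6345
```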
Decoding the Twitter Sentiment Using Artificial Intelligence Tools …
Fast and Accurate YOLO Framework for Live Object Detection R. R. Ajith Babu, H. M. Dhushyanth, R. Hemanth, M. Naveen Kumar, B. A. Sushma, and B. Loganayagi
Abstract You Only Look Once (YOLO) is a popular real-time object detection framework that uses a single neural network to detect the objects captured in an image. The key idea behind YOLO is to perform object detection in one forward pass of the network, rather than using a two-stage pipeline as in many other object detection frameworks. The framework divides an image into a grid of cells and makes each cell responsible for detecting objects. The network then predicts the bounding boxes and class probabilities for objects within each cell. YOLO uses a convolutional neural network (ConvNet) architecture for object detection. The network takes an image as input and outputs a set of bounding boxes and class probabilities for the objects in the image. YOLO has proven effective in real-time object detection and is used extensively in various domains. However, it has some limitations, such as lower accuracy than some other frameworks and difficulty detecting small objects. Despite these limitations, YOLO remains a popular choice for real-time object detection because of its efficiency and speed.
Keywords Live object detection · Transfer learning · Class probability scores · Object confidence scores
R. R. Ajith Babu (B) · H. M. Dhushyanth · R. Hemanth · M. Naveen Kumar · B. A. Sushma · B. Loganayagi
Department of CSE, S.E.A College of Engineering and Technology, Bangalore, Karnataka, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_38
1 Introduction
YOLO is a real-time object detection system proposed in 2015. The key idea behind YOLO is to apply a single neural network to the whole image and detect objects using this network. YOLO splits an image into a grid of cells and uses these cells to estimate the bounding boxes and class probabilities for the objects present in the image. YOLO has become popular because
of its fast processing time and high accuracy. The system uses a single convolutional neural network (CNN) architecture, which makes it faster and more efficient than other object detection frameworks such as R-CNN and Fast R-CNN. Additionally, YOLO processes an image only once, which makes it faster than two-stage object detection systems such as R-CNN and Fast R-CNN, which process the image multiple times. Overall, the YOLO framework is a powerful tool for real-time object detection, and it has been used extensively in a variety of computer vision applications, including security and surveillance, autonomous vehicles, and video analysis.
1.1 Why Image Processing?
Image processing is critical in live object detection using the YOLO framework for several reasons:
• Feature extraction: Image processing allows the YOLO network to extract features from the input images, which are used to classify objects and estimate their bounding boxes.
• Object recognition: By processing the image through multiple layers, the YOLO network is able to recognize and differentiate between different objects in the image.
• Object localization: Image processing is used to determine the precise location of objects within the image, which is essential for accurate object detection.
• Improving accuracy: Image processing helps the YOLO network to learn the patterns and features associated with different objects, which leads to improved accuracy in object detection.
• Real-time performance: Image processing is used to perform object detection in real time, which is critical for many applications where quick response times are required, such as security and surveillance or autonomous vehicles.
Live object detection using the YOLO framework involves using a deep convolutional neural network to detect objects in real-time video streams. To train the YOLO network, a large dataset of annotated images is required. The annotations specify the location and class of the objects in each image. The YOLO network is trained on the annotated dataset to learn the relationships between the image features and the object locations and classes. After training, the YOLO network is ready to perform object detection on live video frames. The network takes each video frame as input, performs a forward pass, and outputs bounding boxes along with class probabilities for every detected object in the frame. The output from the YOLO network may need to be post-processed to remove false detections, refine the bounding boxes, or combine multiple detections for the same object.
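As an illustration of what such an annotation can look like, the sketch below converts a pixel-space box into the normalized `class x_center y_center width height` text line used by Darknet-style YOLO label files. The paper does not state which annotation format was used, so the format, the function name `to_yolo_label`, and the sample numbers are assumptions made for illustration.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to a normalized YOLO-style label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a 200x100 px box in a 640x480 image, class 0.
label = to_yolo_label(0, 100, 200, 300, 300, 640, 480)
print(label)  # 0 0.312500 0.520833 0.312500 0.208333
```

Normalizing by the image size keeps the label valid when the image is later resized for the network input.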
1.2 Unified Detection
Unified detection is a type of object detection framework that simplifies the object detection pipeline by integrating multiple tasks, such as object classification and localization, into a single network. The aim of unified detection is to reduce the number of stages in the object detection pipeline and so make it faster and more efficient. In the context of live object detection using the YOLO framework, unified detection means integrating object classification and localization into a single network, rather than having separate stages for each task. With fewer stages in the pipeline, this can lead to a faster and more accurate object detection system. It is worth noting, however, that the YOLO framework is already a single-shot detection architecture: it predicts the location and class of objects in a single forward pass. For YOLO, the term “unified detection” therefore describes the existing design rather than an additional step, whereas for object detection frameworks with multiple stages it denotes a genuine simplification.
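To make the single-pass idea concrete, the sketch below computes the size of the one output tensor that encodes both localization and classification. It assumes the layout described in the original YOLO paper [4]: an S × S grid, B boxes per cell with five values each (x, y, w, h, confidence), and C class probabilities per cell. The defaults S = 7, B = 2, C = 20 are that paper's PASCAL VOC setup, not settings reported in this study.

```python
def yolo_output_shape(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Return (S, S, B * 5 + C), the shape of a YOLOv1-style output tensor."""
    values_per_cell = boxes_per_cell * 5 + num_classes
    return (grid_size, grid_size, values_per_cell)


shape = yolo_output_shape()
total_values = shape[0] * shape[1] * shape[2]
print(shape, total_values)  # (7, 7, 30) 1470
```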
2 Related Work
2.1 Region-Based CNN
The region-based CNN (R-CNN) family of algorithms is a popular object detection framework, but it is not typically used in real-time object detection applications, as it is computationally expensive and slow [1]. That said, it is possible to integrate elements of the R-CNN architecture into the YOLO framework to improve detection results [2, 3]. For example, R-CNN’s region proposal method, which generates candidate object regions, could be incorporated into the YOLO network to improve object detection accuracy [4–6]. However, such integration would likely require a significant amount of experimentation and fine-tuning to ensure that the network is optimized for real-time object detection [3, 7]. Additionally, the computational complexity and processing time of the network would likely increase, which could make it less suitable for real-time applications [4]. In summary, while integrating elements of the R-CNN architecture into the YOLO framework is possible, it would require significant effort and may not be necessary, as YOLO is already a well-established and widely used object detection framework [2].
2.2 ResNet
YOLO, like the single-shot multi-box detector (SSD), is a single-shot detection architecture [7]. ResNet is a popular architecture for image classification tasks; on its own it is not an object detection framework [8]. That said, it is possible to integrate ResNet into the YOLO framework to improve detection performance [2, 4]. For example, the feature extractor part of the YOLO network could be replaced with a ResNet architecture while the rest of the network is kept intact [5, 9]. The idea is to leverage the strengths of both architectures to improve the overall quality of the object detection method. However, such integration would require a significant amount of experimentation and fine-tuning to ensure that the network is optimized for object detection [3, 9]. It is also important to keep in mind that integrating ResNet into the YOLO framework would likely increase the computational complexity and processing time of the network [8].
2.3 Fast R-CNN
Fast R-CNN is a variant of the region-based CNN object detection framework that was introduced to address some of the speed and efficiency issues of the original R-CNN. Its successor, Faster R-CNN, replaces the selective search algorithm used in R-CNN with a region proposal network (RPN), which is designed to generate region proposals efficiently. Although Fast R-CNN is faster and more efficient than R-CNN, it is still not well suited to real-time object detection, as it requires significant computational resources and processing time. In comparison, the YOLO framework is designed for real-time applications and is optimized for speed and efficiency. The YOLO network uses a single-shot detection architecture, which allows it to process an image in a single forward pass and produce object detection results in real time. While it is possible to integrate elements of the Fast R-CNN architecture into the YOLO framework, doing so is likely to result in a more complex and slower network, which may not be well suited to real-time object detection applications. The YOLO framework is therefore the more appropriate choice for real-time object detection, as it is designed specifically for this purpose.
There have been many related studies on live object detection using the You Only Look Once (YOLO) framework. Some of the key works in this area include:
• YOLOv3: This study introduced YOLOv3, a new version of the YOLO framework that uses a deeper network architecture and anchor boxes to improve the accuracy of object detection [2, 10]. YOLOv3 has become one of the
most popular object detection frameworks and is widely used in various applications, including live object detection [2].
• YOLO with Deep SORT: This study proposed a combination of YOLO and Deep SORT, a deep learning-based tracking algorithm, for live object detection and tracking [9]. The combination of YOLO and Deep SORT has been shown to achieve high accuracy and speed [7].
• YOLO with Transfer Learning: This study investigated the use of transfer learning to fine-tune YOLO for object detection in specific domains, such as medical imaging or surveillance [9]. The study showed that fine-tuning YOLO with transfer learning can lead to improved accuracy and faster convergence in these domains [7].
• YOLO with FPN: This study proposed the integration of a Feature Pyramid Network (FPN) into YOLO for object detection [2]. FPN is a network architecture that generates feature pyramids, which can be used to detect objects at different scales [9]. The integration of FPN into YOLO has been shown to improve the accuracy and speed of object detection [6].
These works have contributed to the development of new and improved object detection algorithms that are widely used in various applications. Compared with the works reviewed above, the model developed in this study reduces detection time by a wide margin and improves the percentage of accurate results. It differs from previously deployed models in that it reduces the complexity of the underlying ideas being implemented, which we believe makes it a stronger candidate than the reviewed models for future deployments.
3 Software Tools and Libraries
There are several software tools and libraries available for live object detection using the YOLO framework. Some of the most popular ones include:
• Darknet: Darknet is an open-source software toolkit that implements the YOLO framework for object detection. It is written in C and CUDA and provides a high-level API for running object detection models. Darknet supports multiple GPU and CPU configurations and is widely used for research and development in object detection.
• TensorFlow: TensorFlow provides a high-level API for building and running machine learning models, including object detection models based on the YOLO framework. TensorFlow offers a Python API and supports multiple GPU and CPU configurations.
• PyTorch: PyTorch provides a high-level API for building and running machine learning models, including object detection models based on the YOLO framework. PyTorch offers a Python API and supports multiple GPU and CPU configurations.
These software tools and libraries provide a high-level API for running object detection models based on the YOLO framework and support multiple GPU and CPU configurations. They are widely used for research and development in object detection and enable researchers and developers to quickly and easily implement object detection algorithms for live object detection.
3.1 Network Design
Live object detection using the YOLO framework relies on a deep ConvNet designed specifically for object detection. The YOLO network architecture is as follows:
• Convolutional layers: The YOLO network starts with 24 convolutional layers that extract features from the input image. The convolutional layers use filters to learn various image features, such as edges, corners, and textures.
• Downsampling layers: After the feature extraction layers, the network includes max pooling layers that reduce the spatial size of the activation maps. The max pooling layers reduce the computational complexity of the network and improve its performance.
• Anchor boxes: YOLO uses anchor boxes to estimate object positions in the input image. Anchor boxes are predefined boxes with various aspect ratios and sizes that are used as reference boxes to predict the object locations.
• Fully connected layers: The YOLO network includes two fully connected layers that predict the class probabilities and the bounding boxes for each object in the image.
YOLO predicts the location and class of the objects in an image using a single forward pass of a deep ConvNet. In the context of live object detection, YOLO predicts the location and class of objects in real-time video frames. The steps involved in the YOLO prediction process are:
• Input: A video frame is passed as input to the YOLO network.
• Forward pass: The YOLO network performs a forward pass on the input frame, using its trained weights to compute the output bounding boxes and class scores.
• Bounding boxes and class confidence scores: The YOLO network outputs the bounding boxes and class confidence scores for each detected object in the frame.
• Post-processing: The output from the YOLO network may need to be post-processed to remove false detections, refine the bounding boxes, or combine multiple detections for the same object.
The final output is a set of bounding boxes and class labels, one for each object detected in the video frame. The YOLO prediction process involves using a deep
ConvNet to detect objects in a video frame in real time. The network outputs the location and class of the objects in the frame, which can be further processed to produce the final object detection result. The YOLO network is trained on the annotated dataset to learn the relationships between the image features and the object locations and classes.
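The prediction steps above can be sketched for a single detection. The example assumes the vector layout used by the code in Sect. 5.1 (normalized center x, center y, width, height, an objectness score, then one score per class); the function name `decode_detection`, the default threshold, and the sample values are hypothetical.

```python
def decode_detection(detection, frame_w, frame_h, conf_threshold=0.5):
    """Turn one raw YOLO output row into (box, class_id, confidence) or None."""
    scores = detection[5:]
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[class_id]
    if confidence <= conf_threshold:
        return None  # below threshold: discard as a likely false detection
    center_x = detection[0] * frame_w
    center_y = detection[1] * frame_h
    width = int(detection[2] * frame_w)
    height = int(detection[3] * frame_h)
    x = int(center_x - width / 2)  # top-left corner of the box
    y = int(center_y - height / 2)
    return [x, y, width, height], class_id, confidence


# Hypothetical row: a box centered in a 640x480 frame, three classes.
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.05, 0.85, 0.10]
result = decode_detection(row, 640, 480)
print(result)  # ([240, 120, 160, 240], 1, 0.85)
```

Rows that pass the threshold would then go through non-maximum suppression before being drawn.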
4 Proposed System
A video capture device, such as a webcam, is used to capture live video frames, which are then processed by the object detection algorithm. The YOLO framework is used to detect objects in the live video frames: the algorithm uses a neural network to predict the class and bounding box coordinates of the objects in the scene in a single forward pass of a deep ConvNet.
Real-time processing: The object detection system is designed to process the video frames in real time, allowing it to detect objects in the scene as they appear. The output of the object detection system can be visualized on a display screen, such as a computer monitor, or saved to a file for later analysis.
The object detection model can be optimized to improve its performance on the target dataset. This involves adjusting the hyperparameters of the model, such as the learning rate, batch size, and number of epochs.
The proposed framework is claimed to be a fast and accurate YOLO framework for real-time object detection, and we provide empirical evidence to support this claim. We used two datasets, the Common Objects in Context (COCO) dataset and the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, to measure the performance of the proposed model. The experimental results show that the model exceeds other advanced models in terms of both speed and accuracy. The proposed model achieves a mean average precision (mAP) of 86.8% on the COCO dataset and an mAP of 81.6% on the KITTI dataset. Additionally, the proposed model achieves an average of 53 frames per second (FPS) on a CPU and 108 FPS on a GPU. These quantitative results support the claim of a fast and accurate YOLO framework for real-time object detection.
A side-by-side comparison of the proposed single-shot framework with other modern models would provide stronger verification of its effectiveness. By comparing the accuracy scores and FPS of various models, it would be possible to assess the advantages and disadvantages of each model and determine whether the proposed framework truly exceeds them. Such comparisons would also allow for a more comprehensive understanding of the strengths and weaknesses of the proposed model in relation to other popular models. The proposed system architecture is shown in Fig. 1.
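Frame rates such as those quoted above are usually obtained by timing the detector over many frames. The sketch below shows one way to do that with the Python standard library; `measure_fps` and the stand-in `fake_detector` are hypothetical and do not reproduce the authors' measurement procedure.

```python
import time


def measure_fps(detect, frames):
    """Run `detect` on every frame and return the average frames per second."""
    start = time.perf_counter()
    processed = 0
    for frame in frames:
        detect(frame)
        processed += 1
    elapsed = time.perf_counter() - start
    return processed / elapsed if elapsed > 0 else float("inf")


def fake_detector(frame):
    """Stand-in for a real YOLO forward pass (assumption for illustration)."""
    time.sleep(0.001)
    return []


fps = measure_fps(fake_detector, range(20))
print(f"{fps:.1f} FPS")
```

In a real measurement, the frames would come from the capture device and the first few iterations would typically be excluded as warm-up.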
Fig. 1 System architecture
4.1 Methodology
The first step is to collect a large dataset of images that contain the objects of interest. The dataset should be diverse and representative of the real-world scenarios in which the object detection system will be used. The next step is to annotate the images in the dataset to provide the ground truth for the object detection system. This involves marking bounding boxes around the objects in the images and labeling them with the corresponding class labels. The next step is to train a deep neural network-based object detection model on the annotated dataset. The YOLO framework uses a ConvNet (CNN) to detect objects in an image. The model is trained to predict the class labels and bounding box coordinates of the objects in the images. After training, the model is typically fine-tuned to improve its performance on the target dataset. The final step is to deploy the trained model in a real-time object detection system. This typically involves integrating the model with a webcam or other video capture device and processing the video frames in real time to identify the objects in the scene. In short, the methodology for live object detection using the YOLO framework consists of collecting and annotating a large dataset, training a deep neural network-based object detection model, optimizing the model, and deploying it in a real-time object detection system. These steps enable the system to detect objects accurately in real time, making live object detection a practical and achievable goal.
4.2 Algorithm
The YOLO framework is a popular image analysis technique that can be used for live object detection. The input image is resized to a fixed size and normalized to have zero mean and unit variance. The image is then passed through a deep neural network to produce a set of candidate object detections. The network architecture comprises several convolutional and pooling layers, followed by several fully connected layers. The candidate detections are refined using a bounding box regression step, which adjusts the bounding box coordinates of the objects in the scene to be more accurate. The refined boxes are then subjected to non-maximum suppression to remove overlapping detections and select the best detection for each object in the scene. The selected bounding boxes are classified into one of several predefined classes, such as person, car, or bicycle. The final output of the object detection method is a set of bounding boxes, each annotated with a class label and a confidence score. The output can be visualized on a display screen, such as a computer monitor, or saved to a file for later analysis.
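The non-maximum suppression step described above can be sketched in plain Python. This is a generic greedy NMS over `[x, y, width, height]` boxes, not the authors' implementation; the function names, the 0.5 overlap threshold, and the sample boxes are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Hypothetical detections: two overlapping boxes and one separate box.
boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed; the third box does not overlap either and is kept.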
5 Implementation
Experiments in live object detection using the YOLO framework can take various forms, depending on the goals and specifications of the experiment. The performance of a YOLO-based object detection model can be measured with metrics such as accuracy, precision, recall, and F1 score. These metrics can be computed on benchmark images for which the ground truth annotations of the objects are known. Ablation studies can be used to evaluate the impact of different design choices, such as the number of convolutional layers, the number of nodes in each layer, the activation functions, and the loss functions, on the performance of the YOLO-based object detection model. The robustness of a YOLO-based object detection model can be analyzed by testing it on images with varying levels of noise, blur, and other distortions, or on images with different scales, orientations, and aspect ratios. The YOLO framework uses a neural network, specifically a ConvNet, for real-time object detection. A ConvNet is a kind of neural network designed specifically for image analysis and classification. The basic idea behind a CNN is to learn a hierarchy of features from the input image, beginning with low-level features such as edges and textures, and progressing to higher-level features such as object parts and finally the entire object. The learned features are used to make predictions about the objects present in the scene. In the YOLO framework, the ConvNet is trained on a large dataset of labeled images to learn the relationship between the input image and the corresponding object detections. During inference, the input image is passed through the trained model, which generates a set of bounding boxes, each tagged with a class label and a confidence score. Experiments in live object detection using
YOLO can be performed in several ways to evaluate the performance and robustness of a YOLO-based object detection model and to compare it with other methods. These experiments can help researchers and developers to understand the strengths and limitations of the YOLO framework and to improve the efficiency and accuracy of object detection. The experiments use the software tools and libraries described in Sect. 3, which provide a high-level API for running YOLO-based object detection models and support multiple GPU and CPU configurations.
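Detection performance is commonly summarized with precision, recall, and F1 score. The sketch below computes them from counts of true positives, false positives, and false negatives; the function name and the sample counts are hypothetical and are not results from this study.

```python
def detection_metrics(true_pos, false_pos, false_neg):
    """Return (precision, recall, f1) from detection counts."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 80 correct detections, 20 spurious, 10 missed.
precision, recall, f1 = detection_metrics(80, 20, 10)
print(f"{precision:.3f} {recall:.3f} {f1:.3f}")  # 0.800 0.889 0.842
```

A detection is usually counted as a true positive when its box overlaps a ground truth box of the same class above a chosen IoU threshold.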
5.1 Code Snippet

import cv2
import numpy as np

# Load the YOLOv3 model with its configuration and weights
# ...

# Define the classes YOLOv3 can detect
classes = []
# ...

# Set the minimum probability (confidence) threshold
# ...

# Set the non-maximum suppression (NMS) threshold
# ...

# Initialize the video capture object
# ...

while True:
    # Read a frame from the camera; read() returns a (success, image) pair
    ret, frame = cap.read()

    # Create a blob from the input frame
    # ...

    # Set the blob as the input to the YOLOv3 network
    net.setInput(blob)

    # Get the output layer names
    layer_names = net.getLayerNames()
    # ...

    # Forward pass through the network
    outputs = net.forward(output_layers)

    # Initialize empty lists for bounding boxes, confidences, and class IDs
    boxes = []
    confidences = []
    class_ids = []

    # Loop over each output and detect objects
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]

            # Keep the detection only if its confidence is high enough
            if confidence > conf_threshold:
                # Scale the normalized box back to frame coordinates
                center_x = int(detection[0] * frame.shape[1])
                center_y = int(detection[1] * frame.shape[0])
                width = int(detection[2] * frame.shape[1])
                height = int(detection[3] * frame.shape[0])
                x = int(center_x - width / 2)
                y = int(center_y - height / 2)

                # Add the bounding box, confidence, and class ID to their lists
                boxes.append([x, y, width, height])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # Apply NMS to remove overlapping bounding boxes
    # ...

    # Draw bounding boxes and labels on the original image;
    # flatten() handles both nested and flat index arrays
    for i in np.array(indices).flatten():
        x, y, w, h = boxes[i]
        label = classes[class_ids[i]]
        confidence = confidences[i]
        # ...

    # Show the final output image
    # ...

    # Break the loop if 'q' is pressed
    # ...

# Release the resources
cap.release()
cv2.destroyAllWindows()

Figure 2 shows the final output of the proposed method.
Fig. 2 Final output
6 Conclusion
The experiments applying the YOLO framework to live object detection have shown promising results. YOLO is a fast and efficient detection method that locates objects in real time with high accuracy. The experiments demonstrated that the YOLO framework can accurately detect objects in complex scenes and performs well even under challenging conditions such as changes in lighting and occlusions. Additionally, the experiments showed that the YOLO framework can be easily integrated with other computer vision algorithms and machine learning models for improved performance. This opens up new possibilities for combining object detection with other tasks such as tracking, recognition, and classification. However, the experiments also revealed some limitations of the YOLO framework, such as its sensitivity to object scale and the difficulty of detecting small objects. To address these limitations, further research is needed to make the YOLO framework more robust to variations in object size and pose. The YOLO-based object detection algorithm is simple, fast, and accurate, and it can detect many objects in a given image. In summary, the experiments show that YOLO is a promising method for real-time object detection and that there is room for improvement to address its limitations and make it even more effective.
Chatbot-Based E-System for Animal Husbandry with E-Farming Aishwary Sanjay Gattani, Shubham Sunil Kasar, Om Chakane, and Pratiksha Patil
Abstract To build a strong economy, Maharashtra must develop its agriculture, empower farmers, and use technologies such as e-agriculture and online marketing. Farmers lose yield and frequently suffer huge financial losses because they are unaware of new schemes and of the registration process that would allow them to recover their losses and increase agricultural production. The system examines every query from farmers and makes them aware of regional weather, season, rainfall, and soil type. An auto-chatbot has been developed to answer farmers' queries related to farming. The proposed system assists farmers in remote areas by deploying a chatbot, based on a SQL database, on a web system and social media to help them understand which crop should be grown under the prevailing atmospheric conditions. Farmers can use the ShopMart portal to sell their products throughout the region and the animal husbandry portal to support their pasturage business.
Keywords Animal husbandry · Agriculture · Chatbot · Farming · K-nearest neighbours · Natural language processing · Structured query language
A. S. Gattani (B) · S. S. Kasar · O. Chakane · P. Patil Department of Computer Engineering, Vishwakarma Institute of Technology (Affiliated to SPPU), Pune, India e-mail: [email protected] S. S. Kasar e-mail: [email protected] O. Chakane e-mail: [email protected] P. Patil e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_39
1 Introduction
Agriculture forms the backbone of the state of Maharashtra. Almost 82% of the rural population depends on agriculture as a source of income [1]. In the fiscal year 2020, Maharashtra contributed about 122 billion Indian rupees to the Indian economy through sugar crops [2]. Besides farming, animal husbandry has shaped the state in recent years. Maharashtra's contribution to the national GDP share of animal husbandry was around 2.9% in the 2019–20 financial year [3]. Milk production in the state increased from 10,402 to 13,703 ‘000’ million tonnes (MT) in just four years [4]. In the last few years, the use of information and communication technology in the farming sector has increased. In India, approximately 14 crore farmers work on farms. The state government has made many beneficiary schemes available to farmers. However, every scheme has its own website, which leads to confusion and a lack of awareness among farmers. Even though the government has taken steps to increase awareness of these schemes, the use of English as the medium of communication on the scheme websites, combined with illiteracy among farmers, creates a further obstacle. Furthermore, the involvement of middlemen leads to fraud against unaware farmers, whose voices never reach the right people. The updated count of beneficiaries is given in Table 1. The way farmers access information is expected to change dramatically. This has encouraged the use of machine learning, as most farmers prefer to take the help of chatbots on their mobile phones. The main focus of the project is to provide farmers with a chatbot-based E-System for animal husbandry with e-farming that uses natural language processing (NLP) to assist and guide them. This system will provide answers to their questions about agricultural practice and technology. The system is a one-stop platform for all farmer-related issues.
Provision of all scheme information with respective links, disease information for plants, and animal husbandry with healthcare information of animals are features included in the system. In the process, the main focus is on the reach and accessibility of this system to the targeted audience, its impact on their lifestyle, and the resolution of present issues.

Table 1 Comparison of schemes and number of beneficiaries by inception year

Schemes and programmes   Total beneficiaries   Inception year
KCCLS                    12.8 M                1998
GBY                      71.4 M                2001
NMSA                     34.4 M                2010
E-NAM                    83.6 M                2015
PKVY                     690 clusters          2015
PMKSY                    379.58 ha             2015
PMFBY                    22.36 M               2016
MIF                      10 M ha               2019
PM-KISAN                 84.648 M              2019

where ‘M’ means ‘million’ and ‘ha’ means ‘hectares’
1.1 Problem Statement
After completing an intensive study of farming and animal husbandry topics, some issues were identified. Horticulture information about fruits, flowers, and vegetables can be found on various websites. However, all of the information is in English and in text format. As a result, illiterate farmers in Maharashtra cannot take advantage of these services. Agencies that provide agriculture information via SMS and phone calls include Indian Farmers Fertiliser Cooperative (IFFCO) Kisan Sanchar Limited and Reuters Market Light. However, these agencies charge for their services and do not operate in areas where no mobile tower is available. Farmers are therefore expected to benefit more from the proposed application, which aims to address the problems identified above. The application also includes features such as government notifications and weather forecasting. In summary, the project's main goals are:
• Create one platform for animal husbandry and farmers
• Provide instant information about the health care of animals
• Extend the website to commercial use for local farmers.
2 Literature Survey
In today's world, innovation and technology are the foundation for the growth of all sectors. Modern technology and its application are also required for the agricultural sector to grow. Agriculture, the most important sector of the economy, accounts for approximately 18% of the total gross domestic product (GDP). The Indian government has started many schemes, such as the National Agriculture Market (E-NAM), an electronic trading portal of India; the National Mission for Sustainable Agriculture (NMSA) for agricultural productivity; Pradhan Mantri Krishi Sinchayee Yojana (PMKSY) for distributing irrigation-related tools; and Paramparagat Krishi Vikas Yojana (PKVY) for organic farming. All of these programmes and schemes are critical for farmers to receive financial assistance while improving their livelihood [5]. Agriculture is the foundation of the Indian economy. Agriculture employs 170 million people out of a total workforce of 320 million. Following independence, Indian agriculture experienced significant and rapid growth at a rate of 2.6% per year. India ranks first in milk production and third in coffee production. It also has the second-largest area of arable land in the world; however, because of low productivity, yields for each crop are only about 30% of world standards. Agricultural marketing needs to be well planned and prepared.
In floriculture, India currently contributes 0.31% of total world exports. Greenhouse technology can increase this share, but good marketing policies, timely transportation, and storage facilities must also be provided. India's existing post-harvest processing capacity can handle only 0.5% of total annual production. Fruits and vegetables are lost every year because post-harvest processing is insufficient, at a cost of Rs 300 crore. As a result, India must seize the chance to enlarge and commercialize farms, improve yield, create jobs and revenue, and export processed products. There is still plenty of room to support farms through information and communication technology [6]. The paper [7] proposes a system for implementing an Internet of Things-based farming website. The authors introduce a platform that gives farmers accurate data on various parameters, such as the level of macronutrients in the soil. With access to this information, farm owners can decide which crop is better suited for cultivation on their farms based on demand, price, and soil nutrient levels. The paper [7] also proposes optical sensors, a temperature sensor, and a GPS module in addition to the soil moisture sensor. The authors of [8] carried out a case study of Bangladesh. Their primary objective is to build a system that links all farm-related topics into a single system in which all agricultural tasks can be easily completed. They have created a web portal intended to support the development of the country by reducing poverty and creating freelance opportunities for its large population through technology. The e-payment system, however, is insufficient to handle every kind of electronic money transfer [8]. Authors have also noted features and functionality, such as government scheme information in the form of websites and mobile applications, that reduce the technology gap. These include informing users about schemes in multimedia formats, thereby giving them e-learning; automatic scheme retention that depends on the application form, avoiding the need for user intervention; identifying which schemes are applicable in which areas, so that users receive relevant information about the schemes that apply to them; and providing information about help offices that users can visit in the event of an emergency. Figure 1 depicts agriculture in the digital age [9].
A survey was conducted in four states of India on animal husbandry extension service delivery and farmers' perception of it. The results indicate that the government should give animal husbandry extension services adequate attention and streamline their delivery by putting in place programmes, adequate funding, human resource development, and infrastructure to train the workforce and provide additional services to farmers effectively. Sufficient funding and rigorous monitoring of programmes and plans would ensure the improvement of the animal husbandry industry across states. Figure 2 depicts advice for improvement. The recommendations to improve animal husbandry across the states of India rely on intensive monitoring and periodic, efficient field visits. The perceptions of farmers in Maharashtra regarding animal husbandry extension service delivery formed the basis of the suggestions in Fig. 2 [10].
Fig. 1 Agriculture in the digital age [9]
Fig. 2 Advice for improvement [10]
Uneven digital access, inaccurate application data, and a lack of content tailored to local needs are all significant barriers to the widespread adoption of mobile apps. Compared to other sectors, such as health, the number of agricultural applications is minimal. India has a limited number of agricultural applications compared to the USA or Brazil. There are digital tools that offer live services to various agricultural solutions for a fee that farmers are unwilling to pay [11].
Two main agents are employed: a communication agent and an intelligence agent. To collect the enquiry message sent by the user, the communication agent sends a request to the Telegram server at regular intervals using the standard Hypertext Transfer Protocol (HTTP). Whenever an enquiry is received, it is routed to the intelligence agent, which searches a predefined enquiry dataset for the nearest instance. The Levenshtein distance is used to measure the difference between the posted enquiry and those in the predefined dataset. After selecting the nearest instance, the intelligence agent forwards the reply to the communication agent, which then sends the answer back to the original sender via the Telegram messaging platform. Based on usability and performance testing, the developed scheme can generate an automatic response in less than five seconds with reasonably good matching accuracy [12]. However, according to the results, the response is incorrect when the words in a sentence contain a large number of typos. Another work exploits ease of access to connect two widely used forms of advanced analytics. As a result, a chatbot powered by machine learning (ML) and fed with raw analytics data allows users to gain market intelligence simply by typing a question. Tests were conducted to gain a better understanding of the tool's performance, and the tool performed well in terms of response quality [13]. Chatbot services such as Microsoft machine, Heroku, Amazon Web Services (AWS) Lambda, IBM Watson, and many others are available to support the growth and advancement of the chatbot field. A summary of cloud-based chatbot technologies, as well as present and future chatbot programming challenges, is also available. As per Era of Chatbot Analysis [14], programmers should recognize and consider consistency, scalability, and versatility problems, with a strong emphasis on human speech.
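The nearest-instance lookup described for the Telegram chatbot in [12] can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code of [12]: the function names `levenshtein` and `nearest_answer` and the two sample enquiry/answer pairs are hypothetical, and the polling of the messaging server is left out.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute = 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_answer(query, faq):
    """Return the answer whose stored enquiry is closest to the query."""
    normalised = query.lower().strip()
    best = min(faq, key=lambda known: levenshtein(normalised, known.lower()))
    return faq[best]


# Hypothetical predefined enquiry dataset.
FAQ = {
    "what is pkvy scheme": "PKVY is a scheme for organic farming.",
    "how to check weather": "Open the weather forecasting section.",
}

reply = nearest_answer("What is PKVY scheem", FAQ)
```

A query with a small typo still maps to the intended entry because its edit distance to that entry stays low. As the number of typos grows, the distances to unrelated entries become comparable, which is consistent with the limitation reported above.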
The auto-chatbot's knowledge is stored in a database. Within a relational database management system (RDBMS), the chatbot is composed of a core and a user interface (UI) that accesses that core. The database [15] was used to store knowledge, and the interpreter was used to hold stored programmes of feature and protocol sets for the trending demand. Table 2 summarizes the existing research work on agricultural systems built to assist farmers. On closer observation, only one of the five papers describes a system actually implemented for real use. Moreover, the use of chatbots in the existing systems is negligible. This calls for the inclusion of a learning technology-based chatbot to assist the agriculture domain.
3 Scope of Work
This system aims to eliminate the possibility of fraud against farmers and the dissemination of false information, to which farmers are easily exposed because of lack of awareness and illiteracy. The accuracy of the data depends on the information collected. This web app can also be used as an all-in-one platform for animal husbandry and farmers. An interface can be created for getting instant information about the healthcare of animals, and it can be extended to crop disease detection and to accessing information about government schemes and pricing without any middleman issues.

Table 2 Summary of existing work

Research paper and year   Country of research   Proposed or implemented system   Technology used                      Chatbot used or not   Aspect of farming focused on
Yadav et al. (2021)       India                 Implemented                      IoT                                  No                    Crops and their market
Kakani et al. (2021)      India                 Comparative analysis             Mobile apps                          No                    NA
Upendra et al. (2020)     India                 Proposed                         Data mining, IoAT                    No                    General context
Kareem et al. (2017)      India                 Survey                           NA                                   NA                    Animal husbandry
Kundu et al. (2017)       Bangladesh            Case study                       ICT tools, database (MySQL, PHP)     No                    Information on agriculture and market
4 Proposed System Architecture
The auto-chatbot is built on the Django architecture. Django follows a Model–View–Controller (MVC) style of architecture with three parts: the Model, the UI, and the Controller. In Django's own terminology this is the Model–View–Template (MVT) pattern, which the system uses for displaying and manipulating data (see Fig. 3). When a person enters a web address belonging to the application into a browser, the request is forwarded to the web server, which then passes it to Django. Django processes the given URL path and, if it matches a registered URL pattern, invokes the controller logic, which executes a particular action, such as retrieving a record from the Model and rendering a View (such as a web page). Figure 3 depicts the Model View Template (MVT)-based control flow [16]. To implement the chatbot functionality, the tasks outlined in the proposed system for the auto-chatbot, depicted in Fig. 4, are carried out. First, government sites and repositories are consulted, and the questions most frequently asked by farmers are collected. Following collection, the data is cleaned and entered into a SQL database. Cleaning consists of removing duplicate values and irrelevant questions from the collected data. Moreover, certain questions were reframed so that the subsequent algorithm could interpret them better. The database used for
Fig. 3 MVT-based control flow [16]
the chatbot is SQL in this case. The data stored in the SQL database is structured and can be inserted, updated, and deleted by running SQL queries in the backend. A relational data model has been used to develop the database. The enquiry message sent by a user is stored in this database and is then retrieved while answering. Data can also be obtained from chat messages in existing chat-based applications that offer a chatbot facility. This data is likewise collected, cleaned, and stored in the database. Following that, NLP is applied, and the stored data is used for training and testing. Training and testing are repeated until the desired accuracy is achieved, after which the chatbot is deployed to the server for use. The system is then fully operational for farmers and other users. The chatbot, named “Kisan Network”, provides answers to farmer queries; because NLP was used to train it, the system and chatbot do not need to be retrained on a regular basis. The system will have an e-farming application that addresses the needs of the farmer and offers solutions. It has multiple sections, such as login, so that farmers and other people can use it according to their needs. The web application can be operable on mobile
Fig. 4 Proposed system of auto-chatbot
phones as well. In addition, many farmers are not aware of the various government schemes available for their benefit, so the system spreads awareness of these schemes and provides a guide to applying for them. The main goal is to assist troubled farmers by providing them with an easy-to-use application. The system provides an interface, in the form of a web application, through which a farmer can access information, farming tips and tricks, and strategies shared by other farmers in their community. The system is designed to be a user-friendly and understandable platform that establishes a network among farmers and helps them discuss their problems. Farmers will also be able to easily reach government sites and get the latest news by using the chatbot facility (refer to Fig. 4 for the proposed system of the chatbot). All features are discussed in greater detail in the next section. The block diagram in Fig. 5 shows how the system works, what the major components or features of the application are, and how they are interconnected. The e-farm portal (referred to as “Farmer Portal” in the block diagram) is where any type of user can visit and explore the system. The system has seven types of features, which are listed in the skin-coloured boxes in the block diagram, such as Social Media, Services, Farm-Develop, and E-Farmer Profile. The subsections of each feature are shown in purple rectangular boxes, such as “Government Schemes” and “Mobile App Services” under the Services feature.
Fig. 5 Block diagram of e-farming portal
The project is designed and developed using HTML5, Cascading Style Sheets 3 (CSS3), JavaScript, jQuery, SQL, Bootstrap, and React. Its functionalities and modules are as follows:
• Farmers Registration and Login: This module covers the creation of an account, for which the user submits basic information. Through this module, the user receives a unique ID that serves as the user's identity.
• Chatbox: Through the chatbox, users from different regions can communicate with each other through chat. Farmers can exchange their thoughts, farming techniques, and tips.
• Maharashtra map: This module shows the map of Maharashtra state. Users can explore which crops or products are more common in each area of the state.
• Weather forecasting: This module predicts the weather conditions at the location from which the application is being used, determined through the Global Positioning System (GPS), which helps farmers mainly during the rainy season.
• Government Schemes: This module lists all government schemes related to farmers, crops, and areas, so that the farmer or user can apply for them, for example for compensation, and obtain the schemes' benefits.
• Language change: A language-change option is included. A total of 13 languages are provided because most farmers cannot understand English, so local languages are offered for their convenience.
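The data-preparation steps proposed in this section for the chatbot (collect frequently asked questions, remove duplicates, and store the result in a SQL database for later retrieval) can be sketched with Python's built-in `sqlite3` module. This is only an illustrative sketch: the table name `faq`, the helper names, and the sample questions are assumptions, an in-memory SQLite database stands in for the project's SQL backend, and the removal of irrelevant questions is simplified to dropping empty entries.

```python
import sqlite3


def normalise(question):
    """Lower-case a question and collapse repeated whitespace."""
    return " ".join(question.lower().split())


def clean(pairs):
    """Drop empty entries and duplicate questions, keeping the first answer seen."""
    seen = set()
    cleaned = []
    for question, answer in pairs:
        key = normalise(question)
        if not key or not answer.strip() or key in seen:
            continue
        seen.add(key)
        cleaned.append((key, answer.strip()))
    return cleaned


def build_store(pairs):
    """Create an in-memory relational store holding the cleaned question/answer pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE faq ("
        "id INTEGER PRIMARY KEY, "
        "question TEXT UNIQUE NOT NULL, "
        "answer TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO faq (question, answer) VALUES (?, ?)", clean(pairs))
    conn.commit()
    return conn


def lookup(conn, question):
    """Return the stored answer for an exactly matching (normalised) question."""
    row = conn.execute(
        "SELECT answer FROM faq WHERE question = ?", (normalise(question),)
    ).fetchone()
    return row[0] if row else None


# Hypothetical raw data collected from scheme websites.
raw = [
    ("What is PKVY?", "A scheme for organic farming."),
    ("what is   pkvy?", "Duplicate entry."),
    ("", "Entry without a question."),
    ("How do I see the weather?", "Open the weather forecasting section."),
]
conn = build_store(raw)
stored = conn.execute("SELECT COUNT(*) FROM faq").fetchone()[0]
```

Exact matching is used here only to keep the storage step separate from the matching step; queries that do not match a stored question exactly would be handled by the classifier described in Sect. 5.1.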
5 Implementation
E-farming provides a unique ID to each user, which can be used to access advanced functions and apply for schemes. The algorithm describes the logic behind how the system works, and the flowchart provides a pictorial representation of that logic.
5.1 Algorithm
The K-nearest neighbours (KNN) algorithm is used for the proposed chatbot model. KNN can be defined as “a simple algorithm that stores all available cases and classifies new ones using a similarity metric”. It is generally used to classify input data. In the chatbot, it is used to classify user queries into categories and to produce a standard output based on the category to which the question is assigned. Suppose one user asks how to find the scheme details for PKVY while another asks where to find the weather functionality. The algorithm classifies the former as a specific answerable question and provides the user with details of the PKVY scheme, while the latter is functionality-related and results in the chatbot answering with a stepwise procedure to access the weather feature.
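The query classification described above can be illustrated with a small KNN sketch. The paper does not specify the text features or the similarity metric, so the choices here are assumptions: queries are represented as sets of words, similarity is the Jaccard index, and the labelled examples, the category names `scheme` and `feature`, and the value k = 3 are hypothetical.

```python
from collections import Counter


def tokens(text):
    """Represent a query as the set of its lower-cased words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two word sets (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(query, examples, k=3):
    """Label a query by majority vote among its k most similar stored examples."""
    query_tokens = tokens(query)
    ranked = sorted(
        examples,
        key=lambda example: jaccard(query_tokens, tokens(example[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled queries: "scheme" = answerable scheme question,
# "feature" = question about how to use a portal functionality.
EXAMPLES = [
    ("how to find pkvy scheme details", "scheme"),
    ("which schemes give crop insurance", "scheme"),
    ("how to apply for pmfby scheme", "scheme"),
    ("where is the weather forecast", "feature"),
    ("how to open the weather page", "feature"),
    ("how to change the language", "feature"),
]

scheme_label = knn_classify("pkvy scheme details please", EXAMPLES)
feature_label = knn_classify("where can i see the weather", EXAMPLES)
```

Once a query has been assigned a category, the chatbot can return the standard output for that category: scheme details in the first case and a stepwise procedure for reaching the weather feature in the second.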
Fig. 6 Flowchart 1: e-farming portal
5.2 Flowchart Each diagram includes actions, who is in charge of carrying them out, and the inputs and outputs for each step. In some cases, the flowchart may also include a list of all project documents and other materials required to carry out the actions. Figure 6 provides some basic descriptions of the system’s flow. The flowchart in Fig. 7 depicts the website’s logical working flow. The user can also access the site without logging in, as shown in the flowchart. However, if the user wishes to explore some additional and advanced functionalities, the login process needs to be completed first. Users can access all application features after logging in.
6 Results and Discussion
The result screenshots for the system architecture proposed in the previous sections are presented below. Simple web-based technology was used in the implementation so that farmers, primarily rural farmers, can benefit. Figure 8 depicts the web design of the e-farm portal.
Fig. 7 Flowchart 2: animal husbandry portal
Fig. 8 Web design of e-farm portal
Fig. 9 Auto-chatbot: Kisan network
This is the application's web page, from which the user can log in or create a profile and navigate to the various pages and service functionalities. Weather forecasting and a chatbot have been added to assist farmers. A language change option is also available to make the portal accessible to farmers in rural areas. This page also illustrates the features mentioned in Flowchart 1. Figure 9 depicts the operation of the auto-chatbot. Kisan Network, the auto-chatbot, assists farmers in a variety of ways, including answering questions about the system and providing social media contacts so that users can contact the team if a question is not answered. It helps locate links to agricultural schemes and provides accurate information. KNN was found to be a suitable method because it allows the system to predict the category of future questions. Figure 10 depicts the availability of various goods and crops in the region and information about state and federal government schemes, allowing farmers to register and apply for schemes and obtain the benefits quickly. Animal husbandry is a significant feature of the system because it is part of farming and is one of the significant sources of income, providing daily necessities such as milk, butter, and eggs. Figure 11 depicts the animal husbandry portal, through which farmers can obtain information about schemes and about the environment in which animals yield the most produce.
Fig. 10 Availability of goods and scheme info
Fig. 11 Portal for animal husbandry
7 Conclusion and Future Scope
In conclusion, the system achieved its goal, and its future scope has been identified. An e-farming portal was built and linked to the animal husbandry and ShopMart (Farm) portals. This system provides farmers with scheme information and registration options. Customers can also connect directly with farmers to buy vegetables using this system. Farmers can access all facilities, information, and social connections through a single portal. The chatbot, named Kisan Network, assists farmers in a variety of ways, such as by answering system-related questions. In testing, the chatbot was found to be highly accurate, as expected. Because the proposed system employs NLP, it can learn on its own.
As per the proposed system, data is fetched from government sites so that the chatbot can predict the reply correctly. Further research is needed to make the chatbot more accurate. As the main objective of the proposed system is to create an everything-in-one-place platform for farmers, further improvements are needed, such as adding the following features to the existing system:
• Improve Chatbot
• Dealer Registration (Admin Side of Marketing)
• Update the Website for all States of India
• User-User Connectivity
• Theme Change
• Hosting the Site.
References
1. Maps of India. Maharashtra agriculture. mapsofindia.com https://www.mapsofindia.com/maps/maharashtra/maharashtraagriculture.htm. Accessed 24 Dec 2022
2. Statista. Gross value added from sugar crops across Maharashtra in India from the financial year 2012 to 2020. statista.com https://www.statista.com/statistics/1082988/india-economiccontribution-of-sugar-crops-in-mh/. Accessed 24 Dec 2022
3. PIB Delhi. GDP expenditure on fisheries, animal husbandry, and dairy sector. pib.gov https://pib.gov.in/PressReleasePage.aspx?PRID=1743189. Accessed 24 Dec 2022
4. PIB Delhi. Dairy milk cooperatives in Maharashtra. pib.gov https://pib.gov.in/PressReleasePage.aspx?PRID=1812372. Accessed 24 Dec 2022
5. Mahesh KM, Aithal PS, Sharma KRS (2021) A study on the impact of schemes and programmes of government of India on agriculture to increase productivity, profitability, financial inclusion, and society. Int J Manage Technol Soc Sci (IJMTS) 6(2). ISSN: 2581-6012
6. Shakeel-Ul-Rehman, Selvaraj M, Syed Ibrahim M (2012) Indian agricultural marketing—a review. Asian J Agric Rural Dev 2(1):69–75
7. Yadav P, Jaiswal K, Tripathi M, Yeole A (2021) IoT based farming website. In: International conference on computer communication and informatics (ICCCI-2021), Jan 27–29, Coimbatore
8. Kundu J, Debi S, Ahmed S, Halder S (2017) Smart E-agriculture monitoring system: case study of Bangladesh. In: International conference on advances in electrical engineering (ICAEE), 28–30 September
9. Upendra RS, Umesh IM, Ravi Varma RB, Basavaprasad B (2020) Technology in Indian agriculture—a review. Indonesian J Electr Eng Comput Sci 20(2):1070–1077
10. Kareem MA, Phand SS, Manohari PL, Borade M (2017) Animal husbandry extension service delivery: farmers’ perception in four major Indian states. JAEM
11. Prakash KV, Ganguli D, Goswami A, Debnath C (2020) Animal husbandry mobile apps in transformation livestock farming. Int J Curr Microbiol Appl Sci 10(9). ISSN: 2319-7706
12. Zetland D (2009) Politics, possibility, and pipes, a guest post. New York Times. [Online]. Available: http://freakonomics.blogs.nytimes.com/2008/09/09/the-economics-of-clean-watera-guest-post/?scp=1&sq=clean%20water%20treatment&st=cse
13. Budny D, Bursic K, Lund L, Vidic N (2010) Freshmen are the best inventors. J Eng Educ 99(3):78–80
14. Browne J (2008) From toilet to tap discover. [Online]. Available: http://discovermagazine.com/2008/may/23-from-toilet-to-tap
15. Mulrine A (2016) To the rescue. Prism. [Online]. Available: www.prism-magazine.org/mar06/feature_incredibles.cfm
16. Javatpoint. Django MVT. javatpoint.com https://www.javatpoint.com/django-mvt. Accessed 15 Feb 2023
Mood Classification of Bangla Songs Based on Lyrics
Maliha Mahajebin, Mohammad Rifat Ahmmad Rashid, and Nafees Mansoor
Abstract Music can evoke various emotions, and with the advancement of technology it has become more accessible to people. Bangla music portrays a wide range of human emotions, yet it has received little research attention. This article analyzes Bangla songs and classifies their moods based on the lyrics. To this end, a dataset of 4000 Bangla song lyrics with their genres has been compiled and analyzed using Natural Language Processing and the BERT model. Among the 4000 songs, 1513 are labeled sad, 1362 romantic, 886 happy, and the remaining 239 relaxed. By embedding the lyrics, the authors classify the songs into four moods: Happy, Sad, Romantic, and Relaxed. This multi-class classification of song moods helps make music more relatable to people’s emotions. The article presents the automatically predicted results for the four moods derived from the song lyrics.
Keywords Music mood analysis · Natural language processing · Transformer model · Sentiment analysis
M. Mahajebin (B) · N. Mansoor, University of Liberal Arts Bangladesh, Dhaka, Bangladesh, e-mail: [email protected]
N. Mansoor, e-mail: [email protected]
M. R. A. Rashid, East West University, Dhaka, Bangladesh, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_40
1 Introduction
Music is often described as a language that everyone can understand. It plays an important role in the lives of people around the world; it is hard to imagine a party, movie, or event without music. Researchers in [1] find that music enhances brain functioning. Due to the regulation of dopamine in the brain, sounds like music and noise have a major
impact on our moods and emotions. Dopamine is a neurotransmitter that actively participates in emotional behavior and mood regulation. With the abundance of easily accessible digital music libraries and online music streaming over the past decade, research on automated music systems has become both prominent and challenging [2]. Common search and retrieval categories include artist, genre, mood, lyrics, and more. Music itself is an expression of various emotions, which can be highly subjective and difficult to quantify. Automated classification aims to identify the mood of a song with reference to human emotions [3]. When someone is in a happy mood, he or she can listen to happy songs. On the other hand, when someone is going through a difficult time or has lost someone, listening to sad songs can help that person feel better. Relaxing songs played in bed can help one sleep well. When a user listens to sad or happy songs, many applications recommend songs of a similar mood, and this feature is growing in popularity. Much research has already been done on classifying English songs by mood. In contrast, although Bangla has a very rich music culture, only a few prominent works address mood-based classification of Bangla music. Research has been done on Bangla audio songs, but few studies are based on lyrics. Bangla songs express numerous emotions, yet existing work is limited to classifying songs into happy and sad moods only. Hence, the objective of this research is to detect the various moods of Bangla songs based on their lyrics. The aim of this article is to create an application that classifies and predicts songs’ moods by analyzing the lyrics.
The main motivation for this project is to build software that helps people listen to songs that match their mood. In this article, the authors classify songs based on their lyrics. This research proposes a multi-class classification model using Bidirectional Encoder Representations from Transformers (BERT) [4], a very popular transformer-based NLP (Natural Language Processing) model. The authors use the deep learning library Keras to classify the songs into four moods: romantic, sad, happy, and relaxed. Google released BERT as open source to advance NLP, and its performance is much higher than that of other classification models such as Naive Bayes and logistic regression. The work draws on the BERT model, TensorFlow Hub, pre-trained small BERT variants, Kaggle, and the Adam (adaptive moment estimation) optimizer for fine-tuning. This research focuses on mood analysis using the lyrics of Bangla songs. A dataset containing the lyrics of 4000 Bangla songs has been prepared and presented, and an exploratory data analysis has been performed on it. The BERT algorithm is used for mood classification. The rest of the paper is organized as follows. Section 2 describes existing research on music mood classification. The prepared dataset, the data analysis, and the methodology are presented in Sect. 3. The experimental results are presented in Sect. 4, and the conclusion in Sect. 5.
2 Related Work
Sentiment analysis is a prominent research topic, and a great deal of work has been done and is still ongoing. Music analysis, a significant part of sentiment analysis, has attracted the interest and focus of researchers, who have produced a variety of work in the field. Some research is based on the audio signal and some on the lyrics, and classification has been carried out along different dimensions, such as artist, category, and mood of the songs. A large survey of recent sentiment analysis algorithms was carried out in [5] by researchers at Ain Shams University in 2014. They analyzed 54 articles and concluded that sentiment classification (SC) and feature selection (FS) algorithms remain an open field of research with room for further enhancement. The machine learning algorithms most frequently used to solve SC problems are Naive Bayes and Support Vector Machines. The survey also found that most papers address the English language, although research on other languages is growing. Another survey, on the challenges of sentiment analysis, was done by Hussein in 2016. In [6], the work was based on 47 papers discussing the importance and effects of sentiment analysis challenges in sentiment evaluation. The research was based on two comparisons. The result of this analysis reveals another essential factor in recognizing sentiment challenges: domain dependence. Across all types of reviews, the most common challenge is negation. Regarding music analysis, important research has been done on Bangla song reviews to study the acceptance of a young star’s song. The authors collected comments from the comment section of a YouTube music video by a young Bangla star and classified them as positive or negative. A lexicon-based backtracking approach was applied to each sentence of the dataset [7].
Researchers at Daffodil International University, Bangladesh, have also studied Bangla music, classifying Bangla song genres with a neural network. In [8], a deep learning model was proposed using the sequential model of the deep learning library Keras to classify six different Bangla music genres. The researchers used audio signals for that purpose and collected an MP3 dataset of Bangla songs. Automatic mood classification using tf*idf based on music lyrics was among the earliest work, done by van Zaanen and Kanters in 2010. That research divided the mood of songs into classes such as sad, happy, angry, and relaxed [9]. The authors used tf*idf, which emphasizes the expressive words in a song’s lyrics, to describe the mood automatically. Mood-based classification of music by analyzing lyrical data through text mining was done at Amity University Uttar Pradesh, Noida, India [10]. The researchers used a Naive Bayes classifier as well as some audio features of the song for text mining and obtaining the mood of the song. Sebastian Raschka worked on predicting the mood of music from song lyrics using a Naive Bayes classifier [11]. The researcher developed a web app that performs mood prediction given the artist’s
name and song title. A large number of songs from the “Million Song Dataset” were collected and used to train a Naive Bayes classifier that assigns songs to two moods, happy and sad. The results showed that a Naive Bayes model can predict the positive class with high precision, which makes it useful for filtering a large music library for happy music with a low false positive rate. Similar work was done on Bangla song lyrics by researchers at United International University, Bangladesh. In [12], the authors built a music mood classifier for Bangla song lyrics using a Naive Bayes classifier. That work also divided Bangla songs into two moods, sad and happy. Its results suggested that sad songs are becoming more common as people’s emotional state declines over time. Most of the work reported in the literature on music analysis is limited to classifying songs into only two moods, and most lyrics-based work relies on the tf*idf algorithm. In this research, the authors have collected an extensive dataset and used it for prediction with the BERT algorithm. This work performs a multi-class classification of songs into four moods. A dataset of 4000 Bangla songs is expected to support more accurate prediction and more precise classification of songs’ moods compared to other reported works.
3 Proposed Work
As in any other intelligent prediction system, the dataset plays an important role in Bangla music mood analysis. Defining the moods required a suitable dataset containing a wide variety of Bangla music. Because this research depends heavily on such data, a dataset has been created and is analyzed in the following subsections.
3.1 Dataset Description
In this machine learning system, the model learns from the song lyrics in the dataset and predicts each song’s mood. The dataset used in this research is a collection of 4000 Bangla songs. The lyrics have been scraped from different Bangla song lyrics websites, including banglasongslyrics.com and Bengali Lyrics (gdn8.com). Apart from the scraped lyrics, other lyrics have been added to the dataset manually. Since Bangla music abounds in emotions, this research has collected more than twenty categories of songs from different writers and genres, as shown in Fig. 1. The songs have been collected from a time frame covering the last 50 years. In addition, some older artists hold a large place in the Bangla music industry, such as Rabindranath Tagore, Kazi Nazrul Islam, and Lalon Shah, and the dataset contains some of their music. The dataset also includes tribal songs, modern songs, songs from cinema, and so on. The largest number of songs is by Rabindranath Tagore, as his songs carry many emotions and their lyrics are helpful as training data. Around 25 songs in the dataset are uncategorized. For processing, the authors have scaled those data. Moreover, the initial data has been cleaned because it contained many null values. For some songs, especially old ones, the full lyrics could not be collected. Finally, the authors compiled all the data for the machine learning algorithms and converted the compiled Excel file into comma-separated values (CSV) format. There are four columns in the dataset: title, category, lyrics, and mood. The title is the song’s name as published by the copyright holder. The lyrics column, which holds the full lyrics of each song, is the most valuable of the four; these lyrics are processed by the prediction system. The mood column is essential because mood is the quantity predicted in this article. The four Bangla song moods were identified from previous research on mood identification in Bangla and English songs [13]: romantic, sad, happy, and relaxed. After all the songs had been collected, the dataset was labeled according to these four moods. The dataset is somewhat imbalanced because the authors found few relaxation songs in Bangla music history. By contrast, a large number of songs contain both happy and romantic emotions.
Fig. 1 Picture of music categories in dataset
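The cleaning step described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the authors' pipeline: the column names follow the dataset description, while the sample rows, the `VALID_MOODS` set, and the `load_clean_rows` helper are hypothetical.

```python
import csv
import io

# Column names follow the dataset description; the sample rows are invented
# placeholders, not entries from the authors' dataset.
SAMPLE_CSV = """title,category,lyrics,mood
Song A,Modern,first line of lyrics,happy
Song B,Folk,,sad
Song C,Cinema,another lyric text,
Song D,Modern,some calm words,relaxed
"""

VALID_MOODS = {"happy", "sad", "romantic", "relaxed"}


def load_clean_rows(handle):
    """Read the CSV and keep only rows with non-empty lyrics and a valid mood."""
    rows = []
    for row in csv.DictReader(handle):
        lyrics = (row.get("lyrics") or "").strip()
        mood = (row.get("mood") or "").strip().lower()
        if lyrics and mood in VALID_MOODS:
            rows.append({
                "title": row["title"].strip(),
                "category": row["category"].strip(),
                "lyrics": lyrics,
                "mood": mood,
            })
    return rows


rows = load_clean_rows(io.StringIO(SAMPLE_CSV))
print([r["title"] for r in rows])  # ['Song A', 'Song D']
```

Rows with missing lyrics or a missing label are dropped here; other treatments of null values are equally possible, and the text does not say which one was used.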
Fig. 2 Mood categories in dataset
3.2 Mood Categories in Dataset
The model is trained on the dataset to predict the mood of a song. After analyzing the entries in the dataset, which number more than 16,000, the researchers selected four moods: romantic, sad, happy, and relaxed. Although the dataset is somewhat imbalanced because some songs convey ambiguous emotions, Fig. 2 shows the percentage of songs in each of the four moods, along with the number of songs per mood. Sad songs form the largest group in the dataset, with 1513 songs. There are 1362 romantic songs, 886 happy songs, and the remaining 239 are relaxed songs. All 4000 songs have been labeled for training purposes. The lyrics of romantic and happy songs share some words, so these two moods were difficult to label. Relaxation song lyrics are difficult to find in Bangla song archives, as few songs carry this particular mood. The authors are still working to enlarge and balance the dataset to improve performance. To analyze the dataset, the authors have used word tokenization. Since the goal of this article is to predict the mood from the lyrics of a song, word tokenization is used to build a list of words and to determine the importance and frequency of the words in the lyrics. The Natural Language Toolkit (NLTK), a library written in Python, has been used for word tokenization. NLTK provides the word_tokenize function for this purpose; note that it also treats punctuation marks as tokens. These unnecessary tokens are removed during the text-cleaning step. The token count defines the length of an individual data item. The researchers have set the maximum length to 512.
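The reported label counts make the degree of imbalance easy to quantify. The sketch below computes the share of each mood and, as one common remedy that is an assumption here rather than something the paper reports, inverse-frequency class weights that could be passed to a weighted loss.

```python
# Label counts as reported in the text for the 4000 labeled songs.
counts = {"sad": 1513, "romantic": 1362, "happy": 886, "relaxed": 239}
total = sum(counts.values())

# Share of each mood in the dataset, in percent.
shares = {mood: 100.0 * n / total for mood, n in counts.items()}

# Assumed remedy (not from the paper): weight each class by total / (k * n_c),
# so that rare classes contribute more to a weighted loss.
num_classes = len(counts)
class_weights = {mood: total / (num_classes * n) for mood, n in counts.items()}

for mood in counts:
    print(f"{mood:9s} n={counts[mood]:4d} "
          f"share={shares[mood]:5.2f}% weight={class_weights[mood]:.3f}")
```

With these counts the relaxed class makes up about 6% of the data, so a classifier that rarely predicts it can still reach a high overall accuracy.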
Fig. 3 Lexical density in dataset
The authors have used NLTK’s FreqDist class from the nltk.probability module to count tokens. From the token counts, the type-token ratio is determined, and the lexical density is derived as well. Figure 3 shows how the density varies with the token count. The density peaks at a unique token count of about 100 and remains high for token counts between 100 and 150; for other token counts, the density falls off. This result indicates that relatively few words are distinctive enough to determine the mood, while common words dominate.
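The tokenization and counting steps can be illustrated without NLTK. The following sketch is a stand-in under stated assumptions: it uses a simple whitespace tokenizer and `collections.Counter` in place of NLTK's tokenizer and `FreqDist`, and the sample lyric is a placeholder. Punctuation is detected by Unicode category, which also covers the Bangla danda (।).

```python
import unicodedata
from collections import Counter

MAX_TOKENS = 512  # maximum sequence length stated in the text


def is_punct(ch):
    """True for any Unicode punctuation character (this includes the Bangla danda)."""
    return unicodedata.category(ch).startswith("P")


def tokenize(text, max_tokens=MAX_TOKENS):
    """Whitespace tokenizer that drops punctuation and truncates long lyrics."""
    tokens = []
    for raw in text.split():
        word = "".join(ch for ch in raw if not is_punct(ch))
        if word:
            tokens.append(word.lower())
    return tokens[:max_tokens]


def type_token_ratio(tokens):
    """Unique tokens divided by total tokens; 0.0 for empty input."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


lyrics = "Rain, rain on the river; the river sings, the night sings."  # placeholder
tokens = tokenize(lyrics)
freq = Counter(tokens)
print(len(tokens), len(freq), round(type_token_ratio(tokens), 3))  # 11 6 0.545
```

A low type-token ratio means that a song repeats a small vocabulary, which is typical of lyrics with refrains.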
3.3 Deep Learning Approach
Bidirectional Encoder Representations from Transformers (BERT) was introduced in a paper published by researchers at Google AI Language. It caused a stir in the machine learning community by presenting state-of-the-art results on a variety of natural language processing (NLP) tasks, including question answering, natural language inference (MNLI), next sentence prediction, masked language modeling, and others [14]. BERT’s key technical innovation, described in [15], is applying the bidirectional training of the transformer, a popular attention model, to language modeling. The model offers several advantages over previous language models. One is the bidirectional approach BERT uses for contextual understanding, which means that it reads the text both from left to
right and from right to left. This results in a better understanding of the words and their context. Because BERT is pre-trained on a large corpus of text, it can learn a wide range of language tasks, and the pre-training also helps it capture the structure of a language. For the same reason, BERT works well with transfer learning, that is, on specific NLP tasks with little data. The model performs well on NLP tasks because it relies on contextualized word embeddings. Relying on context has allowed BERT to achieve state-of-the-art performance on several NLP tasks, including natural language inference, sentiment analysis, and question answering. This research has chosen BERT because it is a powerful model that understands the context of words flexibly. BERT uses the transformer, an attention mechanism that learns contextual relations between words or sub-words in a text. In its vanilla form, the transformer contains two components: an encoder that reads the text input and a decoder that produces a prediction for the task. Because BERT’s goal is to generate a language model, only the encoder is necessary. The detailed workings of the transformer are described in [16]. Unlike a directional model, which reads the text input sequentially (from left to right or from right to left), the transformer encoder reads the entire sequence of words at once. It is therefore considered bidirectional, although it is more accurate to call it omnidirectional. This property allows the model to learn the context of a word from all of its surroundings, both to the left and to the right of the word. At a high level, the transformer encoder works as follows. The input is a sequence of tokens, which are first embedded into vectors and then processed by the neural network. The output is a sequence of vectors of size H, in which each vector corresponds to the input token with the same index.
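The difference between a directional model and the encoder's all-at-once reading can be made concrete with attention masks. The sketch below is illustrative only; the `attention_mask` helper and the placeholder tokens are assumptions, not part of any library.

```python
def attention_mask(num_tokens, bidirectional=True):
    """Return a num_tokens x num_tokens 0/1 matrix; 1 means position i may attend to j."""
    return [[1 if (bidirectional or j <= i) else 0 for j in range(num_tokens)]
            for i in range(num_tokens)]


tokens = ["[CLS]", "w1", "w2", "w3", "[SEP]"]  # placeholder tokens
causal = attention_mask(len(tokens), bidirectional=False)
full = attention_mask(len(tokens), bidirectional=True)

for name, mask in (("left-to-right", causal), ("bidirectional", full)):
    print(name)
    for tok, row in zip(tokens, mask):
        print(f"  {tok:6s}", row)
```

In the left-to-right case each token sees only itself and earlier tokens, whereas in the bidirectional case every row is all ones, so each output vector can depend on the whole lyric.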
3.4 Deep Learning Architecture
In this research, the BERT Base model has been used to train on the dataset and predict songs’ moods. More specifically, to work with the Bangla language, a model called Bangla BERT Base is used. Along with the description of the BERT model, a description of the NLP framework is given. BERT, short for Bidirectional Encoder Representations from Transformers, is a transformer-based model [4]. It is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both the left and right contexts. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of NLP tasks. BERT is a “deeply bidirectional” model. Bidirectional means that the model learns from both the left and the right side of a token’s context during training. This bidirectionality is important for truly understanding the meaning of a language.
Fig. 4 BERT architecture
The BERT architecture builds on top of the transformer, as shown in Fig. 4. BERT is currently available in two variants: BERT Base and BERT Large. The mathematical model of BERT can be represented as a multi-layer bidirectional transformer encoder in which each layer consists of two sub-layers:
• A multi-head self-attention mechanism, which allows the model to attend to different parts of the input sequence to better capture dependencies between words.
• A position-wise feed-forward network, which applies a nonlinear transformation to the output of the first sub-layer to further refine the representations.
BERT takes a sequence of tokenized words as input and transforms it into embeddings through an embedding layer; the embeddings then pass through a series of encoder layers that capture contextual relationships between words. BERT also uses two special tokens: [CLS], placed at the beginning of the input sequence, and [SEP], which separates input sequences. The mathematical model of BERT used for NLP tasks can thus be described as a series of matrix multiplications and nonlinear transformations applied to the input sequence embeddings. For development, this research uses the Transformers library by Hugging Face. Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with more than 32 pre-trained models in over 100 languages and deep interoperability between JAX, PyTorch, and TensorFlow. This research has used BERT Base uncased in the model. This model contains 12 encoder layers, a hidden size of 768, and 12 attention heads. The special token [CLS] is supplied as the first input token; CLS stands for classification, and the final representation of this token is what a classification layer typically operates on.
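The self-attention sub-layer can be sketched in a few lines of plain Python. This is a single-head illustration of scaled dot-product attention with toy inputs, assuming identity projection matrices for readability; BERT Base itself uses 12 heads with learned projections, and none of the names below come from a library.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]          # transpose of K
    scores = matmul(q, k_t)                       # scores[i][j] = q_i . k_j
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


# Three toy token embeddings of width 2; identity projections keep the example readable.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to one, so every output vector is a weighted average of the value vectors of all tokens in the sequence.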
The workflow of this research is as follows. First, all the songs were labeled with their moods. In the second step, the data was pre-processed, which includes data cleaning, tokenizing, and padding. After that, the model was trained using the Bangla BERT Base model. Finally, the trained model was evaluated and tested.
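The tokenizing and padding step can be sketched as follows. This is a word-level toy under stated assumptions: real BERT tokenizers use a WordPiece sub-word vocabulary, and the `build_vocab` and `encode` helpers, the special-token ids, and the sample words are all hypothetical.

```python
PAD, UNK, CLS, SEP = "[PAD]", "[UNK]", "[CLS]", "[SEP]"


def build_vocab(token_lists):
    """Assign ids to special tokens first, then to words in order of first appearance."""
    vocab = {PAD: 0, UNK: 1, CLS: 2, SEP: 3}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Return (input_ids, attention_mask), both of length max_len."""
    body = tokens[: max_len - 2]            # leave room for [CLS] and [SEP]
    pieces = [CLS] + body + [SEP]
    ids = [vocab.get(tok, vocab[UNK]) for tok in pieces]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [vocab[PAD]] * padding, mask + [0] * padding


vocab = build_vocab([["river", "sings"], ["night", "river"]])
input_ids, attn_mask = encode(["river", "moon", "sings"], vocab, max_len=8)
print(input_ids)  # [2, 4, 1, 5, 3, 0, 0, 0]
print(attn_mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The attention mask marks which positions hold real tokens, so that padded positions can be ignored by the encoder; lyrics longer than the limit are truncated before the special tokens are added.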
4 Result Analysis
This research uses the BERT Base model to train on the data, and the system’s predictions are based on this machine learning algorithm. To build the model, a dataset of 4000 songs was prepared and labeled into four mood categories. The authors use a multi-class classifier that predicts the mood of the lyrics. The classifier delegates most of the heavy lifting to the BERT model. A dropout layer is used for regularization, and a fully connected layer produces the output. Note that the last layer returns raw outputs (logits), because this is what the cross-entropy loss function in PyTorch requires. A softmax function is applied to these outputs to obtain the predicted probabilities of the trained model. To reproduce the training procedure from the BERT paper, the authors have used Hugging Face’s AdamW optimizer, which corrects weight decay so that it resembles the original paper. A linear scheduler with no warmup steps is also used. The BERT authors made several recommendations for fine-tuning [17] (Table 1). For training and validation, the authors have set the number of epochs to 100 with a batch size of 8. Increasing the batch size would significantly reduce the training time but would also reduce the training accuracy. In addition, Google Colaboratory runs out of GPU memory when the batch size is increased, so this research uses a smaller batch size for training the model. The training loop is standard, with two exceptions: the scheduler is called every time a batch is fed to the model, and gradient-norm clipping (clip-grad norm) is applied to the model’s gradients to avoid exploding gradients. During the training loop, the training history is stored.
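The relation between the raw outputs of the last layer, the softmax probabilities, and the cross-entropy loss can be illustrated as follows. The logits are hypothetical numbers for a single song, and the functions are plain-Python stand-ins rather than the PyTorch implementations.

```python
import math

MOODS = ["happy", "sad", "romantic", "relaxed"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the true class, computed from raw logits."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target_index]


logits = [0.2, 2.1, 1.3, -0.5]   # hypothetical raw scores for one song
probs = softmax(logits)
predicted = MOODS[probs.index(max(probs))]
loss = cross_entropy(logits, MOODS.index("sad"))
print(predicted, [round(p, 3) for p in probs], round(loss, 3))
```

Computing the loss directly from logits with the log-sum-exp form is numerically safer than taking the logarithm of softmax outputs, which is one reason the loss is fed raw scores and softmax is applied only when probabilities are needed for prediction.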
After the training and validation process is complete, the best model, indicated by the highest validation accuracy, is stored. Figure 5 shows the training accuracy and the validation accuracy, both of which were recorded during training.
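Three ingredients of the training procedure described above (a linear schedule without warmup, gradient-norm clipping, and keeping the model with the highest validation accuracy) can be sketched in plain Python. The helper names and all numbers are hypothetical; in a PyTorch setup the corresponding library utilities would normally be used instead.

```python
import math


def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear schedule: optional warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def clip_by_global_norm(grads, max_norm):
    """Rescale the gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]


def best_epoch(val_accuracies):
    """Return (epoch, accuracy) of the highest validation accuracy, 1-indexed."""
    best = max(range(len(val_accuracies)), key=lambda i: val_accuracies[i])
    return best + 1, val_accuracies[best]


# Hypothetical numbers for illustration only.
total_steps = 10
print([linear_lr(s, total_steps, 2e-5) for s in (0, 5, 10)])
print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
print(best_epoch([0.52, 0.61, 0.65, 0.63]))  # (3, 0.65)
```

Selecting the checkpoint by validation accuracy rather than by the last epoch matters when training accuracy keeps rising while validation accuracy levels off.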
Table 1 Suggestions with the purpose of fine-tuning
Fine-tune
Batch size              8       16      32      64
Learning rate (Adam)    8e-5    5e-5    3e-5    2e-5
Number of epochs        40      100     200     300
Fig. 5 Training and validation accuracy graph
Figure 5 compares training and validation accuracy over the epochs. After about 7 epochs, the training accuracy approaches 100%, and it reaches its highest value at epoch 12. The validation accuracy, on the other hand, is much lower than the training accuracy. It fluctuates until epoch 16 and then settles at about 63%. The parameters have been fine-tuned to reach an average accuracy of 65%. The gap between training and validation accuracy is attributed to the imbalance in the training data and to the batch size. When the batch size is increased, Google Colab runs out of memory, an issue encountered repeatedly during this research. With a smaller batch size, training runs without errors, but the model accuracy is affected. Overall, the model achieves an accuracy of 65% based on validation accuracy. The classification report of the model is shown in Fig. 6. The happy and romantic moods have the same precision. However, the recall and F1-score of the happy mood indicate that it is identified more accurately than the other moods. The relaxed mood is identified least accurately. From the report alone, it is difficult to tell the romantic, sad, and relaxed moods apart; the confusion matrix makes this clearer. In the confusion matrix, the sad and romantic classes occur at roughly equal frequency, and their frequency is also similar to that of the happy class. The confusion matrix confirms that the model has difficulty classifying romantic and relaxation songs. Once trained, the model can identify the mood of a song’s lyrics. To assess model performance, the authors trained the model and analyzed it on the validation split of the dataset, using helper functions written for training and evaluation (Fig. 7).
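The quantities in a classification report follow directly from the confusion matrix. The sketch below shows the computation on hypothetical counts, which are not the values behind Fig. 6 or Fig. 7; the helper names are illustrative.

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j] = number of songs with true class i predicted as class j."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        predicted_k = sum(row[k] for row in confusion)   # column sum
        actual_k = sum(confusion[k])                      # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(confusion):
    """Fraction of all songs that lie on the diagonal of the confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(map(sum, confusion))


labels = ["happy", "sad", "romantic", "relaxed"]
# Hypothetical counts for illustration only.
confusion = [
    [30, 5, 10, 0],
    [4, 60, 8, 3],
    [9, 10, 45, 1],
    [2, 6, 4, 3],
]
metrics = per_class_metrics(confusion, labels)
for label, (p, r, f1) in metrics.items():
    print(f"{label:9s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy(confusion):.2f}")
```

In this made-up example the rare class has a recall of only 0.20 while the overall accuracy is 0.69, which illustrates why per-class metrics are worth reporting alongside accuracy for an imbalanced dataset.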
Fig. 6 Classification report of the model
Fig. 7 Confusion matrix
5 Conclusion
This paper has presented a study of the use of machine learning tools for Bangla music analysis. Although music analysis is a common research area for many languages, the Bangla language has received little attention in this regard. Bangla songs are very popular both in Bangladesh and in many other countries. This research attempts to build a practical system for listeners and others involved with Bangla music. The collection of huge
amounts of data and the focus on predicting different moods and emotions of Bangla music make this research unique and useful. It has been shown that moods can be identified in songs using machine learning algorithms, which meets the requirements of users. This article also sheds light on the changes in Bangla music and on listeners’ demand regarding music mood. In the future, the researchers would like to extend this model and build an application that includes more data, making the system more informative and useful for people.
References
1. Zurawicki L (2010) Neuromarketing: exploring the brain of the consumer
2. Schedl M, Zamani H, Chen C-W, Deldjoo Y, Elahi M (2018) Current challenges and visions in music recommender systems research. Int J Multimedia Inf Retrieval 7:95–116
3. Nummenmaa L, Putkinen V, Sams M (2021) Social pleasures of music. Curr Opin Behav Sci 39:196–202
4. Wang Z, Xia G (2021) MuseBERT: pre-training music representation for music understanding and controllable generation. In: Lee JH, Lerch A, Duan Z, Nam J, Rao P, van Kranenburg P, Srinivasamurthy A (eds) 22nd International society for music information retrieval conference, ISMIR 2021, Online, 7–12 Nov 2021, pp 722–729
5. Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J 5:1093–1113
6. Hussein DMEDM (2018) A survey on sentiment analysis challenges. J King Saud Univ Eng Sci 30:330–338
7. Rabeya T, Chakraborty NR, Ferdous S, Dash M, Al Marouf A (2019) Sentiment analysis of Bangla song review - a lexicon based backtracking approach, pp 1–7
8. Mamun MAA, Kadir IA, Rabby ASA, Azmi AA (2019) Bangla music genre classification using neural network. In: 2019 8th International conference system modeling and advancement in research trends (SMART), pp 397–403
9. Zaanen M, Kanters P (2010) Automatic mood classification using tf*idf based on lyrics, pp 75–80
10. Kashyap N, Choudhury T, Chaudhary D, Lal R (2016) Mood based classification of music by analyzing lyrical data using text mining, pp 287–292
11. Raschka S (2016) MusicMood: predicting the mood of music from song lyrics using machine learning. arXiv:1611.00138
12. Urmi N, Ahmed N, Sifat MH, Islam S, Jameel A (2021) BanglaMusicMooD: a music mood classifier from Bangla music lyrics, pp 673–681
13. Çano E, Morisio M (2017) Music mood dataset creation based on last.fm tags
14. Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805
15. Kaliyar RK (2020) A multi-layer bidirectional transformer encoder for pre-trained word embedding: a survey of BERT. In: 2020 10th International conference on cloud computing, data science & engineering (confluence), pp 336–340
16. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, et al (2019) Huggingface’s transformers: state-of-the-art natural language processing. arXiv:1910.03771
17. Sun C, Qiu X, Xu Y, Huang X. How to fine-tune BERT for text classification?
A Machine Learning and Deep Learning Approach for Accurate Crop-Type Mapping Using Sentinel-1 Satellite Data
Sanjay Madaan and Sukhjeet Kaur
Abstract Crop classification offers relevant data for crop management, ensuring food safety, and developing agricultural policies. Mapping crops at high resolution is of great significance for determining the position of crops and for effective agricultural monitoring. However, high data costs and the poor temporal resolution of satellite data make it difficult to detect the different crops in the field. Therefore, the goal of this survey is to provide an effective analysis of various land cover maps in agriculture using a time series of C-band Sentinel-1 synthetic aperture radar (SAR) data. Various methods based on vertical transmit-vertical receive (VV) and vertical transmit-horizontal receive (VH) polarizations are analyzed with respect to the accuracy they achieve for different types of agricultural land. This survey analyzes the existing classification methods, including machine learning and deep learning algorithms, that are applied to Sentinel-1 satellite data. The overall accuracy (OA), kappa coefficient, user accuracy (UA), producer accuracy (PA), and F1-score are considered key parameters for defining the effectiveness of crop-type classification in land cover types. This comprehensive survey helps researchers find the best solutions to the current issues in crop-type mapping using Sentinel-1 SAR data.
Keywords Sustainable agriculture · Synthetic aperture radar · Sentinel-1 · Crop mapping · Deep learning algorithm
S. Madaan (B) Computer Science and Engineering (UCOE), Punjabi University, Patiala, India e-mail: [email protected] S. Kaur Department of Computer Science, Punjabi University, Patiala, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_41
1 Introduction
High-resolution satellite images provide the spatial distribution of plots needed for crop mapping in smart agriculture applications [1]. Mapping crops as the environment changes plays a major role in characterizing landscapes; it helps to classify the different types of crops and to examine their distribution across different land surfaces [2]. Additionally, classifying crops is an essential step in gathering information on agriculture, successfully managing crops, controlling biodiversity, ensuring food security, and developing agricultural policy. Currently, image processing and satellite data are used for effective crop classification, which supports the development of the food industry [3]. Remote sensing with satellite images covers large geographic areas at a high temporal resolution and thereby supports effective crop mapping [4]. It has also been extensively employed in agricultural monitoring, including phenology estimation, crop-type mapping, and yield projections [5]. Image segmentation is the technique used to partition an image into multiple regions [6]. Many researchers have applied segmentation methods such as multi-resolution segmentation and simple linear iterative clustering (SLIC), which were developed for high-resolution images [7]. Segmentation outputs fall mainly into two types: extracted boundaries, which deliver basic data for monitoring, and super-pixels, which separate images into small clusters of pixels [8]. The SLIC segmentation method is used in traditional classification techniques for categorizing high-resolution satellite bands and for enhancing the integrity of crop fields [9, 10].
Existing crop mapping techniques lack accuracy when categorizing intercropped farms. Accuracy also suffers from the low-resolution images obtained from unstructured datasets, and time consumption is a major drawback of recent research. Therefore, this manuscript focuses on analyzing crop classification and segmentation for time series of Sentinel-1 data, with emphasis on better classification techniques. The remainder of the paper is structured as follows: Sect. 2 discusses the related works, the problems found in the existing research are presented in Sect. 3, and the taxonomy of this survey is presented in Sect. 4. In Sect. 5, observations are summarized. Finally, Sect. 6 presents the overall conclusion of this research.
2 Related Works
This section surveys the different techniques used in crop classification with Sentinel-1 time-series data, along with their advantages and limitations.
Ensemble techniques are likely to provide better classification accuracy. To demonstrate this, Tufail et al. [11] presented a random forest (RF) classifier for accurate crop-type mapping based on the integration of SAR Sentinel-1 time-series data. In this research, RF provided enhanced classification accuracy for crop-type mapping by training an ensemble of decision trees on sample data. The SAR data helped to improve mapping accuracy with optimal parameters; however, the classification depended on polarimetry indices, texture, and coherence of the SAR data, which decreased the accuracy of the classification process.
Labeling the samples is an essential step when preparing training datasets, and it helps to improve classification accuracy when detecting the type of land surface. Wang et al. [12] developed a supervised crop mapping method based on long short-term memory (LSTM) and dynamic time warping (DTW). First, SAR data were applied for each land cover to create the sample fields. The weak samples were labeled using the DTW distance, and the time-series images were then used to construct multi-spectral bands along with the training and testing datasets. The method therefore performs well at crop identification even with insufficient sample fields and reduces the sample cost. However, the supervised crop mapping method requires a large amount of sample data.
Deep learning techniques are well suited to segmenting multi-temporal images. Wei et al. [13] introduced a deep learning method to handle the segmentation problem and exploit the phenological similarity of crop production at large scales, based on an adapted U-Net. First, the spatial characteristics of the Sentinel-1 bands were extracted from the whole multi-temporal dataset. The multi-temporal data were then calibrated for effective segmentation; finally, U-Net was applied to combine the input images and create the crop distribution map. The DL method thus performed well for crop identification at large scales. However, accuracy was low in dispersed crop zones because of an imbalance of sample fields.
Ensemble-based techniques are likely to provide reliable data and better classification. Accordingly, Woźniak et al. [14] demonstrated ensemble classifier-based crop classification using radar polarimetry. Multi-temporal indices were proposed to distinguish 16 crop types across the whole of Poland. The multi-temporal indices strengthen comparisons across crops and improve classification accuracy by using phenological indices based on signal polarization, signal power, and the scattering mechanism; however, the method is difficult to apply because the crops vary in nature.
Mapping various types of crops in a single iteration is challenging because of the cost complexity. Kpienbaareh et al. [15] presented detailed crop-type determination and land cover mapping that combines time-series Sentinel-1 data with image processing technology. The approach integrates data from various spectral sensors and temporal resolutions to identify crop types in land cover effectively. The RF classifier was trained on the data for effective land cover mapping. Although the technology accurately provided the distribution of crops using available optical data, it was unable to specify the precise combinations of crops that constituted each intercropped farm. The methodology, merits, and demerits of the existing related works are summarized in Table 1.

Table 1 Methodology, merits, and demerits of related works
| Author | Methodology | Merits | Demerits |
| --- | --- | --- | --- |
| Tufail et al. [11] | Random forest (RF) classifier to map crop types using SAR Sentinel-1 time-series data | Since RF is a combination of decision trees, it helps to plot the individual parameters and to gain accuracy | The RF classifier is vulnerable to polarimetry indices, texture, and coherence of SAR data, which affects the classification efficiency |
| Wei et al. [13] | Deep learning method to handle the segmentation problem and exploit the phenological similarity of crop production at large scales, based on an adapted U-Net | The U-Net characterizes the spatial data and helps to detect crop types at large scale | Mapping crops in a dispersed area leads to imbalance and affects classification accuracy |
| Kpienbaareh et al. [15] | Detailed crop-type determination and land cover mapping combined with time-series Sentinel-1 data and image processing technology | The data are integrated from various spectral sensors and temporal resolutions to identify crop types in land cover effectively | Accuracy was affected when specifying the precise crop combinations that made up each intercropped farm |
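The DTW distance that Wang et al. [12] use to label weak samples can be illustrated with a minimal sketch. It assumes one-dimensional backscatter time series and a nearest-reference labeling rule; the function names, the reference curves, and the labeling rule itself are hypothetical and are not taken from the cited work.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] aligned again
                                 cost[i][j - 1],      # b[j-1] aligned again
                                 cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]


def weak_label(series, references):
    """Return the label whose reference curve is nearest under DTW (assumed rule)."""
    return min(references, key=lambda label: dtw_distance(series, references[label]))


# Hypothetical VH backscatter curves in dB, for illustration only.
references = {
    "rice": [-20.0, -18.0, -14.0, -12.0],
    "other": [-12.0, -12.0, -12.0, -12.0],
}
print(weak_label([-20.0, -19.0, -18.0, -14.0, -12.0], references))
```

Because DTW allows one observation to align with several observations of the other series, two curves with the same shape but different sampling or timing can still have a small distance, which is what makes it attractive for phenology curves.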
3 Problem Statement
Existing research on satellite data for crop image segmentation and classification faces the following problems.
• The main segmentation problem is low accuracy in dispersed crop zones, caused by an imbalance between the number of non-crop and crop samples.
• Discrete crop types are difficult to identify because of spatial and temporal dynamics, which significantly affect this land cover at frequent intervals over the growing seasons.
• Several ML and DL techniques are widely used in classification, but they are time-consuming and need enormous computational power to deliver classification results.
• Furthermore, classifying crop types using remote sensing technologies is challenging because of the intrinsic features of farming locations.
4 Taxonomy
This survey covers the Sentinel-1 satellite data segmentation and classification methods used in mapping crops. Sentinel-1 processing consists of data preprocessing and feature extraction, with super-pixel non-iterative clustering (SNIC) and object-based image analysis (OBIA) commonly applied for crop image segmentation. Various ML and DL classification models are then analyzed. The block diagram giving an overview of Sentinel-1 crop band segmentation and classification is shown in Fig. 1.
4.1 Data Acquisition
Crop-type maps are created from earth resource satellite data with enhanced spatial and spectral resolution. The European Space Agency provides Sentinel-1 satellite data from a dual-polarization C-band SAR instrument operating at 5.405 GHz [19]. SAR images carry the interferometric phase, and each scene comprises two polarization bands plus an additional band that contains the estimated angle from the ellipsoid at every point. Moreover, SAR data is used to improve land cover accuracy and to enhance classification accuracy through class discrimination. Each scene must then be preprocessed to derive a backscattering coefficient for each pixel for effective mapping of land cover types.
Fig. 1 Proposed Sentinel-1 satellite data classification
4.2 Image Preprocessing
Preprocessing is essential for removing irrelevant and redundant data; the data are calibrated to produce imagery whose pixel values describe the agricultural land scene [12]. First, a subset of the original dataset is created by cropping the study area, and the intensity bands (VV and VH) are then transformed to normalized radar coefficients based on sigma naught (σ⁰) backscatter using radiometric calibration. Preprocessing also includes radiometric correction, noise reduction in the SAR data, and geolocation by terrain correction. Terrain correction is achieved using Range Doppler terrain correction with the Shuttle Radar Topography Mission (SRTM) elevation data, producing 10 m × 10 m pixels projected into the Universal Transverse Mercator (UTM) system. After this, multi-temporal speckle filtering is applied to the corrected image, and the linear intensity values are converted to decibel units. Based on the terrain-corrected values, the imagery derived from the C-band data is used for effective crop-type mapping in various land cover types.
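The last preprocessing step, converting linear σ⁰ intensity to decibels, follows the standard relation σ⁰(dB) = 10 · log10(σ⁰). The sketch below shows only this conversion on a tiny patch; the function names, the patch values, and the small floor used to avoid taking the logarithm of zero are illustrative assumptions, and calibration, terrain correction, and speckle filtering are not implemented here.

```python
import math


def to_decibel(sigma0_linear, floor=1e-10):
    """Convert linear sigma-naught backscatter to decibels.

    `floor` is an assumed guard against log10(0) for empty pixels.
    """
    return 10.0 * math.log10(max(sigma0_linear, floor))


def patch_to_decibel(patch):
    """Apply the conversion to every pixel of a 2-D list of linear values."""
    return [[to_decibel(value) for value in row] for row in patch]


# Hypothetical 2 x 2 patch of linear VV backscatter values.
vv_linear = [[0.10, 0.05],
             [0.02, 1.00]]
vv_db = patch_to_decibel(vv_linear)
print([[round(v, 2) for v in row] for row in vv_db])
```

Working in decibels compresses the wide dynamic range of SAR backscatter, which is why VV and VH time series are usually compared on this scale.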
4.3 Feature Segmentation
After preprocessing, the cropped Sentinel-1 images can offer more accuracy in the image categorization process. Super-pixel non-iterative clustering (SNIC) and object-based image analysis (OBIA) [19] segmentation algorithms are currently used to conduct pixel-wise image segmentation on large-scale images, with highly promising results. The use of these methods helps to address problems in crop mapping and to spot the changes that occur frequently when segmenting images in agricultural applications.
Super-Pixel Non-iterative Clustering
Segmenting the pixel-based image involves fine-tuning the algorithm's parameters and creating the final segments for subsequent analysis. The Google Earth Engine super-pixel non-iterative clustering (SNIC) technique is used to spatially cluster pixels into objects. The method grows segments by minimizing a distance measure that combines spectral and spatial distances around a grid of seeds [19, 20]. Median composite images are used as input to the segmentation, since the median filter reduces image noise and so enhances segmentation accuracy. The SNIC algorithm's primary inputs are the image, the size, and the compactness. Each object also carries attributes based on spectral, geometrical, textural, and contextual properties. Over-segmentation and under-segmentation can occur when splitting the images into segments; addressing these issues helps both to segment Sentinel crop images and to improve the subsequent classification.
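Two ingredients of the step above can be sketched in a few lines: the per-pixel temporal median composite and a distance that mixes spatial and spectral terms. The distance below follows the general form used by SLIC-family super-pixel methods and is an assumption, not the Earth Engine implementation; the names `median_composite` and `combined_distance` and the way `size` and `compactness` normalize the two terms are illustrative.

```python
import math
from statistics import median


def median_composite(stack):
    """Per-pixel temporal median of a list of equally sized 2-D images."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[median(image[r][c] for image in stack) for c in range(cols)]
            for r in range(rows)]


def combined_distance(pixel, seed, size, compactness):
    """Distance between a pixel and a seed, each given as (row, col, bands).

    Assumed form: sqrt(spatial / size + spectral / compactness), so a larger
    `compactness` reduces the weight of the spectral term.
    """
    (r1, c1, bands1), (r2, c2, bands2) = pixel, seed
    spatial = (r1 - r2) ** 2 + (c1 - c2) ** 2
    spectral = sum((a - b) ** 2 for a, b in zip(bands1, bands2))
    return math.sqrt(spatial / size + spectral / compactness)


# Three hypothetical acquisitions of a 1 x 2 image; 10 is a speckle-like outlier.
stack = [[[1, 2]], [[3, 10]], [[2, 4]]]
print(median_composite(stack))
```

The median is preferred over the mean for the composite because a single outlier acquisition, such as the value 10 above, does not shift the result.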
4.4 Classification
Classification is the most significant task in determining the various crops in land cover types. The traditional method relied on the growth stages detected in the VH and VV bands and on the development of the crops during the growing seasons. However, issues arose in time complexity and accuracy. To overcome this problem, many machine learning (ML) and deep learning (DL) algorithms have been used to identify crop types from various predictor inputs, but the accuracy of crop mapping remains reliant on the efficiency of the classification technique. The ML and DL approaches reviewed here are therefore helpful for developing an effective mapping tool for crop types in various land covers while taking the aforementioned issues into account.
Machine Learning Classification
Machine learning has become a main focus in remote sensing image analysis. Its application has important advantages in many areas and overcomes various classification problems [19]. Among ML methods, random forest (RF) and support vector machine (SVM) have dominated remote sensing applications. An RF is a combination of multiple tree-based classifiers that together provide a single classification. RF can handle a large number of input variables without deleting any and evaluates the importance of each variable during classification. The essential parameters for training the RF model are set by trial and error. Similarly, SVM performs pixel-wise classification and delivers good classification results with a simple training set. It solves binary classification problems by finding a separating hyperplane and projecting the data into a high-dimensional feature space.
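The way an RF turns many tree-based classifiers into a single classification can be shown with a minimal voting sketch. Only the aggregation step is illustrated: the three "trees" are hand-written decision stumps on VV and VH backscatter, and their thresholds and the class names are hypothetical. A real RF would learn each tree from a bootstrap sample with random feature subsets.

```python
from collections import Counter


# Hypothetical decision stumps on (VV, VH) backscatter in dB; thresholds are
# illustrative only and are not taken from any of the surveyed studies.
def tree_a(vv, vh):
    return "crop" if vh > -18.0 else "non-crop"


def tree_b(vv, vh):
    return "crop" if vv > -11.0 else "non-crop"


def tree_c(vv, vh):
    return "crop" if (vv - vh) < 7.0 else "non-crop"


def forest_predict(trees, vv, vh):
    """Majority vote over the individual tree predictions."""
    votes = Counter(tree(vv, vh) for tree in trees)
    return votes.most_common(1)[0][0]


trees = [tree_a, tree_b, tree_c]
print(forest_predict(trees, vv=-10.0, vh=-20.0))
```

With an odd number of trees and two classes the vote can never tie, which keeps this sketch deterministic.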
Common ML classifiers treat the input variables as independent features and analyze time series with high classification accuracy; this helps researchers to determine easily which features are irrelevant for a crop.
Deep Learning (DL) Classification
DL approaches can extract intricate patterns and useful information from satellite image data through hidden layers, but their results are less interpretable. Among them, long short-term memory (LSTM) is an RNN architecture that was created to handle the problem of long-term dependencies. In contrast to feed-forward neural networks, RNNs have a feedback connection that deals with the system's temporal dynamic behavior. In an RNN, the output obtained at the previous time step is used as input for the next time step. Thus, the past outcomes have a long-term influence on future results. Consider $x_1, x_2, \ldots, x_n$ as the input vectors; the output vectors of the hidden cell are denoted as $h_1, h_2, \ldots, h_n$, and the result vectors are denoted as $y_1, y_2, \ldots, y_n$. The result is evaluated from the input vectors as given in Eqs. (1)–(3):
$$h_t = \theta\,\phi(h_{t-1}) + \theta_x x_t \tag{1}$$
$$y_t = \theta_y\,\phi(h_t) \tag{2}$$
$$f_{\tanh}(x) = \frac{e^{2x} - 1}{e^{2x} + 1} \tag{3}$$
where the weights are denoted as $\theta$, $\theta_x$, and $\theta_y$, the activation function is denoted as $\phi$, and the initial stage of the self-connection is denoted as 1. Processing long sequences frequently results in vanishing gradient problems for traditional RNNs [12]. In addition, the LSTM network helps to distinguish the various crops from the other land cover types. The core of the model is a stack of two LSTM layers, one fully connected layer, and two dropout layers. Finally, the output layer uses the softmax activation function to predict the land cover type of each pixel of the image.
Other Classification
Object-based classification (OBC) categorizes multiple non-overlapping, homogeneous objects and is carried out at the object level rather than the pixel level. The output of the segmentation algorithm for each object then feeds into the classification. This user-dependent, moderately complex technique is one of the best image segmentation algorithms in the OBIA field [19]. Each pixel is treated as a segment at the start of the process, and pairs of neighboring image objects are then merged to generate bigger segments. To regulate the degree of spectral variation within objects and their resulting size, the scale parameter specifies the maximum standard deviation of the heterogeneity.
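The recurrence in Eqs. (1)–(3) of the deep learning subsection can be traced with a short scalar sketch. It assumes scalar weights instead of weight matrices, takes the activation to be the tanh of Eq. (3), and starts from a hidden state of zero; these simplifications, and the names `f_tanh` and `rnn_forward`, are illustrative and not part of the surveyed models.

```python
import math


def f_tanh(x):
    """Eq. (3): tanh written with exponentials."""
    e2x = math.exp(2.0 * x)
    return (e2x - 1.0) / (e2x + 1.0)


def rnn_forward(xs, theta, theta_x, theta_y, h0=0.0):
    """Scalar version of Eqs. (1) and (2); h0 is an assumed initial state."""
    h, ys = h0, []
    for x in xs:
        h = theta * f_tanh(h) + theta_x * x  # Eq. (1): new hidden state
        ys.append(theta_y * f_tanh(h))       # Eq. (2): output at this step
    return ys


# Hypothetical three-step input sequence and weights.
outputs = rnn_forward([1.0, 0.5, -0.2], theta=0.5, theta_x=1.0, theta_y=2.0)
print([round(y, 4) for y in outputs])
```

Each output depends on every earlier input through the hidden state, which is the long-term influence described in the text; repeated multiplication by the weight and the derivative of the activation is also where the vanishing gradient of plain RNNs comes from.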
4.5 Comparative Analysis
This section compares the detailed crop-type and land cover mapping approaches in agriculture that use a time series of Sentinel-1 satellite data. The comparative analysis of the existing crop-type classification methods is presented in Table 2.

Table 2 Crop mapping classification comparative analysis
| Author | Methods | Advantages | Limitations | Performance metrics |
| --- | --- | --- | --- | --- |
| Tufail et al. [11] | Random forest (RF) classifier for accurate crop-type mapping based on the integration of SAR Sentinel-1 time-series data | The SAR data help to improve mapping accuracy, including optimal parameters | The RF classification technique was based on polarimetric indices such as texture and coherence of SAR data. This dependence on polarimetry decreases the OA of the classification process | OA: 0.97; Kappa coefficient: 0.97 |
| Wang et al. [12] | Supervised crop mapping method based on long short-term memory (LSTM) and dynamic time warping (DTW) | The weak samples were labeled with the DTW distance; the method thus performs well with insufficient sample fields and also reduces the sample cost | Supervised crop mapping based on LSTM requires a large amount of sample data and a long execution time | PA: 0.98; UA: 0.96 |
| Wei et al. [13] | Deep learning method that handles the segmentation problem and exploits the phenological similarity of crop production at large scales, based on an adapted U-Net | The adapted U-Net combines the input images to create the crop distribution map; the DL method thus performed well for crop identification even at large scale | Accuracy is lacking when the crops are mapped in unbalanced sample fields | PA: 0.86; UA: 0.74; OA: 0.91; Kappa coefficient: 0.74; F1-score: 0.80 |
| Woźniak et al. [14] | Ensemble classifier-based crop classification using radar polarimetry; multi-temporal indices were proposed to determine various crop types | The multi-temporal indices strengthen comparisons across crops and improve classification accuracy by using phenological indices based on signal polarization, signal power, and the scattering mechanism | Ensemble classifier-based crop classification is highly time-consuming and requires enormous computational power to deliver the classification results | OA: 0.86–0.89; F1-score: 0.73–0.99 |
| Kpienbaareh et al. [15] | Random forest classifier-based detailed crop-type determination and land cover mapping combined with time-series Sentinel-1 data and image processing technology | The image processing technology based on RF is effective for cloud-dense, resource-poor localities and helps to obtain high-resolution data to achieve food security | The RF classifier lacks accuracy when classifying crops in intercropped agriculture | OA: 0.85; Kappa coefficient: 0.80 |
| Wang et al. [16] | Object-based classification for automatic mapping using temporal phenology pattern extraction | The DSF measure generated the best classification results in all conceivable bands because it was more precise and included more features than other similarity measures | The automatic mapping method was unable to specify the precise crop combinations that made up each intercropped farm | OA: 0.92; Kappa coefficient: 0.84 |
| Adrian et al. [17] | Effective crop-type mapping in various land covers using a deep learning technique | The deep learning technique attains promising results when combining multi-temporal SAR and optical data | The deep learning technique has difficulty collecting and detecting high-quality training samples for large-scale applications | OA: 0.91; Kappa coefficient: 0.88 |
| Qu et al. [18] | A novel deep learning method for effective crop mapping from Sentinel-1 polarimetric time-series information | The deep learning method incorporates cyclic and convolution processes together with the spatial-temporal patterns and scattering mechanisms of crops | The polarimetric parameters of the deep learning method must be varied for different land scenes, which is a time-consuming process | Average accuracy: 0.90; OA: 0.86; Kappa coefficient: 0.78; F1-score: 0.85 |
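The metrics reported in Table 2 are all derived from a confusion matrix. The sketch below computes OA, the kappa coefficient, and per-class PA, UA, and F1-score using their standard definitions; the function name and the example matrix are hypothetical and do not reproduce any of the surveyed results. It assumes every class occurs at least once in both the reference and the predicted labels, otherwise a division by zero occurs.

```python
def classification_metrics(cm):
    """Accuracy metrics from a square confusion matrix.

    cm[i][j] is the number of samples of reference class i predicted as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(k)]
    ref_tot = [sum(cm[i]) for i in range(k)]                        # row sums
    pred_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums

    oa = sum(diag) / n                                              # overall accuracy
    pe = sum(ref_tot[i] * pred_tot[i] for i in range(k)) / (n * n)  # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    pa = [diag[i] / ref_tot[i] for i in range(k)]    # producer accuracy (recall)
    ua = [diag[i] / pred_tot[i] for i in range(k)]   # user accuracy (precision)
    f1 = [2.0 * p * u / (p + u) for p, u in zip(pa, ua)]
    return {"OA": oa, "kappa": kappa, "PA": pa, "UA": ua, "F1": f1}


# Hypothetical two-class matrix (crop, non-crop).
metrics = classification_metrics([[40, 10],
                                  [5, 45]])
print(round(metrics["OA"], 2), round(metrics["kappa"], 2))
```

Kappa discounts the agreement expected by chance, so it is lower than OA whenever chance agreement is non-zero; this is why the table reports both.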
5 Observation
This section presents the outcomes of the related works in terms of their methods, findings, and analysis. In [11, 15], a machine learning classifier was used to map crop types from the Sentinel-1 dataset. Both studies used the RF classifier to classify crop types effectively with optimal parameters, but execution took more time. In [17, 18], deep learning-based classification techniques were introduced, but obtaining high-quality training samples for large-scale applications remains difficult. In [14], an ensemble-based classification model was introduced to categorize crop types, but time consumption was a major drawback. Based on this analysis, future improvements can target reducing the time consumption and enhancing the accuracy.
6 Conclusion
This study evaluates the performance of commonly used nonparametric ML and DL classifiers on SAR data and incorporates the relevant information on selected crop types. In existing classification methods, Sentinel-based crop categorization was performed to detect the land scenes accurately, but drawbacks remain, such as limited area coverage and low efficiency. Further, the survey demonstrates that fusing both polarizations adds information content that precisely captures the variations present throughout the growth stages. This survey highlights the importance of preprocessing, in which the images are calibrated and segmentation is performed on the selected features, to obtain optimal crop mapping and thus enhance crop-type identification in different land cover types. In the future, a series of classification features can be developed and used as classifier input for further image classification.
References
1. de Moura NVA, de Carvalho OLF, Gomes RAT, Guimarães RF, de Carvalho Júnior OA (2022) Deep-water oil-spill monitoring and recurrence analysis in the Brazilian territory using Sentinel-1 time series and deep learning. Int J Appl Earth Obs Geoinf 107:102695
2. Gašparović M, Klobučar D (2021) Mapping floods in lowland forest using Sentinel-1 and Sentinel-2 data and an object-based approach. Forests 12(5):553
3. Snevajs H, Charvat K, Onckelet V, Kvapil J, Zadrazil F, Kubickova H, Seidlová J, Batrlova I (2022) Crop detection using time series of Sentinel-2 and Sentinel-1 and existing land parcel information systems. Remote Sens 14(5):1095
4. Ren T, Xu H, Cai X, Yu S, Qi J (2022) Smallholder crop type mapping and rotation monitoring in mountainous areas with Sentinel-1/2 imagery. Remote Sens 14(3):566
5. Tetteh GO, Gocht A, Erasmi S, Schwieder M, Conrad C (2021) Evaluation of Sentinel-1 and Sentinel-2 feature sets for delineating agricultural fields in heterogeneous landscapes. IEEE Access 9:116702–116719
6. Mohamed ES, Ali A, El-Shirbeny M, Abutaleb K, Shaddad SM (2020) Mapping soil moisture and their correlation with crop pattern using remotely sensed data in arid region. Egypt J Remote Sens Space Sci 23(3):347–353
7. Yang H, Pan B, Li N, Wang W, Zhang J, Zhang X (2021) A systematic method for spatiotemporal phenology estimation of paddy rice using time series Sentinel-1 images. Remote Sens Environ 259:112394
8. Son N-T, Chen C-F, Chen C-R, Toscano P, Cheng Y-S, Guo H-Y, Syu C-H (2021) A phenological object-based approach for rice crop classification using time-series Sentinel-1 synthetic aperture radar (SAR) data in Taiwan. Int J Remote Sens 42(7):2722–2739
9. Mattia F, Balenzano A, Satalino G, Palmisano D, D'Addabbo A, Lovergine F (2020) Field scale soil moisture from time series of Sentinel-1 & Sentinel-2. In: 2020 Mediterranean and middle-east geoscience and remote sensing symposium (M2GARSS), 09–11 March, Tunis, Tunisia. IEEE, pp 176–179
10. Zhang H, Yuan H, Du W, Lyu X (2022) Crop identification based on multi-temporal active and passive remote sensing images. ISPRS Int J Geo-Inf 11(7):388
11. Tufail R, Ahmad A, Javed MA, Ahmad SR (2022) A machine learning approach for accurate crop type mapping using combined SAR and optical time series data. Adv Space Res 69(1):331–346
12. Wang M, Wang J, Chen L (2020) Mapping paddy rice using weakly supervised long short-term memory network with time series Sentinel optical and SAR images. Agriculture 10(10):483
13. Wei P, Chai D, Lin T, Tang C, Du M, Huang J (2021) Large-scale rice mapping under different years based on time-series Sentinel-1 images using deep semantic segmentation model. ISPRS J Photogramm Remote Sens 174:198–214
14. Woźniak E, Rybicki M, Kofman W, Aleksandrowicz S, Wojtkowski C, Lewiński S, Bojanowski J, Musiał J, Milewski T, Slesiński P, Łączyński A (2022) Multi-temporal phenological indices derived from time series Sentinel-1 images to country-wide crop classification. Int J Appl Earth Obs Geoinf 107:102683
15. Kpienbaareh D, Sun X, Wang J, Luginaah I, Kerr RB, Lupafya E, Dakishoni L (2021) Crop type and land cover mapping in Northern Malawi using the integration of Sentinel-1, Sentinel-2, and PlanetScope satellite data. Remote Sens 13(4):700
16. Wang L, Jin G, Xiong X, Zhang H, Wu K (2022) Object-based automatic mapping of winter wheat based on temporal phenology patterns derived from multitemporal Sentinel-1 and Sentinel-2 imagery. ISPRS Int J Geo-Inf 11(8):424
17. Adrian J, Sagan V, Maimaitijiang M (2021) Sentinel SAR-optical fusion for crop type mapping using deep learning and Google Earth Engine. ISPRS J Photogramm Remote Sens 175:215–235
18. Qu Y, Zhao W, Yuan Z, Chen J (2020) Crop mapping from Sentinel-1 polarimetric time-series with a deep neural network. Remote Sens 12(15):2493
19. Wang L, Jin G, Xiong X, Zhang H, Wu K (2022) Object-based automatic mapping of winter wheat based on temporal phenology patterns derived from multitemporal Sentinel-1 and Sentinel-2 imagery. ISPRS Int J Geo-Inf 11(8):424
20. Arias M, Campo-Bescós MÁ, Álvarez-Mozos J (2020) Crop classification based on temporal signatures of Sentinel-1 observations over Navarre province, Spain. Remote Sens 12(2):278
Size Matters: Exploring the Impact of Model Architecture and Dataset Size on Semantic Segmentation of Abdominal Wall Muscles in CT Scans
Ankit P. Bisleri, Anvesh S. Kumar, Akella Aditya Bhargav, Surapaneni Varun, and Nivedita Kasturi
Abstract Computed tomography (CT) scans are an excellent way of capturing the details of the abdominal muscles. Several abnormalities, including hernia, tumour, and neuro-muscular diseases, can be identified and estimated by CT scans, including the representation of muscle loss. Radiologists carefully segment and review every CT slice to monitor these abnormalities. This is a meticulous, painstaking, and time-consuming process, especially for the abdomen, which has many variations and complexities in the wall and surrounding tissues. To address this, we annotated 451 image-mask pairs of size 512 × 512 pixels from DICOM files of the abdominal muscles. In this work, the trade-off between computation cost and image loss is estimated by splitting the images into two size categories: 128 × 128 pixels and 512 × 512 pixels. To retain the original image size, the authors create a resized version of the base U-Net and train it from scratch. Furthermore, two dataset categories are created to show the data-intensive nature of the models (one double the size of the other). Two architectures, U-Net and resized U-Net, are trained on these datasets and then compared using a new loss function, "complete loss". The resized U-Net performs better than the standard U-Net, with a mean complete loss of 3.52% and 10.79% for the validation and test sets, respectively. The data dependency of the models is clearly shown as well.
Keywords Computed tomography · Convolutional neural networks · Fully convolutional network · Semantic segmentation · U-Net
A. P. Bisleri (B) · A. S. Kumar · A. A. Bhargav · S. Varun · N. Kasturi Department of Computer Science and Engineering, PES University, Bangalore, India e-mail: [email protected] N. Kasturi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_42
1 Introduction Medical imaging techniques are used in radiology to obtain detailed internal images of the body for diagnostic purposes. CT scans are one such imaging technique used by professionals, which involves combining X-ray images taken from numerous angles using computer processing, providing detailed information about tissues, blood vessels, and bones in the form of image slices. They are noninvasive and invaluable resources to medical professionals, allowing them to visualise and analyse body parts that require medical attention. However, they are hard to interpret and can take time to read. This is because organ and tissue boundaries are pretty complex and drawing bounding boxes over them will not suffice. Furthermore, training data has to be manually annotated, and due to the complexity mentioned before, it is a tedious task, hence creating a scarcity of this kind of labelled data. Structures in the abdominal region have complex non-geometric overlapping boundaries making it difficult to distinguish and segment them accurately. Low contrast and being surrounded by multiple layers of tissue and organs add to the segmentation difficulty. In this region, we focused on specific abdominal wall muscles which include the left flank, left rectus, right flank, and right rectus. The abdominal muscles play a crucial role in medical imaging as they provide valuable information for diagnosing abnormalities such as hernias, tumours, and neuro-muscular diseases. Accurate segmentation of these muscles can provide important quantitative information about their size, shape, and location, which can help clinicians make more informed decisions and improve patient outcomes. This paper discusses the automated segmentation of specific abdominal wall muscles. This region of interest was chosen due to the low quantity of research done, difficulty in manual annotation, and application in hernia detection and diagnosis of abdomen wall diseases. 
The lack of high-quality, publicly available datasets created the need for a novel dataset. This dataset was built through careful manual annotation of abdominal CTs and went through several ground-truth checks with professionals. Using this dataset, we investigated an established segmentation architecture, the standard U-Net [1], and fine-tuned it to the dataset. The standard U-Net produced good results but at a lower spatial resolution, which affects radiologists’ interpretation of the images [2]. To address this, the predicted mask was resized to the original resolution, but this increased the overall loss because a reconstruction loss was added. A resized U-Net was therefore created by the authors to avoid the loss in spatial resolution and was trained from scratch. A new loss metric, “complete loss”, was introduced to compare the performance of the resized U-Net and the standard U-Net. Additionally, factors such as dataset dependency, image size, and model complexity are compared. The authors conclude with a discussion of the limitations of the chosen approaches and give an outlook on possible improvements.
Size Matters: Exploring the Impact of Model Architecture and Dataset Size …
2 Related Work
Segmentation is the task of splitting an image into zones based on shared characteristics [3, 4]. These characteristics are determined by grey level, colour, texture, brightness, and contrast. Segmentation is beneficial because it extracts the region of interest (ROI) through a semi-automated or fully automated procedure. Even though it is a research hotspot, the task remains demanding because of the non-geometric shapes of the required ROI and the varying anatomies across patients.
2.1 Convolutional Neural Networks and Fully Convolutional Networks
Convolutional neural networks (CNNs), introduced by LeCun et al. [5], have differentiable weights and biases associated with their nodes like any ordinary neural network. The main point of difference is the explicit assumption that the input is an image. This assumption lets the network build image-specific properties into its architecture, lowering the required pre-processing and the number of parameters. Weight and bias values are shared across all neurons in a given feature map of a layer, implying that all of them search for the same feature in the image. This lets the CNN identify similar features irrespective of their location. In addition, the ability to learn features from an image on its own makes the CNN a well-suited architecture for medical imaging [6, 7]. One drawback of CNNs is the loss of the spatial characteristics of an image when it is fed into the fully connected layers of the network. Spatial information, however, is a key aspect of semantic segmentation tasks. Hence, a framework for pixel-wise segmentation was created by converting existing CNNs to fully convolutional networks (FCNs) [8]. FCNs comprise two parts: encoders and decoders. Encoders are similar to traditional CNNs and are used to extract condensed features. The decoders, in contrast, transform these features into dense label maps whose size is the same as that of the given input. Decoders are responsible for generating precise boundary localisation and labelling results from the features obtained from the encoder. Similar networks have been applied to medical imaging for semantic segmentation [9]. Various aspects of FCNs and their applications have already been explored [10, 11], with U-Net [1] being the most popular one.
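The effect of weight sharing can be illustrated with a minimal NumPy sketch. It is not taken from any of the cited works: the helper `cross_correlate2d` and the toy images are hypothetical and chosen only for illustration. One shared kernel is slid over two images that contain the same pattern at different positions, and the strongest response follows the pattern.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide one shared kernel over the image ('valid' mode, stride 1)."""
    image = np.asarray(image, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same weights are reused at every spatial position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical toy pattern: a 2 x 2 bright square at two different locations.
pattern = np.ones((2, 2))
image_a = np.zeros((6, 6))
image_a[1:3, 1:3] = pattern
image_b = np.zeros((6, 6))
image_b[3:5, 2:4] = pattern

response_a = cross_correlate2d(image_a, pattern)
response_b = cross_correlate2d(image_b, pattern)

# Position of the strongest response in each feature map.
peak_a = tuple(int(v) for v in np.unravel_index(np.argmax(response_a), response_a.shape))
peak_b = tuple(int(v) for v in np.unravel_index(np.argmax(response_b), response_b.shape))
print(peak_a, peak_b)
```

Because the same weights are applied at every position, the peak response moves with the pattern (here from row 1, column 1 to row 3, column 2) without any location-specific parameters.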
2.2 U-Net
U-Net can be seen as an evolution of FCNs, with improvements in architecture that allow it to preserve spatial information better and improve segmentation accuracy. It is called U-Net because the architecture is shaped like a “U”, with a symmetrical
structure where the downsampling path (also called the “encoder”) is mirrored by an upsampling path (also called the “decoder”). The encoder extracts high-level features from the input image, while the decoder maps these features back to the pixel space to generate the segmentation mask. The encoder comprises several convolutional layers, with each level decreasing the feature maps’ spatial resolution by a factor of 2. This is achieved by using a stride of 2 in the convolutional layers and/or using max-pooling layers. As a result of this downsampling, the feature maps in the encoder capture increasingly abstract, high-level information about the image. The decoder, on the other hand, comprises transposed convolutional layers that increase the feature maps’ spatial resolution by a factor of 2, which enables the network to “zoom in” on the details of the image. In addition, the decoder concatenates the feature maps from the corresponding layer in the encoder with the upsampled feature maps. These “skip connections” permit the network to use both low-level and high-level information derived from the input image. The U-Net model is trained to predict a class label for each pixel in the input image. The final layer of the network is a 1 × 1 convolutional layer followed by a softmax activation function [12], which produces a probability map for each class in the segmentation task; the class with the highest probability at each pixel is chosen as the final segmentation. U-Net’s [1] architecture is well suited to image segmentation tasks, especially for medical images. Its ability to handle small objects and fine details, as well as its robustness to noise and partial occlusion, makes it a popular choice for tasks such as cell, organ, and lesion segmentation.
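The final classification step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation of U-Net [1]: the function names, the 4 × 4 map size, and the choice of five classes are hypothetical, and the random logits merely stand in for the output of the 1 × 1 convolution.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along the class axis."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def predict_mask(logits):
    """Pick, for every pixel, the class with the highest probability."""
    probabilities = softmax(logits, axis=-1)
    return np.argmax(probabilities, axis=-1)

# Hypothetical logits: a 4 x 4 image with 5 classes per pixel.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 5))

probabilities = softmax(logits)   # one probability map per class
mask = predict_mask(logits)       # one class index per pixel
```

The probabilities at each pixel sum to 1, and the resulting mask has the same height and width as the input map.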
3 Proposed Work
3.1 Novel Dataset
The raw CT images originated from a research study on hernia detection. The CT imaging data was obtained from 16 adult patients aged 18 to 60. Each CT slice was viewed axially, and each slice measured 512 × 512 pixels. The abdominal muscles in each CT were manually segmented with ITK-SNAP [13]. In each image, the four abdominal muscles of interest (left flank, left rectus, right rectus, and right flank) were delineated from the rest of the image and taken as the ground truth (refer to Fig. 1). The resulting dataset was reviewed by radiologists and medical professionals. The patients and their corresponding CTs were separated into three groups: global, local, and test (refer to Table 1). The manual annotations totalled 451 image-mask pairs. Four separate test cases were held out; their images were used for neither training nor validation. Note that in Table 1, the third column, “Image count”, gives the total number of image-mask pairs.
Table 1 Dataset overview
Dataset name    Number of CTs    Image count
Global          16               451
Local           7                226
Test            4                21
Fig. 1 Raw CT slice, annotated CT slice, mask (left to right)
Fig. 2 CT image before and after contrasting (left to right)
3.2 Pre-processing
The low contrast of the original CT images was manually enhanced in ITK-SNAP to an intensity range of −318 to 300 (refer to Fig. 2). This brought out subtle differences that helped in distinguishing the boundaries between the abdominal wall and other overlapping organs. Image variability was introduced by selecting annotated slices from each CT with a step size of 3 or 5, depending on the total number of slices in that CT. For each model, two image dimensions were maintained, namely 512 × 512 pixels and 128 × 128 pixels. When an image was converted into a NumPy array, it was also converted to greyscale and normalised to ensure a similar data distribution and reduce the computational load. Training and validation sets were created with an 80:20 random split of each dataset (global and local).
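The pre-processing steps above can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' pipeline: the contrast adjustment was done manually in ITK-SNAP, so the clipping shown here is only an assumed programmatic equivalent, and the function names, the rescaling to [0, 1], the random seed, and the synthetic volume are all hypothetical.

```python
import numpy as np

WINDOW_MIN, WINDOW_MAX = -318.0, 300.0  # intensity range quoted in the text

def window_and_normalise(slice_2d):
    """Clip intensities to the window, then rescale to [0, 1] (assumed scheme)."""
    clipped = np.clip(np.asarray(slice_2d, dtype=np.float64), WINDOW_MIN, WINDOW_MAX)
    return (clipped - WINDOW_MIN) / (WINDOW_MAX - WINDOW_MIN)

def select_slices(volume, step):
    """Keep every `step`-th slice along the first axis (step 3 or 5 in the text)."""
    return volume[::step]

def train_val_split(n_items, val_fraction=0.2, seed=0):
    """Random 80:20 split of item indices into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_val = int(round(n_items * val_fraction))
    return order[n_val:], order[:n_val]

# Synthetic stand-in for one CT volume: 30 slices of 8 x 8 pixels.
rng = np.random.default_rng(42)
volume = rng.integers(-1000, 1000, size=(30, 8, 8))

selected = select_slices(volume, step=3)
normalised = np.stack([window_and_normalise(s) for s in selected])

# 451 image-mask pairs, as in the global dataset of Table 1.
train_idx, val_idx = train_val_split(451)
```

With 451 pairs, this split yields 361 training and 90 validation indices; the exact assignment depends on the seed, which is an arbitrary choice here.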
3.3 Loss Function
The dice coefficient measures the relative overlap between two samples, in our case between the ground-truth mask and the predicted mask. It addresses the class imbalance in datasets by disregarding the background class. The coefficient ranges from 0 to 1; the higher the coefficient, the better the overlap of the two samples. Instability caused by a low dice coefficient at the beginning of the training phase can be handled by batch normalisation and ReLUs. Note that only dice loss was used as the loss function in the training (model-fitting) phase of the U-Net architectures, for all model-dataset combinations. Reconstruction loss arises when an image is resized from its actual dimensions to target dimensions. Resizing was done via interpolation, more specifically bilinear interpolation [14], which estimates a new pixel value at a point in two-dimensional space from the values of its four surrounding pixels. Complete loss is a metric introduced by the authors to indicate the total loss after accounting for a change in image size. Even though images of two different dimensions were used, the loss was always calculated by comparing the predicted mask with the original mask of size 512 × 512 pixels. For the 128 × 128 images used by the standard U-Net, the predicted mask therefore had to be resized back to 512 × 512 pixels before the comparison. For the resized U-Net, which used 512 × 512 images as input, the reconstruction loss was 0. To simplify this comparative study and evaluate all outputs with a common metric, complete loss was used. Complete loss is the sum of the reconstruction loss and the dice loss. Note that it was used only after training, for comparison; it was not the loss function in the training phase.
\[ \text{Complete Loss} = \text{Reconstruction Loss} + \text{Dice Loss} \tag{1} \]
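The quantities in Eq. (1) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: dice loss is taken to be one minus the dice coefficient, the bilinear resize uses an align-corners sampling grid, and the function names, the smoothing constant `eps`, and the synthetic masks are hypothetical. The text does not specify how the reconstruction loss is measured, so it is passed in as a given number.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, eps=1e-7):
    """Relative overlap of two masks, between 0 (none) and 1 (identical)."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def dice_loss(y_true, y_pred):
    """Assumed definition: one minus the dice coefficient."""
    return 1.0 - dice_coefficient(y_true, y_pred)

def bilinear_resize(img, out_h, out_w):
    """Estimate each new pixel from its four surrounding input pixels."""
    img = np.asarray(img, dtype=np.float64)
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)   # sample rows in input coordinates
    xs = np.linspace(0, in_w - 1, out_w)   # sample columns in input coordinates
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                # vertical interpolation weights
    wx = (xs - x0)[None, :]                # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def complete_loss(reconstruction_loss, dice_loss_value):
    """Eq. (1): sum of reconstruction loss and dice loss."""
    return reconstruction_loss + dice_loss_value

# Synthetic example: a 128 x 128 prediction compared against a 512 x 512 mask.
truth_full = np.zeros((512, 512))
truth_full[128:384, 128:384] = 1.0
pred_small = np.zeros((128, 128))
pred_small[32:96, 32:96] = 1.0

pred_full = bilinear_resize(pred_small, 512, 512)
loss_after_resize = dice_loss(truth_full, pred_full)

# For a model that predicts at 512 x 512 directly, the reconstruction loss is 0.
total = complete_loss(0.0, dice_loss(truth_full, truth_full))
```

In this synthetic case the upsampled mask has soft edges, so its dice loss against the full-resolution mask is small but not zero, which is the kind of resizing penalty the complete loss is meant to expose.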
3.4 Models
In this study, two U-Net architectures, the standard U-Net (base U-Net) [1] and the resized U-Net, were compared using two datasets that differ in size and two image dimensions (128 × 128 pixels and 512 × 512 pixels). The standard U-Net model’s main objective was to use the encoder-decoder architecture to create a complete and accurate mask for a given abdominal CT. It took an image of 128 × 128 pixels and consisted of a 5-layered architecture coupled with dice loss (an architecture similar to the base U-Net [1]). Both datasets, global and local, were tested on this model. It was time efficient and computationally cheap, with only around 2 million parameters. Although this model gave reasonable accuracy, resizing its output back to 512 × 512 pixels introduced a reconstruction loss that significantly increased the complete loss and made the image blurry and inaccurate. To overcome this,
Fig. 3 Resized U-Net model
Fig. 4 Flow process of proposed work
the authors built on this architecture by adding two intermediate layers each to the encoder and decoder, which resulted in 6 layers each (refer to Fig. 3). This was done to maintain the spatial resolution of the image, avoid image resizing, and reduce the reconstruction loss to 0. Each layer of the custom U-Net model consists of two convolution layers, one max-pool layer, and a dropout layer. To standardise values and speed up the network, batch normalisation was applied to each convolution layer. Long skip connections from the encoder to the decoder were maintained [15, 16]. Note that this model was trained from scratch. It achieved an improved complete loss and generalised better across all datasets, but at the cost of a substantial increase in computation (around 34 million parameters). Hyperparameter tuning was done to improve performance and fit the models better to the dataset. The best parameters for each model-dataset combination were considered and compared to analyse the effects of image and dataset size. The flow process of the proposed work is shown in Fig. 4.
Fig. 5 Original image, ground-truth mask, predicted mask (from left to right). Images with complete loss 8.147% and 20.831% from the validation and test sets respectively for the local standard U-Net (top to bottom)
4 Results
The performance was indicated by the mean complete loss, with differences expressed as a percentage change for comparison purposes. Comparisons were made after hyperparameter tuning for each model-dataset combination (refer to Figs. 5, 6, 7, and 8 for some visualisations).
4.1 Optimal Hyperparameters
Hyperparameter tuning was necessary to find the optimal hyperparameter values and achieve the best possible performance for each model-dataset combination (refer to Table 2).
Fig. 6 Original image, ground-truth mask, predicted mask (from left to right). Images with complete loss 2.718% and 8.555% from the validation and test sets respectively for the local resized U-Net (top to bottom)
Fig. 7 Original image, ground-truth mask, predicted mask (from left to right). Images with complete loss 4.546% and 8.449% from the validation and test sets respectively for the global standard U-Net (top to bottom)
Fig. 8 Original image, ground-truth mask, predicted mask (from left to right). Images with complete loss 3.272% and 7.203% from the validation and test sets respectively for the global resized U-Net (top to bottom)
Table 2 Optimal hyperparameters
               Local                          Global
               Std. U-Net    Resized U-Net    Std. U-Net    Resized U-Net
Batch size     16            16               16            32
Epochs         100           100              80            80
Dropout (%)    30            50               30            10
4.2 Data Size Dependence (Global vs. Local Dataset for Each Model)
The global dataset (451 image-mask pairs) was almost double the size of the local dataset (226 image-mask pairs). Both models performed better with the global dataset on both the validation and test sets (refer to Tables 3 and 4).
1. The standard U-Net trained on the global dataset outperformed its local version, with a 14.99% decrease in mean complete loss for the validation set and an 18.61% decrease for the test set.
2. The resized U-Net trained on the global dataset outperformed its local version, with a 3.29% decrease for the validation set and a 5.18% decrease for the test set.
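The percentage decreases quoted above can be checked against the mean complete losses in Tables 3 and 4. The short sketch below assumes that each decrease is computed relative to the worse (higher) loss; the function name `percent_decrease` is hypothetical.

```python
def percent_decrease(baseline, improved):
    """Relative decrease of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline

# Mean complete losses of the standard U-Net (local vs. global dataset).
std_validation = percent_decrease(0.0727, 0.0618)   # about 14.99
std_test = percent_decrease(0.13675, 0.1113)        # about 18.61

# Mean complete loss of the resized U-Net on the test set (local vs. global).
resized_test = percent_decrease(0.1138, 0.1079)     # about 5.18

print(round(std_validation, 2), round(std_test, 2), round(resized_test, 2))
```

The same function applies to the model-versus-model comparisons, for example the global validation means 0.0618 (standard) and 0.0352 (resized).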
Table 3 Local dataset
Dataset       Standard U-Net                  Custom U-Net
              Mean       Max.      Min.       Mean       Max.      Min.
Training      0.0692     0.0952    0.0493     0.0294     0.0464    0.0191
Validation    0.0727     0.1133    0.0517     0.03648    0.0728    0.022
Test          0.13675    0.2083    0.0896     0.1138     0.201     0.0511
Table 4 Global dataset
Dataset       Standard U-Net                  Custom U-Net
              Mean       Max.      Min.       Mean       Max.      Min.
Training      0.0586     0.1022    0.0382     0.0233     0.0442    0.0154
Validation    0.0618     0.0839    0.0414     0.0352     0.0698    0.0184
Test          0.1113     0.167     0.0844     0.1079     0.2009    0.072
4.3 Model Differences (Standard vs. Resized U-Net for Each Dataset)
The resized U-Net can work with the original dimensions of the image (512 × 512), whereas the standard U-Net needs a downsized version. The predicted masks of both models were compared with the original mask at its original size. Both models were run on both datasets (global and local) (refer to Tables 3 and 4).
1. For the local dataset, the resized U-Net outperformed the standard U-Net, with a 49.93% decrease for the validation set and a 16.75% decrease for the test set.
2. For the global dataset, the resized U-Net outperformed the standard U-Net, with a 43.04% decrease for the validation set and a 3.05% decrease for the test set.
4.4 Other Interesting Comparisons
1. The resized U-Net trained on the global dataset was the best model-dataset combination. It had the lowest mean complete loss: 3.52% for the validation set and 10.79% for the test set.
2. The standard U-Net trained on the local dataset was the worst model-dataset combination. It had the highest mean complete loss: 7.27% for the validation set and 13.67% for the test set.
3. Interestingly, the standard U-Net trained on the global dataset outperformed the resized U-Net trained on the local dataset by a 41.10% decrease for the validation set and a 2.24% decrease for the test set (refer to Tables 3 and 4).
5 Discussion
5.1 Optimal Hyperparameters
1. Optimiser and learning rate: The Adam [17] optimiser with a minimum learning rate of 0.00001 was found to be the best for all dataset-model combinations. Its adaptive learning rate and use of momentum made it robust to noisy gradients and computationally efficient.
2. Batch size: Generally, smaller batch sizes introduce more noise and variation in the gradients. This helped the model explore the parameter space more effectively and learn from a more diverse set of examples. This held for the local dataset and the standard U-Net versions, for which a batch size of 16 was selected. However, it did not work for the resized U-Net trained on the global dataset, where it took a long time to compute and gave poor results because of the large number of parameters and the amount of data. A larger batch size of 32 helped exploit parallelism in the computation, as the hardware could process multiple examples in parallel. This led to faster training and more efficient use of computational resources. Additionally, the larger batch size resulted in more stable gradients.
3. Dropout: The optimal dropout rate varied across the dataset-model combinations. In some cases, a higher dropout rate led to a loss of important information, while in others a lower dropout rate led to overfitting. A dropout of 30% was usually ideal. The resized U-Net trained on the global dataset had a surprisingly low dropout rate even though it had the most parameters. This low dropout improved the model’s ability to learn from the training data but also caused slight overfitting.
4. Epochs: Generally, a higher number of epochs allows the model to learn more complex representations of the data, leading to better performance. However, training for too many epochs can also lead to overfitting and reduced performance on new data. In this case, the plots of training versus validation loss helped the authors settle on the number of epochs (refer to Fig. 9). For some dataset-model combinations, training for 100 epochs was optimal because it allowed the model to learn complex representations of the data without overfitting. For others, 100 epochs led to overfitting and reduced performance on new data, which led to the choice of 80 epochs.
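Reading the number of epochs off a loss plot can be approximated programmatically. The sketch below swaps the visual inspection described above for a simple rule (take the epoch with the lowest validation loss); it is not the authors' procedure, and the function names and loss curves are synthetic and hypothetical, not the curves of Fig. 9.

```python
import numpy as np

def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return int(np.argmin(val_losses)) + 1

def generalisation_gap(train_losses, val_losses):
    """Validation minus training loss per epoch; a growing gap suggests overfitting."""
    return np.asarray(val_losses, dtype=np.float64) - np.asarray(train_losses, dtype=np.float64)

# Synthetic curves for illustration only.
epochs = np.arange(1, 101)
train_losses = 0.30 * np.exp(-epochs / 25.0) + 0.02   # keeps decreasing
val_losses = 0.05 + 0.001 * np.abs(epochs - 80)       # turns upwards after epoch 80

chosen = best_epoch(val_losses)
gap = generalisation_gap(train_losses, val_losses)
print(chosen)  # 80 for these synthetic curves
```

For these synthetic curves the training loss keeps falling while the validation loss rises after epoch 80, so the gap at epoch 100 is larger than at epoch 80, which is the pattern that motivates stopping earlier.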
5.2 Dataset Dependence
In both models, increasing the dataset size led to better results on both the validation and test sets, for several reasons:
Fig. 9 Log loss for training and validation versus epochs for all model-dataset combinations: (a) standard U-Net with local dataset, (b) resized U-Net with local dataset, (c) standard U-Net with global dataset, (d) resized U-Net with global dataset
1. More diverse data: The global dataset contains more variation in the images. This helped the models better capture the range of features and patterns that they need to learn to segment images accurately.
2. Increased generalisation: The global dataset also helped the models generalise better to new, unseen images, leading to better results on the test set. The models had more examples to learn from, which reduced the impact of individual data points on the final results. Thus, the models could better capture the underlying patterns and relationships between features.
3. Reduced overfitting: With a larger dataset, there was less chance of the model overfitting to the training data, as there was more variation in the data. With the local dataset, by contrast, the models became too specialised to the training data and failed to generalise well to the test data.
5.3 Model Comparison
For both datasets, the resized U-Net performed better than the standard U-Net. This was due to the following reasons:
1. Since the input and original dimensions were the same for the resized U-Net, it had access to more information and was better able to learn from this type of data to make more accurate predictions on the validation and test sets.
2. The increased number of layers and filters in the resized U-Net helped it capture finer and more intricate details of the underlying structure, which led to better results.
3. The larger number of trainable parameters in the resized U-Net model allowed for more flexibility in the learned representations, potentially leading to better generalisation to the test set.
5.4 Other Interesting Comparisons
1. The resized U-Net trained on the global dataset was the best model-dataset combination. This was because of the higher number of layers, which helped it capture important and complex features. Also, the global dataset gave it a wider variety of data to learn from, which increased its ability to learn and generalise.
2. The standard U-Net trained on the local dataset was the worst model-dataset combination. This was because the simple network, with fewer parameters, was unable to capture all the details and relationships in the image. The use of the local dataset significantly hampered the learning process, as the model was too narrowly focused on the small amount of data, which made it unable to generalise.
3. The standard U-Net trained on the global dataset performed better than the resized U-Net trained on the local dataset. This was because the local dataset used for training the resized U-Net model was too small, and the model was not able to learn the important features needed for accurate segmentation. Because the model was too complex relative to the amount of available data, it learnt to recognise noise or other irrelevant features in the data instead of the true underlying patterns. In contrast, the standard U-Net trained on the whole dataset was able to achieve better performance by leveraging all available data.
6 Conclusion
The scarcity of high spatial resolution images and the tedious manual annotation process have hindered progress in this area. While deep learning models have shown promising results, the complex anatomy, variability in shape and size, noise, and class imbalance of the region hinder their ability to learn. These problems were tackled by creating a novel, reliable dataset and a resized U-Net architecture that avoids the loss of spatial resolution in images. The best-fit resized U-Net was compared with its base counterpart through the introduction of a new loss metric. The effect of model architecture and image size on the dataset was explored. The dataset was created in two phases to study the dataset dependency of the models. The resized U-Net performs best when fed with the whole dataset and generalises well too. It not only maintains the original spatial resolution of the images but also provides high-quality results with a limited amount of data. Limitations of this study include the small amount of input data, the limited number of model architectures in the comparative study, and the mediocre generalisability of the models. One of the major issues we faced was the strain on computation. The use of only CPUs resulted in a maximum training time of 14 hours. Access to GPUs and parallel processing can significantly reduce training time. In future work, the quantity of data can be increased through more annotation or the use of semi-supervised approaches. Other techniques, such as dilated convolutions and transformers, could be explored for this task, and the performance of these models compared. The authors plan to broaden the current model by using it in a multi-class approach where the loss for each class can be found. This can propel the work from wall segmentation to a hernia detection approach.
7 Applications
The process of data collection and creation in this domain is laborious. Using the novel dataset created in this paper, semi-supervised approaches can lead to the creation of much bigger datasets. This would be tremendously helpful since the abdominal imaging domain lacks large datasets. The resized U-Net models developed in this study can provide high-quality results with a limited amount of data, which can serve as a stepping stone for hernia detection. This would open up further avenues such as the identification of hernias and the diagnosis of abdominal wall diseases such as wall hernias, rectus diastasis, and abdominal wall lipomas. The research also has implications for surgical planning, where surgeons can benefit from identifying the precise shape and location of the abdominal walls to determine the optimal approach.
Acknowledgements This paper and the research would not have been possible without the help of our exceptional mentor Maniappan R and of Curium Life, which provided us with the datasets. Our mentor’s valuable suggestions during the planning and development of this research work helped us progress as scheduled.
References
1. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, Cham, pp 234–241
2. Boita J, van Engen RE, Mackenzie A, Tingberg A, Bosmans H, Bolejko A, Zackrisson S, Wallis MG, Ikeda DM, Van Ongeval C, Pijnappel R, VISUAL Group Jansen F, Duijm L, de Bruin H, Andersson I, Behmer C, Taylor K, Kilburn-Toppin F, van Goethem M, Prevos R, Salem N, Pal S (2021) How does image quality affect radiologists’ perceived ability for image interpretation and lesion detection in digital mammography? Eur Radiol 31:5335–5343
3. Gonzalez RC, Woods RE, Eddins SL (2004) Digital image processing using MATLAB. Pearson Education India
4. Pal NR, Pal SK (1993) A review on image segmentation techniques. Pattern Recogn 26(9):1277–1294
5. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
6. Shin HC, Roth HR, Gao M, Lu L, Xu Z, Nogues I, Summers RM (2016) Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics, and transfer learning. IEEE Trans Med Imaging 35(5):1285–1298
7. Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
8. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
9. Zhou Y, Xie L, Shen W, Wang Y, Fishman EK, Yuille AL (2017) A fixed-point model for pancreas segmentation in abdominal CT scans. In: International conference on medical image computing and computer-assisted intervention. Springer, Cham, pp 693–701
10. Roth HR, Oda H, Hayashi Y, Oda M, Shimizu N, Fujiwara M, Misawa K, Mori K (2017) Hierarchical 3D fully convolutional networks for multi-organ segmentation. arXiv:1704.06382
11. Chlebus G, Schenk A, Moltz JH, van Ginneken B, Hahn HK, Meine H (2018) Automatic liver tumor segmentation in CT with fully convolutional neural networks and object-based postprocessing. Sci Rep 8(1):1–7
12. Jaeger PF, Kohl SA, Bickelhaupt S, Isensee F, Kuder TA, Schlemmer HP, Maier-Hein KH (2020) Retina U-net: embarrassingly simple exploitation of segmentation supervision for medical object detection. In: Machine learning for health workshop. PMLR, pp 171–183
13. Yushkevich PA, Piven J, Hazlett HC, Smith RG, Ho S, Gee JC, Gerig G (2006) User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 31(3):1116–1128
14. Kirkland EJ, Kirkland EJ (2010) Bilinear interpolation. In: Advanced computing in electron microscopy, pp 261–263
15. Roth HR, Shen C, Oda H, Oda M, Hayashi Y, Misawa K, Mori K (2018) Deep learning and its application to medical image segmentation. Med Imaging Technol 36(2):63–71
16. Rister B, Yi D, Shivakumar K, Nobashi T, Rubin DL (2020) CT-ORG, a new dataset for multiple organ segmentation in computed tomography. Sci Data 7(1):1–9
17. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
18. Sharma N, Aggarwal LM (2010) Automated medical image segmentation techniques. J Med Phys 35(1):3
Proactive Decision Making for Handover Management on Heterogeneous Networks A. Priyanka and C. Chandrasekar
Abstract Future innovations rely on rapid technological advances to boost downlink strength. 5G is expected to be one of the foundational technologies of an interconnected future. However, there is no question that the end user’s mobility depends on handover (HO) management. Conventional structure-based measurement of the target cell to support the handover process requires frequent measurement gaps (MGs), and these approaches are linked to performance degradation. The deployment of ultra-dense small cells (UDSC) within a macro-cell layer produces multi-tier networks that are referred to as heterogeneous networks (HetNets). However, the frequency of HOs and radio link failures (RLF) grows drastically as UDSCs are added to the network. As a result, mobility management has become a crucial task for improving the operation of the system. Reducing both the frequency of HOs and the percentage of HO failures (HOF) is the objective of the proposed approach. A simulation using 4G and 5G networks is carried out to determine how well the proposed technique performs. The simulation findings demonstrate that the proposed approach, when compared to existing algorithms from the literature, considerably lowers the average levels of HO ping-pong (HOPP) and HOFs.
Keywords 4G · 5G · Handover · Handover control parameter · Heterogeneous networks · Measurement gap · Proactive decision making
A. Priyanka (B) · C. Chandrasekar Department of Computer Science, Periyar University, Salem, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_43
1 Introduction
Rapid technological development continues to improve everyday life. IoT-based devices are a prime illustration. IoT devices are physical things that are wirelessly linked and have built-in sensors to maintain connectivity. Uninterrupted connections are necessary to safeguard ongoing device management. 5G makes many ideas for Internet of things (IoT) devices practical. Online gaming, virtual reality applications, self-driving cars, and other applications use 5G technology. 5G offers continuous, low-latency data transmission. In comparison with earlier generations, 5G has a much greater capacity to accommodate users. The number of mobile users is growing by the day, and end users (EU) in 5G will be spread among a large number of small cells, which will eventually outnumber those of 4G. When switching from 4G to 5G networks, a 5G HO is a crucial component [1]. With a 5G handover, the device is switched seamlessly from a 4G LTE network to a 5G network for the best possible speed and performance. The switch from 4G to 5G is expected to go smoothly, with little disruption and downtime. The number of HO events will rise in order to transmit data without interruption. A mobile device moves from one cell site to another using the HO procedure [2]. This occurs when the user moves between sectors covered by several cell towers or when the device leaves the serving cell’s coverage area. In the context of 5G, it also happens when a device switches from one carrier to another within the same spectrum. The EU’s mobility poses a serious obstacle to maintaining a steady and dependable link. HO management is therefore regarded as an important issue and has received due attention from researchers. However, one drawback of certain existing methods is ineffective HO triggering. Initiating HOs unnecessarily often may cause excessive data traffic for connection management. Compared with earlier generations of mobile technology, 5G delivers much faster speeds [3]. This implies that 5G can provide a far superior experience for consumers, whether streaming films or downloading data.
5G has the additional benefit of using much less energy than earlier generations of mobile technology. This matters for several reasons, including a lower carbon footprint and a lower monthly cost for subscribers. Improved security features and the capacity to connect a larger variety of devices are further benefits of 5G. Decision making in the HO technique is critical when the EU operates in a heterogeneous area. In such an environment, a signal strength-based HO mechanism may recommend erroneous choices: due to the nature of the cells, the observed signal strength range for 4G and 5G may be the same, yet 5G offers a higher data rate than 4G. The task of the methodology is to pick the best target cell among a variety of cells. While the device is moving, the HO process follows defined procedures to keep the data transfer going. Network selection is a crucial first step in choosing the best service provider. Traditional approaches take time to evaluate the best cell among the candidate cells from different perspectives. This is a disadvantage of all such techniques, because these measurement gaps (MGs) occur often [1]. During the measurement time, the connection is constrained by the performance of the current cell. The suggested solution must therefore ensure that the procedure is completed as quickly as possible. The goal of this work is to close the measurement gap and improve HO management.
Proactive Decision Making for Handover Management …
1.1 Research Objectives
Most countries are planning to upgrade to 5G for technological growth. HO management is a crucial task in every telecommunication system for continuous, interruption-free communication. Earlier network assessment schemes based on fresh (most recently measured) weights were observed to cause unnecessary HOs. The performance of every cell fluctuates: the data speed, for example, can rise suddenly and fall the next second, which is the nature of networking technologies. HO execution based on fresh weights can therefore run nonessential HO processes. The MG is a problem of the older methods, whereas 5G needs a scheme that consumes minimal time. To select a suitable cell, the parameter values must be measured, and measuring multiple attributes at the time of HO initiation consumes more time. The objective and motivation of this research are to reduce the number of unnecessary handover attempts (HOA) and to minimize handover failures (HOF).
1.2 Contribution
Existing methodologies for network selection frequently need a measurement gap (MG) to estimate cell performance, and the connection quality degrades during this measurement period. This paper presents a new HO management method based on machine learning concepts. With this methodology, latency is reduced by bringing compute capabilities into the network, closer to the end user. The work designs the collection of user-experienced data and an HO process that is executed primarily on the basis of the dataset history. The most suitable predictive mechanism is used to assess each cell, and it requires a collected dataset to initiate the HO process. A data collection technique is introduced to preserve the history of each cell. Proactive decision making (PDM) is the core concept of the proposed work. The approach is expected to work well for both homogeneous and heterogeneous networks.
2 Related Work
XGBoost, an ensemble method, has been evaluated against different techniques [1]. The main contribution of that technique is predictive HO management applied to the MG problem. Of the collected data, 70% is used as training samples and the remaining 30% as test samples. The intelligent technique works
faster and more accurately, and it eliminates the delay. The reported handover success rate (HSR) of this methodology is 100%. Different network environments such as UMTS, GPRS, WLAN, 4G, and 5G are modeled within the algorithm of [2]. On this heterogeneous platform, the network download rate is the final target output used to select a suitable cell. Attributes such as user speed, maximum transmission rate, minimum transmission delay, SINR, bit error rate, and packet loss rate are the inputs to the neural network. This method relies on fresh weights for cell selection. Relying on fresh weights can yield a poor connection by selecting a poor cell: because the values observed by the UE fluctuate, the weights can change at any time, so the method may select the best cell or the worst one. The drawbacks of this approach are therefore that it uses fresh weights to find the network download rate and needs an MG to select the most suitable cell. Machine learning (ML) is expected to be an adaptable technology for 5G, 6G, and later generations. The surveys in [3, 4] review various popular and distinctive ML methods for running the HO process. Bringing more functions under ML is an important goal for many researchers. In [5], a recurrent neural network (RNN) is used to forecast the received signal strength of the access point. In the hidden layer, a long short-term memory (LSTM) algorithm predicts the data series. A new sequence of received signal strength indicator (RSSI) values is predicted, and the HO procedure is performed based on the forecasted series in a vehicular ad hoc network (VANET). Ant colony optimization (ACO) with adaptive pheromone evaporation is studied for HO management in [6].
The problem with this algorithm is the complexity of implementing it in the network selection process. Selecting the best cell is the main motive of much of this research. With optimization techniques, however, there is a chance of selecting an average performer that provides only a medium-quality connection. The cell selection schemes known as MADM methods [7, 24] need an MG to complete the process, because they perform a number of calculations at the time of the handover, which is a drawback. Relying on fresh weights can also lead to a low-quality connection. Fuzzy MADM methods [8, 11, 13, 27] also incur a measurement delay. They do not consider the fresh weights; instead, they assess the range of the weights. However, examining the different cells collectively at the time of HO initiation always takes more time. Another problem with fuzzy-based methods is the chance of selecting a cell with average performance. Consider two cells, C1 and C2. The range of the parameter value (data rate) for both cells is 50–150 Mbps. The range is established from 100 fluctuating measured values, all of which lie within it. C1 has 60% of its data points in the range 50–100 Mbps and 40% in the range 100–150 Mbps. C2 differs from C1: it has 40% of its data points in the range 50–100 Mbps and 60% of its data points
in the range 100–150 Mbps. Because fuzzy methods consider only the overall range, these two cells receive the same priority in the cell selection process, so the wrong cell may be selected. The optimization problem in [9] is cell reselection with channel allocation, formulated with particle swarm optimization (PSO) and a modified version of the genetic algorithm (GA). The price that users must pay to access the top network is decreased by formulating an optimization problem that limits the cumulative interference generated by verified users. The objective is to satisfy the interference requirements of any network with open channels while still offering the EU a specific QoS at a lower cost. Optimization algorithms provide a near-optimal solution rather than an exact one. The MG is the main issue of these traditional methodologies. A request-based handover technique (RBHS) is proposed to improve user perception and obtain the best allocation of resources, together with a caching method based on user queries. A signal-to-interference-plus-noise ratio (SINR)-based HO strategy is recommended in [10] to relieve the burden on the source cell. This methodology reduces the data traffic needed for connection management. However, an unpredictable increase in the noise level is the main problem in a real environment. The fundamental contribution of [12] is to cluster the IEEE 802.11 access points (APs) of an SDN-based wireless network. The clustering is carried out using hybrid artificial intelligence (AI) approaches, K-means and a genetic algorithm (GA), to group the optimal APs according to wireless network factors such as distance, fading, and noise. The distance from the EU to the cell is computed as the Euclidean distance.
The objective of that research is to reduce the HO delay and HO failure in IEEE 802.11 SDN-based distributed mobility management. In [14], network selection in 5G is handled by a reinforcement learning controller, and the whole process is modeled as a Markov decision process with a novel state-space aggregation approach. For network resource allocation, a load balancing algorithm is advised for a telecommunication system. The cuckoo search method is used to optimize the allocation under the constraint of meeting the users' QoS guarantee, and the best allocation scheme is found by iterating several times. Compared with the classic genetic method, the computational complexity can be lowered. The available spectrum from all networking channels is obtained, and the cuckoo search algorithm is applied to optimize the quality of service [15]. Based on the optimal results obtained from the model, the resources are allocated to the users concerned. Reinforcement learning with a Markov decision process is used for network selection and HO management in [16]. Based on the number of slices, it is the EU's responsibility to choose the best connection slice. Deep Q-learning uses experience replay as a method to enhance learning. When training a neural network, it is vital that the training data be independent and identically distributed; the samples collected by a reinforcement learning agent, however, are correlated. In [17], the handover control parameter (HCP) settings are configured manually to optimize the cell selection process. The impact of different HCP settings on 5G
networks is the subject of that study, which reports good results. A notable disadvantage of lower HCP settings, in contrast to higher HCP values, is the high ping-pong handover probability at all mobile speeds. The simulation findings show that midrange HCP settings may be the best option if one of these systems is used. According to this study, adopting automated self-optimization (ASO) features is the best approach with respect to user experience. The surveys in [18–20] discuss in depth a variety of HO decision algorithms, such as self-optimization, intelligent HO techniques, mobility prediction-based HO management, and user speed-based HO decisions. The discussion shows how machine learning methods are used for HO decision problems in 5G UDSC. The authors of [21] proposed a velocity-based self-optimization algorithm to adjust the HCP in 4G/5G networks. The algorithm uses the user's received power and speed to adjust the HO margin while the user is moving. That study considers three types of HCPs: TTT, measurement interval, and hysteresis, all of which are changed based on the number of HOPPs in a measurement interval. To increase HO performance, a three-layer filter mechanism is used that takes into account the mobility condition and speed status of the UE. The algorithm achieves a remarkable reduction in ping-pong HOs and radio link failures. An auto-tuning optimization (ATO) algorithm for HO management that uses the user speed and the reference signal received power is presented in [22]. Its aim is to reduce the number of frequent HOs and the HOF ratio. The energy-efficient vertical HO management technique in [23] is based on a vector normalized preferred performance-based VIKOR algorithm. It achieves lower energy consumption when three interfaces, WLAN, WiMAX, and cellular, operate simultaneously.
The energy-saving features of the baseline V-VPP method were examined for four congestion classes and contrasted with alternative normalization techniques. The power used in each traffic class was also examined, and for a variety of quality of service parameters the V-VPP technique resulted in lower power usage than the alternative normalization methods. The method proposed in [25] is based on fuzzy logic, and the parameters considered are received signal strength, channel capacity, user velocity, and distance from the base station. Fuzzy logic-based network selection can give average results rather than the best results, as discussed in the results section. The ensemble method XGBoost is a machine learning method considered in [26] for the network selection process. This method provides an automatic HO execution service to all users when the UE needs to establish a connection to a new cell. The data gathered via the sample window is used to improve the XGBoost classifier's prediction of the handover success rate. Integrating a machine learning prediction model with traditional handover execution models decreases communication costs while increasing handover success rates. Different IoT devices are used to conduct the experiment. The limitations of the existing works are shown in Table 1.
Table 1 Limitations of existing works
References   Limitations of the existing works
[1]    Lack of standardization for handover management in 5G networks
[2]    Insufficient coordination between different network elements during handovers
[3]    Lack of automation in handover management leading to delays and errors
[4]    Inadequate handover decision algorithms leading to poor network performance
[5]    Poor handover planning and resource allocation leading to suboptimal network utilization
[6]    Limited support for heterogeneous networks and devices in handover management
[7]    Inconsistent handover policies across different network operators and regions
[8]    Lack of feedback mechanisms to improve handover management based on network performance
[9]    Inadequate monitoring and troubleshooting tools for handover management in 5G
[10]   Limited support for advanced handover techniques such as dual connectivity and mobility management
[11]   Insufficient testing and validation of handover management mechanisms in 5G networks
[12]   Inconsistent handover management between different radio access technologies (RATs)
[13]   Poor integration between core network and radio access network (RAN) in handover management
[14]   Inadequate support for vertical industries and services with specific handover requirements
[15]   Lack of standardization for handover-related performance metrics and KPIs
[16]   Limited awareness among network operators and vendors about the importance of proactive handover management
[17]   Insufficient training and education for network engineers and technicians on handover management
[18]   Inadequate coordination between different stakeholders in the 5G ecosystem for handover management
[19]   Limited involvement of end-users in handover management and optimization
[20]   Lack of incentives for network operators and vendors to invest in proactive handover management
[21]   Poor scalability of handover management mechanisms in 5G networks
[22]   Inadequate support for high-speed and low-latency handovers in 5G
[23]   Insufficient security mechanisms for handover management in 5G
[24]   Limited use of machine learning and artificial intelligence in handover management
[25]   Inconsistent handover management between different deployment scenarios (e.g., indoor vs. outdoor, rural vs. urban)
[26]   Lack of real-time monitoring and optimization of handovers in 5G
[27]   Insufficient testing and validation of handover management mechanisms in 5G networks
3 Proposed Model
The working model for simulation considers a HetNet architecture that incorporates numerous 4G macro-cells and 5G small cells, with three small cells uniformly distributed in every macro-cell. Each macro-cell follows a three-sector hexagonal layout, and each sector contains a single small cell. Figure 1 shows the deployment of the proposed network scenario with three small cells allocated to one macro-cell, where R is the radius of the macro-cell and r is the radius of the small cell. The macro-cells can be 4G LTE and operate in frequency bands below 5 GHz, while the small cells can operate in mm-wave bands. The EUs are generated at random locations in all macro- and small cells with random mobility models, and they can move anywhere in the geographical area during time t. We denote the set of EUs as $E = \{e_1, e_2, \ldots, e_p\}$, where each $e \in E$ can travel in a random direction $\Theta_e \in [0, 2\pi]$ with an average velocity $V_e \in [v_{\min}, v_{\max}]$. The set of macro-cells is $M = \{m_1, m_2, \ldots, m_q\}$, and the set of small cells is $S = \{s_1, s_2, \ldots, s_r\}$. The HO procedure can be executed within the same network or between different networks when the EU moves from the serving cell to the target cell. Here, the serving cell is the decision maker for HO, based on the measurement report given by the EU. Within its own transmission range, each macro-cell and small cell provides a high-quality radio link to the EU. The quality of the radio link is measured by the reference signal received power (RSRP). According to the RSRP, the network download rate (NDR) is manually fixed for every EU. Handover control parameter (HCP) settings are configured manually and individually in all macro- and small cells. The HCP of every cell plays an important role in the network assessment process.
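The mobility of a single EU, with a direction drawn from [0, 2π] and a speed drawn from [v_min, v_max], can be illustrated with a minimal sketch. The function name `move_eu`, the uniform distributions, the one-second step, and the speed bounds are assumptions made for illustration; the simulation itself uses the random waypoint model listed among the simulation parameters, which this step does not reproduce in full.

```python
import math
import random

def move_eu(x, y, v_min, v_max, dt, rng):
    """Advance one end user (EU) by a single random-direction step.

    Hypothetical helper: direction and speed are drawn uniformly, which is
    an assumption rather than the paper's exact mobility model.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)   # direction Theta_e in [0, 2*pi]
    speed = rng.uniform(v_min, v_max)          # velocity V_e in [v_min, v_max]
    new_x = x + speed * dt * math.cos(theta)
    new_y = y + speed * dt * math.sin(theta)
    return new_x, new_y, theta, speed

rng = random.Random(0)                         # fixed seed for repeatability
x, y, theta, speed = move_eu(0.0, 0.0, v_min=20.0, v_max=100.0, dt=1.0, rng=rng)
```

The displacement after one step equals `speed * dt`, so the step length stays within the configured velocity bounds.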
A hyper-connected future, in which everything is connected to build smart homes, smart cities, and so on [2], is the goal of many researchers, and 5G is a means of realizing it. For seamless data delivery, HO is a valuable mechanism of 5G and other generations. The proposed methodology comprises four strategies.
• HO trigger
• Problem formulation and cell selection
• HO process.

Fig. 1 HetNets model

Table 2 Dataset
S. No.   ASU   RSRP (dBm)   NDR (Mbps)
1        1     −140         0–10
2        2     −139         0–20
…        …     …            …
93       93    −48          450–920
94       94    −47          450–930
95       95    −46          450–940
96       96    −45          450–950
3.1 Dataset Description
The dataset in this study is generated by simulation and can be interpreted as a hypothesis in which, for each RSRP value, a random NDR value is selected from the NDR range stated in Table 2. For example, if the RSRP value is −92 dBm, the NDR can lie in the range from 200 to 480 Mbps. During the simulation, a randomized NDR value is therefore assigned within the range specified in the table.
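The sampling described above can be sketched as follows. Only the rows printed in Table 2 and the −92 dBm example from the text are included; the elided rows are deliberately left out. The uniform distribution and the names `NDR_RANGE`, `asu_from_rsrp`, and `sample_ndr` are assumptions for illustration, since the text only states that the NDR value is random within the range.

```python
import random

# RSRP (dBm) -> NDR range (Mbps), copied from Table 2 and the -92 dBm example.
# Rows that Table 2 elides are not filled in.
NDR_RANGE = {
    -140: (0, 10),
    -139: (0, 20),
    -92: (200, 480),
    -48: (450, 920),
    -47: (450, 930),
    -46: (450, 940),
    -45: (450, 950),
}

def asu_from_rsrp(rsrp_dbm):
    """ASU as tabulated in Table 2 (ASU 1 at -140 dBm, ASU 96 at -45 dBm)."""
    return rsrp_dbm + 141

def sample_ndr(rsrp_dbm, rng):
    """Draw one NDR value from the range listed for this RSRP (uniform: an assumption)."""
    low, high = NDR_RANGE[rsrp_dbm]
    return rng.uniform(low, high)

rng = random.Random(42)
samples = [(asu_from_rsrp(r), r, sample_ndr(r, rng)) for r in (-92, -45)]
```

Each entry of `samples` is an (ASU, RSRP, NDR) triple of the kind that a cell's history would accumulate.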
3.2 HO Trigger
A device can move out of the transmission range of its serving cell, and it is then responsible for maintaining the connection by triggering the HO process. While the device moves away from the current cell, the received signal strength deteriorates, and the ongoing data delivery may be affected by the low signal strength [4]. HO is therefore necessary to overcome the performance degradation. The threshold value is the boundary used to identify a low-quality link: if the measured signal strength falls below the threshold, the device automatically triggers HO management. The HO process is initiated from the EU side under two conditions. First, the HO process is triggered if the received radio link quality of the serving cell falls below the threshold value considered in this work. Second, the HO process is started when the EU perceives that a neighbor's NDR, that is, its HCP value, is greater than that of the source cell.
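The two trigger conditions can be written as a small predicate. This is a sketch under stated assumptions: the function name `should_trigger_ho` is hypothetical, the conditions are combined with a logical OR (the text lists them separately without saying how they combine), and the threshold of −110 dBm in the usage lines is an illustrative value, not one taken from the paper.

```python
def should_trigger_ho(serving_rsrp_dbm, threshold_dbm, serving_hcp, neighbor_hcps):
    """Return True when either HO trigger condition holds.

    Condition 1: the serving cell's RSRP is below the threshold.
    Condition 2: some neighbor advertises a higher HCP value than the serving cell.
    """
    weak_link = serving_rsrp_dbm < threshold_dbm
    better_neighbor = any(hcp > serving_hcp for hcp in neighbor_hcps)
    return weak_link or better_neighbor

# Illustrative values only.
weak = should_trigger_ho(-115, -110, 200.0, [150.0, 180.0])    # link below threshold
better = should_trigger_ho(-90, -110, 200.0, [150.0, 212.14])  # stronger neighbor
stay = should_trigger_ho(-90, -110, 200.0, [150.0, 180.0])     # neither condition
```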
3.3 Problem Formulation and Cell Selection
Every EU that receives a radio link from the serving cell can measure the RSRP [7]. The arbitrary strength unit (ASU) is an integer value proportional to the RSRP measured by the mobile phone; it serves as a measurement map for the RSRP. The performance of the EU always depends on the quality of the radio link. According to the RSRP, the NDR is fixed for all EUs. The quantitative treatment of the HO management problem using HCP relies on mathematical and statistical tools to analyze and optimize the handover process. The steps are as follows:
• Define the Problem: The first step is to define the problem to be solved, for example the calculation of the HCP value to minimize handover latency or maximize network capacity.
• Data Collection: The next step is to collect data on the HCP, such as ASU and NDR. This data can be obtained from every EU.
• Data Analysis: The collected data is then analyzed using statistical methods such as second-degree parabolic regression. This helps identify patterns and trends in the data and gives insight into the relationship between handover control parameters and network performance. Based on the dataset, the NDR is predicted in order to assess the network performance. For the prediction, second-degree parabolic regression is used, with ASU as the independent variable and NDR as the dependent variable, so the NDR is estimated from the ASU. The minimum, average, and maximum values of ASU are used to forecast the NDR. Finally, the average of the forecasted NDR values is taken as the HCP value, and the network selection process is concluded based on this value.
• Validation: Finally, the HCP values can be validated through simulation to ensure that they deliver the expected improvements in network performance.
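The data analysis step can be sketched in a few lines: fit NDR = a + b·ASU + c·ASU² by least squares, forecast the NDR at the minimum, mean, and maximum ASU, and average the three forecasts to obtain the HCP value. The function names, the use of the arithmetic mean as the "average" ASU, and the sample history are assumptions for illustration; the paper does not give its implementation.

```python
def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                 # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]   # augmented matrix
    for col in range(3):                                            # Gauss-Jordan elimination
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [u - f * v for u, v in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))               # (a, b, c)

def hcp_value(asu_history, ndr_history):
    """Average of the NDR forecasts at the minimum, mean, and maximum ASU."""
    a, b, c = fit_parabola(asu_history, ndr_history)
    points = (min(asu_history), sum(asu_history) / len(asu_history), max(asu_history))
    return sum(a + b * x + c * x * x for x in points) / 3

# Hypothetical history for one cell (not data from the paper).
asu = [40, 45, 50, 55, 60, 65, 70]
ndr = [150.0, 182.0, 210.0, 236.0, 255.0, 270.0, 281.0]
hcp = hcp_value(asu, ndr)
```

Because the fit is computed from stored history rather than from measurements taken at HO time, the HCP value is available before the trigger fires, which is the point of the proactive approach.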
When the EU changes its position, the cell reselection process is initiated according to the two conditions of the HO trigger process. The EU collects the HCP values from its neighboring cells and, based on these values, selects the top service provider as the target cell for HO management. Table 3 shows the HCP values received by the EU for network reselection. Here, the EU perceives five cells as neighbors. The measurement reports of the five cells are assessed, and the top-scoring cell is selected for the HO process. In this example, S2 (a small cell) is the best-performing cell, and the EU selects S2 as the target cell.

Table 3 Handover control parameter values
Cells              M1      M2      S1       S2       S3
HCP value (Mbps)   44.18   67.12   199.26   212.14   201.78
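With the values of Table 3, the selection reduces to taking the cell with the largest HCP value. The dictionary layout and the variable names below are illustrative choices, not part of the paper.

```python
# HCP values (Mbps) from Table 3.
hcp_mbps = {"M1": 44.18, "M2": 67.12, "S1": 199.26, "S2": 212.14, "S3": 201.78}

# The target cell is the neighbor with the highest HCP value.
target_cell = max(hcp_mbps, key=hcp_mbps.get)
print(target_cell)  # S2
```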
3.4 HO Process
The access and mobility management function (AMF) is a control plane function in the 5G core network. The main functions and responsibilities of the AMF are:
• Registration Management
• Reachability Management
• Connection Management
• Mobility Management.

The user plane function (UPF) does the work of connecting the actual data coming over the radio access network (RAN) to the Internet. Being able to route packets quickly and accurately to the correct destination on the Internet is key to improving efficiency and user satisfaction.

Fig. 2 HO process

• As shown in Fig. 2, the EU is connected to the source gNodeB (gNB) and forwards the measurement report to the source gNB. The measurement report contains the necessary information about the neighbors.
• Until a connection to the target gNB is established, the downlink (DL) and uplink (UL) traffic continues through the source gNB.
• The radio resource control (RRC) protocol is a network-layer protocol for connection establishment and release.
• The source gNB decides to hand over the EU depending on the measurement report.
• The source and target gNBs have an active Xn connection. Over the Xn interface, the Xn handover request is delivered to the target gNB. This message carries the transparent RRC container with the HO preparation information and the RRC message, along with the target cell ID and other information.
• The target gNB takes into account the information received in the RRC container.
• When the target gNB decides to approve the handover, it sends the HO request acknowledgment message to the source gNB, which includes a transparent container to be sent to the UE as an RRC message to perform the handover.
• The source gNB initiates the handover by forwarding the RRC reconfiguration message to the UE, containing the information needed to access the target cell.
• The EU connects to the target gNB by sending the RRC reconfiguration complete message to the target gNB. The UE then starts uplink data transfer through the target gNB.
• The target gNB sends the path switch request message to the AMF to switch the downlink data path. This message also contains a list of protocol data unit (PDU) sessions.
• The AMF confirms the path switch request message with the path switch request acknowledgment. This message contains a list of PDU sessions that have been switched and are to be released.
• Based on the path switch request acknowledgment, the target gNB sends the EU context release message to the source gNB. The source gNB then releases the resources associated with the EU.

Algorithm
1. begin
2. Set up the simulation
   a. EU (E = {e1, e2, …, ep})
   b. Macro-cell (M = {m1, m2, …, mq})
   c. Small cell (S = {s1, s2, …, sr})
3. Start simulation
   a. Simulation time (t)
4. Start mobility (EU)
   a. EU speed Ve ∈ [vmin, vmax]
5. EU collects HCP(N)
6. Select the target cell (proactive decision making)
7. Apply RSRP measurement
8. if RSRP(S) < RSRP(δ)
   a. HO trigger
   b. Update HOM based on the selected cell
9. else
   a. HO decision: false
   b. Keep connection to the source cell
10. end
When the simulation starts, the EU changes its position randomly by applying a random mobility model. The EU can roam within the same cell or between different cells. The simulation time is fixed, as are the minimum and maximum velocities of the EU. The EU collects the HCP values of all neighbors to choose the target cell.
Table 4 Simulation parameters
Parameter                         Small cell    Macro-cell
Number of cells                   156           52
Cell radius (m)                   100           300
System bandwidth (MHz)            500           20
Simulation area                   8 × 8 km²
Number of EUs                     300
Mobility model                    Random waypoint model
Simulation time (s)               600
EU speed (m/s)                    20, 40, 60, 80, 100
Thermal noise density (dBm/Hz)    −174
Prediction model                  Second-degree parabolic regression
The EU proactively selects the cell with the highest HCP value as the target cell. The EU measures the RSRP of the source cell every second. The HO trigger process starts when the RSRP of the source cell is less than the RSRP threshold; the EU then forwards the target cell information along with a measurement report to the source cell, and the source cell carries out the HO management.
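Steps 5 to 9 of the algorithm can be traced on a short list of per-second RSRP samples. This is a minimal sketch: the function name `run_decision_loop`, the −110 dBm threshold, and the RSRP trace are hypothetical, and the HOM update of step 8b is not modeled. The HCP values are taken from Table 3.

```python
def run_decision_loop(rsrp_trace_dbm, threshold_dbm, neighbor_hcp):
    """Walk through per-second RSRP samples of the source cell.

    Returns (time_s, target_cell) for the first HO trigger, or None if the
    EU stays on the source cell for the whole trace.
    """
    # Steps 5-6: the target cell is chosen proactively, before any trigger.
    target = max(neighbor_hcp, key=neighbor_hcp.get)
    for t, rsrp in enumerate(rsrp_trace_dbm):   # step 7: one measurement per second
        if rsrp < threshold_dbm:                 # step 8: RSRP(S) < RSRP(delta)
            return t, target                     # HO trigger toward the selected cell
        # Step 9: HO decision is false, keep the connection to the source cell.
    return None

trace = [-95, -99, -104, -109, -113, -118]       # hypothetical RSRP samples (dBm)
result = run_decision_loop(trace, threshold_dbm=-110,
                           neighbor_hcp={"S1": 199.26, "S2": 212.14})
print(result)  # (4, 'S2')
```

Since the target is already known when the threshold is crossed, no measurement of the candidate cells is needed at trigger time.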
4 Simulation Environment
The proposed technique is implemented in OMNeT++. This research is an extension of our previous work, in which some drawbacks were observed when applying the fuzzy-based MADM model to 5G HetNets. The new model presented here is intended to work well for future technologies. We consider a HetNet that consists of 52 macro-cells and 156 small cells in an area of 8 × 8 km². Each macro-cell contains three small cells, one located in the middle of each of its sectors. The mobile nodes move according to the random waypoint mobility model. Dynamic source routing (DSR) is used to route the data packets between the mobile nodes (Table 4).
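The simulation parameters can be gathered in one configuration object, which also makes the layout easy to check: 52 macro-cells with three small cells each give the 156 small cells stated above. The class and field names are hypothetical; the values are those given in the text and in Table 4, and the object is not an OMNeT++ configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimulationConfig:
    """Illustrative container for the simulation parameters (names are assumptions)."""
    macro_cells: int = 52
    small_cells: int = 156
    small_cells_per_macro: int = 3
    macro_radius_m: float = 300.0
    small_radius_m: float = 100.0
    area_km: tuple = (8.0, 8.0)
    num_eus: int = 300
    sim_time_s: int = 600
    eu_speeds_mps: tuple = (20, 40, 60, 80, 100)

    def is_consistent(self):
        """Check that the small-cell count matches three small cells per macro-cell."""
        return self.small_cells == self.macro_cells * self.small_cells_per_macro

config = SimulationConfig()
```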
5 Performance Evaluation
To analyze the performance of the proposed user-experienced, performance-aware network selection process, simulations are conducted at different EU speeds. The performance of the proposed model is compared with that of the MADM [7], neural network [2], and fuzzy MADM [8] network selection methods. The performance metrics, computed over the overall simulation time, are HOA, HOPP, RLF, HOF, and HO delay.

Fig. 3 Average HOA
EU speed (m/s)    20      40      60      80      100
Neural Network    9.06    10.96   10.26   11.96   12.96
MADM              9.75    9.95    11.56   13.79   15.67
Fuzzy MADM        8.56    8.95    8.76    9.98    10.78
Proposed model    5.3     6.2     7.05    8.35    9.12
5.1 HOA
HOA measures how frequently HO happens between the source cell and the target cell. The switching of links between different cells is observed for all EUs, and the average HOA is taken for comparative analysis:

$$\overline{\mathrm{HOA}} = \frac{\sum_{i=1}^{p} \mathrm{HOA}_i}{p} \tag{1}$$

where p is the total number of users. Figure 3 shows the average HOA versus EU speed obtained from the simulation. The average HOA of the proposed model is lower than that of the other models, and the model significantly reduces the average HOA at all EU speeds.
5.2 HOPP
Handover ping-pong (HOPP) represents unnecessary HOs between the source and target cells. When the EU disconnects its radio link from the source cell, establishes a new connection with another cell, and then bounces back to the previous source cell, the event is counted as an HOPP.
Fig. 4 Average HOPP (bar chart comparing the Neural Network, MADM, Fuzzy MADM, and Proposed model methods; bar labels: 7.8, 5.6, 3.5, 1.1)

$$\mathrm{HOPP} = \frac{N_{\mathrm{HOPP}}}{N_{\mathrm{RHO}}} \tag{2}$$
where $N_{\mathrm{HOPP}}$ is the number of HOPPs that occurred over the whole simulation and $N_{\mathrm{RHO}}$ is the number of HOs requested. Figure 4 shows the average HOPP obtained from the simulation. The average HOPP of the proposed model is lower than that of the other models, so the model appreciably reduces the average HOPP.
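Counting ping-pong events from a log of serving cells and forming the ratio of Eq. (2) can be sketched as follows. Several points are assumptions: the function names, the rule that any immediate return to the previous cell counts as a ping-pong (no time window is applied, since the text specifies none), and the use of performed handovers as a stand-in for the number of requested handovers.

```python
def count_ping_pongs(serving_cells):
    """Count A -> B -> A patterns in the sequence of serving cells of one EU."""
    # Collapse repeated samples so that each change of cell is one handover.
    visits = [c for i, c in enumerate(serving_cells)
              if i == 0 or c != serving_cells[i - 1]]
    handovers = len(visits) - 1
    ping_pongs = sum(1 for i in range(2, len(visits)) if visits[i] == visits[i - 2])
    return ping_pongs, handovers

def hopp_ratio(n_hopp, n_rho):
    """Eq. (2): ping-pong handovers divided by requested handovers."""
    return n_hopp / n_rho if n_rho else 0.0

history = ["M1", "M1", "S2", "M1", "S3", "S3", "S1"]   # hypothetical serving-cell log
n_hopp, n_ho = count_ping_pongs(history)
print(n_hopp, n_ho, hopp_ratio(n_hopp, n_ho))          # 1 4 0.25
```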
5.3 RLF
An EU can lose its connection to the source cell because of a low-quality link during the HO process. A failure of the radio link that occurs during the HO process and disrupts communication is treated as a radio link failure (RLF). The primary sources of RLF are HOF circumstances and disconnections in the communication link. The average RLF over all EUs is:

$$\overline{\mathrm{RLF}} = \frac{\sum_{i=1}^{p} \mathrm{RLF}_i}{p}. \tag{3}$$
Figure 5 shows the average RLF versus EU speed obtained from the simulation. The average RLF of the proposed model is lower than that of the other models, and the model notably reduces the average RLF at all EU speeds.
5.4 HOF The lack of target resource availability is one of the reasons for HOF. When the HO process is initiated, the EU will search the neighbors for target cell selection. There can be a chance that insufficient resource availability will lead to the HOF. Another reason for HOF is that EU can move out of the network coverage area of the target
A. Priyanka and C. Chandrasekar
Fig. 5 Average RLF versus EU speed
Model            20 m/s   40 m/s   60 m/s   80 m/s   100 m/s
Neural Network   0.07     0.12     0.17     0.25     0.29
MADM             0.09     0.15     0.18     0.27     0.28
Fuzzy MADM       0.04     0.09     0.15     0.17     0.19
Proposed model   0.02     0.05     0.07     0.10     0.10
Fig. 6 Average HOF (legend: Neural Network, MADM, Proposed model, Fuzzy MADM; data labels: 3.68, 3.5, 1.56, 0.95)
cell before the HO process is completely established.
$$\mathrm{HOF} = \frac{N_{\mathrm{HOF}}}{N_{\mathrm{HOA}}} \tag{4}$$
where N_HOF is the number of HOFs and N_HOA is the number of HOs attempted. Figure 6 shows the average HOF obtained from the simulation. The average HOF of the proposed model is lower than that of the other models; the proposed model significantly reduces the average HOF.
5.5 Handover Delay
The time difference between the connection in the old cell and the connection in the new cell is considered the delay to finish the HO process. To select a cell, the methodology must take a
Fig. 7 Average HO delay (ms) versus EU speed
Model            20 m/s   40 m/s   60 m/s   80 m/s   100 m/s
Neural Network   29       28       30       31       33
MADM             33       32       34       33       34
Fuzzy MADM       28       29       30       32       33
Proposed model   15       15       16       16       17
minimum time; otherwise, the delay becomes large. The handover delay is measured from the time of the EU in the new cell (T_new cell) and the time of the EU in the old cell (T_old cell):
$$\text{Handover Delay} = T_{\text{new cell}} - T_{\text{old cell}} \tag{5}$$
The proposed methodology completes the HO process in much less time than the existing work. Figure 7 shows by how many milliseconds the delay is reduced using the proposed methodology. Cell selection is the most important step for achieving high throughput, and high throughput enables fast data delivery. The existing methods can achieve high throughput when they happen to select the best cell; otherwise, they deliver only average throughput, and the degraded throughput adds to the delay in data transmission. In contrast, the proposed cell selection technique always selects the best service provider.
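To make the reduction in Fig. 7 concrete, the following sketch averages the reported delays over the five EU speeds and computes the relative reduction of the proposed model against each baseline. The delay values are taken from Fig. 7; averaging over speeds and the helper name relative_reduction are choices made here for illustration only.

```python
from statistics import mean

# Average HO delay in ms at 20, 40, 60, 80 and 100 m/s (values from Fig. 7).
delay_ms = {
    "Neural Network": [29, 28, 30, 31, 33],
    "MADM": [33, 32, 34, 33, 34],
    "Fuzzy MADM": [28, 29, 30, 32, 33],
    "Proposed model": [15, 15, 16, 16, 17],
}


def relative_reduction(baseline, proposed):
    """Fraction by which the mean delay of `proposed` is below `baseline`."""
    return (mean(baseline) - mean(proposed)) / mean(baseline)


reductions = {
    name: relative_reduction(values, delay_ms["Proposed model"])
    for name, values in delay_ms.items()
    if name != "Proposed model"
}

for name, r in reductions.items():
    print(f"{name}: mean delay reduced by {100 * r:.1f}%")
```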
5.6 Limitations of the Proposed Methodology
• Inadequate network coverage: Handover decisions based on download rate alone may not consider the coverage area of the network. As a result, handovers may be initiated even if the target network has poor coverage in the area where the user is located, leading to call drops.
• Delay in handover initiation: If handover control parameters are not updated frequently, there may be a delay in initiating handovers. This could lead to a poor user experience, especially for applications that require real-time data transfer.
• When handover control parameters are initialized in each base station, frequent updates may result in high signaling overhead, causing network congestion and degraded performance.
• Issues with compatibility: Handover control parameters initialized in one network may not be compatible with another. This could result in compatibility issues during handover, leading to call drops and poor network performance.
6 Conclusion
When devices move fast, mobility management is essential for seamless data delivery, and the whole mobility management process should run with minimal latency. In the three existing methodologies, the estimation of network performance, or the score calculation of the networks, is handled at the moment the HO is triggered by the EU. Consequently, the existing methodologies take longer to select the target cell than the proposed methodology. In the recommended model, maintaining the HCP in all cells is the key mechanism for executing the process with minimum latency. PDM is crucial in handover management for mobile networks. The complex and dynamic nature of mobile networks requires a proactive approach to handover management to ensure seamless connectivity and high-quality user experiences. PDM involves anticipating and addressing issues before they occur: analyzing network data to identify patterns and trends, detecting network anomalies, and taking corrective action to prevent service disruption or degradation. It also involves leveraging advanced technologies such as artificial intelligence, machine learning, and automation to improve decision making and optimize network performance. By embracing PDM in handover management, network operators can enhance the reliability and availability of their networks, reduce operational costs, and improve customer satisfaction. It also enables them to exploit the full potential of 5G technology, such as ultra-low latency, massive machine-type communications, and high data rates. In summary, PDM is critical in handover management in 5G networks, and it can help network operators to achieve their business goals and deliver exceptional user experiences. The use of automation in handover management is likely to increase in the future. 
With the use of AI and machine learning, networks will be able to detect anomalies and potential issues automatically and take corrective action without human intervention. Acknowledgements The first author sincerely acknowledges the financial support (University Research Fellowship) provided by the Department of Computer Science, Periyar University, under the grant: PU/AD-3/URF Selection Order/016175/2020.
A Comprehensive Review on Fault Data Injection in Smart Grid D. Prakyath, S. Mallikarjunaswamy, N. Sharmila, V. Rekha, and S. Pooja
Abstract Nowadays, power generation at the utility side and its transfer to the demand side are controlled by the smart grid. The power distribution process increasingly flows in multiple directions and connects more residential and industrial sectors. As a result, more monitoring and security processes have been adopted in the smart grid to control fault data injection, cyber-attacks, and physical-side attacks. This research study analyzes fault data injection in the smart grid with respect to malicious data, signals, and the connectivity process. As part of the study, various techniques for controlling faults in the smart grid are surveyed. The analysis helps to identify and determine a suitable method for controlling faults in the smart grid. In addition, countermeasures against FDI are summarized for cyber-attacks and physical attacks.
Keywords Smart grid · Fault data injection attack (FDIA) · Cyber-attack · Machine learning
D. Prakyath Department of Electrical and Electronics Engineering, SJB Institute of Technology, Bengaluru, Karnataka, India
S. Mallikarjunaswamy (B) Electronics and Communication Engineering, JSS Academy of Technical Education, Bengaluru, Karnataka, India e-mail: [email protected]
N. Sharmila Electrical and Electronics Engineering, JSS Science and Technology University, Mysore, Karnataka, India
V. Rekha Department of Computer Science and Engineering, CHRIST (Deemed to Be University), Bengaluru, India e-mail: [email protected]
S. Pooja Department of Electronics and Communication Engineering, KS Institute of Technology, Bengaluru, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_44
1 Introduction
Efficient energy transmission and consumption are achieved through the smart grid, which provides the desired infrastructure by integrating information networks with traditional power grids. Because many tasks, such as price information exchange, state measurement, and control actions, are performed over the information network, threats such as malicious programs, information tampering, and eavesdropping are the main threats to the security and stability of the smart grid. The fundamental architecture of the smart grid and FDIA is shown in Fig. 1. In the present research work, the FDI attack is defined as a cyber-physical attack with the following aspects: (1) on the physical side, attackers construct bad data that bypasses the error detection of the system; (2) on the cyber-side, the bad data is injected into the information system. The FDI attack is founded on cyber-attack technologies: attackers use cyber-techniques to gain access to smart meters and revise meter readings. Constructing bad data on the physical side does not refer to physically modifying electronic circuits. The FDI attack requires cyber-intrusion into the communication networks in order to bypass bad data detection. Bad data detection is discussed in [1–9]. In conventional power grid systems, control centers are protected and relatively isolated. In the smart grid, advanced metering infrastructures are widely distributed and the communication networks are interconnected. With modern cyber-attack technologies, taking control of electrical terminals is not difficult [10–12]. Hackers can modify, and possibly control, smart meter readings in the billing system [13]. In industry, an FDI attack disturbs the required balance between supply and demand, which leads to wrong decisions and increases the cost of energy distribution [3, 14]. Terrorists can use a BDI attack to damage national or local power infrastructure [13]. Liu and Ning [15] raised the concern about fault data injection (FDI) attacks in the smart grid and showed how attackers can bypass bad measurement detection in the power system and affect certain state variables. Subsequently, several researchers have proposed algorithms based on physical laws, as detailed below. The FDI attack is examined from two perspectives, the cyber-side and the physical side, and corresponding countermeasures are proposed. On the physical side, several BDI attacks in the smart grid are presented [3, 4, 7, 16–19]. Ensuring the functioning of the smart grid from all perspectives is essential, and communication must be secured [11, 20]. Information security is protected through authentication, and dynamic-key cyber-technologies are used [21]. However, the countermeasures discussed are either physically oriented or cyber-oriented, whereas FDI attacks are complex cyber-physical problems. The tight coupling between the power grid and the communication network is therefore introduced alongside the conventional objectives of the countermeasures. Cyber-physical fusion is currently the most promising solution for detecting FDI attacks.
Fig. 1 Fundamental architecture of smart grid and fault data injection attack (FDIA)
2 Fault Data Injection Attack (FDIA)
This section summarizes BDI attack methods. In the smart grid, a BDI attack combines the physical side and the cyber-side, as presented in various works.
2.1 Cyber-Side
The cyber-side methods are the foundation of the FDI attack. The basic target of a cyber-attack is to obtain authorization to perform invalid operations on network communication or smart meters.
• Device failure attack: The objective is to make smart meters unresponsive through a denial-of-service (DoS) or distributed denial-of-service (DDoS) attack. The maximum connection capacity of a smart meter in home area networks (HANs) is extremely limited [22].
• Password-cracking attack: This is the traditional technique for gaining access to a device. Authentication is required to modify smart meter readings, but the password mechanisms in smart meters are not complex because of their limited computational resources.
• Authentication-identifying attack: This is an alternative method of accessing smart meters that have adopted complex password mechanisms [22]. The protocols used for communication are DNP 3.0/TCP and Modbus/TCP.
• Worm attack: This attack targets the data sending/receiving function of the smart meter. Because manual firmware updates by every terminal user are impractical and inconvenient, the update service is provided by an aggregator server that connects to the meters directly.
• (Hybrid) Attack graph: Intrusion detection systems (IDSs) are deployed in the communication networks of the smart grid [23]. Compared with the traditional attack graph, the hybrid attack graph contains both physical parameters and cyber-parameters.
2.2 Physical Side
Through a cyber-attack on the communication network, meter readings can be changed before the data is sent to the control center. Attackers follow the physical model of the smart grid to construct bad data and bypass bad data detection. From the power grid point of view, a smart grid with N buses is represented by Eq. 1:
$$p = f(i) + g \tag{1}$$
where p is the measurement vector (the active or reactive power flow at each bus, which is the target of fault data injection), f is the function that computes the measurements from the state variables, g is Gaussian noise with zero mean, and i is the vector of state variables (voltage and phase angle). On the physical side, the measurements are maliciously tampered with by attackers, as represented in Eq. 2:
$$p = f(i) + g + v \tag{2}$$
where v is the attack vector.
2.2.1 Undetectable BDI
The first undetectable FDI attack against traditional bad data detection was proposed by Liu [15]. The basic idea of the undetectable FDI is to construct the attack vector in the DC model, as shown in Eq. 3:
$$p = Fi + g + v = F(i + c) + g \tag{3}$$
where F is the Jacobian matrix of f(i). In the DC model, there is a linear relationship between the state variables and the measurements. The attack vector satisfies v = Fc; therefore, the injected measurements are consistent with the shifted state i + c, and their residual equals that of the original measurements and does not surpass the detection threshold.
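The effect of choosing v = Fc can be checked numerically. The sketch below is an illustration under stated assumptions, not the simulation used in the surveyed work: it assumes a least-squares state estimator with a residual-norm test, and the 4-by-2 matrix F, the state, the noise, and the vectors c and v_naive are made-up values.

```python
def lstsq_2state(F, p):
    """Least-squares state estimate for a measurement matrix with two columns."""
    a = sum(r[0] * r[0] for r in F)
    b = sum(r[0] * r[1] for r in F)
    d = sum(r[1] * r[1] for r in F)
    u = sum(r[0] * y for r, y in zip(F, p))
    w = sum(r[1] * y for r, y in zip(F, p))
    det = a * d - b * b  # determinant of F^T F
    return [(d * u - b * w) / det, (a * w - b * u) / det]


def residual_norm(F, p):
    """Euclidean norm of p - F x_hat, the quantity compared with the threshold."""
    x = lstsq_2state(F, p)
    return sum((y - (r[0] * x[0] + r[1] * x[1])) ** 2 for r, y in zip(F, p)) ** 0.5


# Hypothetical DC model: four measurements, two state variables.
F = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
i_true = [0.10, -0.05]
g = [0.01, -0.02, 0.015, 0.005]  # small measurement noise
p = [r[0] * i_true[0] + r[1] * i_true[1] + n for r, n in zip(F, g)]

c = [0.3, -0.2]                                     # state shift chosen by the attacker
v_stealth = [r[0] * c[0] + r[1] * c[1] for r in F]  # v = F c
v_naive = [0.5, 0.0, 0.0, 0.0]                      # tampers with one meter only

p_stealth = [y + a for y, a in zip(p, v_stealth)]
p_naive = [y + a for y, a in zip(p, v_naive)]

print(residual_norm(F, p), residual_norm(F, p_stealth), residual_norm(F, p_naive))
```

The residual of p_stealth matches the attack-free residual up to rounding, while the single-meter tampering in p_naive produces a much larger residual that a threshold test would flag.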
2.2.2 FDI Network Topology
The basic idea of the undetectable FDI is based on a static model of the power grid, in which the function f(i) is time-invariant. The smart grid, however, is a dynamic process in which switches change status. In this situation, the control center receives two types of measurements: breaker statuses and meter measurements. In this attack, the meter data and the breaker statuses are modified simultaneously.
2.2.3 FDI Time Stamp/Synchronization
To measure values more precisely, smart meters are deployed together with phasor measurement units (PMUs). Breaking the time synchronization mechanism is a newer idea for BDI in the smart grid. Several attacks of this kind, such as the time stamp attack, were proposed by Gong. The attackers do not need access to the communication networks; they send a forged signal to the GPS receiver. This attack scheme causes basic PMU applications to fail, including regional distributed event location, voltage stability monitoring, and transmission line fault detection [26].
3 Simulation of Attack
This section presents attacks that combine the methods of both the physical side and the cyber-side. It shows how these methods are integrated to intrude into smart meters and launch a BDI attack.
3.1 Smart Meter Intrusion
Cyber-techniques are the basis of the BDI attack. The objective of the cyber-attacks is to obtain authorization to perform harmful operations on network communications or smart meters. Figure 2 shows the experimental setup for the vulnerability assessment, which demonstrates how the measurements of smart meters are changed through cyber-techniques while bypassing the energy conservation test.
Fig. 2 Experimental setup for the vulnerability assessment
Traditional bad data detection in the power system is based on the residual between the measured values and the real values, which reflects the attack vector. The smart meters communicate using protocols such as DNP 3.0/TCP or Modbus/TCP. In the simulated attack, we first scan the network segment and find all hosts and devices with open port 502 or 20000. Next, the product types of these devices are collected through communication to confirm that they are smart meters. To intrude into the smart meters, a vulnerability in plaintext transmission is exploited. The traffic flow is monitored for critical smart meter operations, such as firmware and IP address updates. If any authentication information is identified, the password is captured and access to the smart meters is obtained. After a successful intrusion, the energy conservation test is bypassed. In most smart meters, parameters such as the active and reactive power values are read-only, but in some settings the current transformer (CT) ratio is writable. Changing the CT ratio changes the reported active and reactive power values by a proportion K. The proper CT ratio for the three meters in the simulation is 5:1. The active power measurements of smart meters I–III are 20, 2000, and 2020 W. The CT ratios of Meter I and Meter II are 100 and 500:1, respectively. The measurement values of Meter I and Meter II can thus be reversed. In Meter III, the sum of Meter I and Meter II is 2.02 kW. The smart meter intrusion simulation shows that the CT ratios of smart meters can be falsified so that the readings are reversed.
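The reason a reversal of the two readings goes unnoticed can be illustrated with a simple consistency check. The sketch below assumes that the energy conservation test compares the reading of Meter III with the sum of Meter I and Meter II; the function name and the tolerance are hypothetical, and only the readings 20, 2000, and 2020 W come from the text.

```python
def passes_energy_conservation(meter_1_w, meter_2_w, meter_3_w, tolerance_w=1.0):
    """Assumed check: Meter III should equal Meter I plus Meter II."""
    return abs(meter_3_w - (meter_1_w + meter_2_w)) <= tolerance_w


honest = (20.0, 2000.0, 2020.0)           # readings reported in the simulation
reversed_pair = (2000.0, 20.0, 2020.0)    # Meter I and Meter II readings swapped
single_tamper = (2000.0, 2000.0, 2020.0)  # only Meter I falsified

for label, readings in [("honest", honest),
                        ("reversed", reversed_pair),
                        ("single tamper", single_tamper)]:
    print(label, passes_energy_conservation(*readings))
```

Because the sum of the two downstream readings is unchanged when they are swapped, a sum-based check accepts the falsified data, whereas falsifying a single meter breaks the balance.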
3.2 IEEE 14-Bus System BDI Attack
Figure 3 shows the simulation platform for the IEEE 14-bus system into which bad data is injected. The active power reported by the smart meters is changed by falsifying the CT ratio. This case illustrates how traditional bad data detection is bypassed. The load values of bus5 and bus4 are 7.60 MW and 47.80 MW, respectively. The power flow on transmission line L5,4 is 61.16 MW. In the attack case, the hacker tries to move 60.96 MW of load from bus5 to bus4. The load values of bus5 and bus4 and the power flow on L5,4 are modified to −53.56, 108.96, and 122.32 MW, respectively. These results show that the conventional bad data detection system is inadequate for detecting the bad data.
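The falsified values can be reproduced with simple arithmetic. The sketch below is only a consistency check on the reported numbers: it assumes that the same amount is subtracted from the bus5 load and added to both the bus4 load and the line flow. Under that assumption the three reported values are obtained with a shift of 61.16 MW (the value of the line flow), which is an observation about the arithmetic rather than a statement about the authors' simulation.

```python
# Reported pre-attack values in MW.
bus5_load = 7.60
bus4_load = 47.80
line_54_flow = 61.16

# Assumed shift in MW; chosen because it reproduces the reported values.
shift = 61.16

falsified_bus5 = round(bus5_load - shift, 2)
falsified_bus4 = round(bus4_load + shift, 2)
falsified_flow = round(line_54_flow + shift, 2)

print(falsified_bus5, falsified_bus4, falsified_flow)

# The total load over the two buses is unchanged by the shift, which is
# why a balance-based check does not notice the manipulation.
total_before = bus5_load + bus4_load
total_after = falsified_bus5 + falsified_bus4
```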
Fig. 3 IEEE 14-bus structure in attacking case
4 Bad Data Injection Countermeasures
In recent years, considerable effort has been made to defend the smart grid against FDI attacks. The discussion starts with cyber-attack detection based on current information technologies. Bad data detection has become even more critical because of the increased complexity of the grid and its reliance on advanced technologies for monitoring, control, and communication.
4.1 Cyber-defense
Cyber-defense against FDI mainly concentrates on access authentication and data transmission. Strict authentication is important for smart grid communication networks. To confirm the identity of the communicating entity, the data fed to the control center must be authenticated. For this type of problem, many information techniques are used, such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), and Hash-based Message Authentication Code (HMAC) [24, 25]. Dynamic key management is a promising solution against replay attacks and traffic monitoring in the smart grid. An encryption method based on a dynamic secret was proposed by Liu [21]. This work uses data retransmission and packet loss to increase the accuracy of fault data detection. WSN-featured defense gives more importance to wireless sensor networks (WSNs) in the smart grid. Physical-layer security was proposed against BDI in WSNs, in which basic actions such as RECEIVE, AWAKE, and SLEEP follow a normal sequence.
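As an illustration of authenticating the data fed to the control center, the sketch below signs a meter reading with HMAC-SHA256 using the Python standard library. The message layout, the field names, and the shared key are assumptions made for this example; the source does not specify a message format or key management scheme.

```python
import hashlib
import hmac


def sign_reading(key, meter_id, counter, active_power_w):
    """Build a message for one reading and compute its HMAC-SHA256 tag."""
    message = f"{meter_id}|{counter}|{active_power_w:.2f}".encode("utf-8")
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return message, tag


def verify_reading(key, message, tag):
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


# Hypothetical shared secret between a meter and the control center.
shared_key = b"example-key-for-illustration-only"

message, tag = sign_reading(shared_key, "meter-I", 42, 20.0)
tampered = message.replace(b"20.00", b"2000.00")

print(verify_reading(shared_key, message, tag))   # authentic reading
print(verify_reading(shared_key, tampered, tag))  # modified in transit
```

The counter field is included so that a receiver which tracks the last accepted counter per meter could reject replayed messages; that bookkeeping is not shown here.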
4.2 Physical Defense
Fault data injection (FDI) was described by Liu [15] as a cybersecurity attack that targets the integrity and reliability of a smart grid. In this type of attack, malicious actors intentionally inject false or misleading fault data into the grid’s monitoring and control systems [6–9].
4.2.1 Quickest Detection
The traditional detection mechanism performs state estimation and bad data detection on the power grid measurements at each sampling time. Quickest detection instead aims to detect abrupt changes in smart meter measurements as early as possible. A defense mechanism based on the CUSUM test was proposed by Huang [2] against stealth FDI.
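The idea behind a CUSUM-style test can be sketched with the basic one-sided cumulative sum. This is the textbook form, not the specific defense mechanism cited above; the drift, the threshold, and the sample values are illustrative assumptions.

```python
def cusum_alarm(samples, drift, threshold):
    """One-sided CUSUM: return the index of the first alarm, or None.

    The statistic accumulates how far each sample exceeds `drift` and is
    reset to zero whenever it would become negative. An alarm is raised
    when the statistic exceeds `threshold`.
    """
    s = 0.0
    for n, x in enumerate(samples):
        s = max(0.0, s + x - drift)
        if s > threshold:
            return n
    return None


# Hypothetical residual sequence: normal for 10 samples, then a step change.
residuals = [0.0] * 10 + [1.0] * 10

print(cusum_alarm(residuals, drift=0.5, threshold=2.0))
```

A larger threshold lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off that quickest detection tries to optimize.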
4.2.2 Dynamic-Based Detection Model
In power system research, the power flow through the buses is described by load-flow equations. Bad data detection and state estimation are built on this static model. In the smart grid, however, dynamic features are additionally considered in the physical model.
4.2.3 Correlation Analysis-Based Detection
To make real-time decisions, optimal power flow (OPF) is commonly used for energy dispatch. The input measurements therefore follow a regular relationship. Principal component analysis (PCA) was used to separate the variability of the power flow.
4.2.4 Distributed Detection
In a large-scale smart grid, attackers exploit the tolerance that the bad data detector allows for the normal cumulative observation noise in the power grid [3]. Computational complexity and storage consumption during detection are also major considerations. A distributed intrusion detector was proposed by Zhang [18], which develops and deploys modules that perform intelligent functions.
4.2.5 Cyber-Physical Fusion Defense
Interactive reaction and intelligence of the power grid are introduced in the smart grid to counter bad data attacks. Invalid access to infrastructure by cyber-attacks triggers alarms or produces anomalous traffic. During an attack, smart meter readings physically violate power conservation. Hence, a cyber-physical fusion method is proposed as a better solution against FDI attacks. In the present research work, a cyber-physical fusion strategy is proposed. Its basic idea is to fuse anomaly detection on traffic flows with consistency checks on power flows. Figure 4 shows the framework, which consists of three modules. A network traffic monitor is deployed to detect abnormal packets, a consistency check is applied to the physical system, and the fusion module combines the alerts from the information network with the inconsistent physical measurements to detect attacks.
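The fusion step can be sketched as a small decision rule that combines a cyber alert with a physical consistency check. The rule, the class and field names, and the tolerance below are assumptions for illustration; the source describes the three modules but does not specify how the fusion module weighs its inputs.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    traffic_alert: bool      # raised by the network traffic monitor
    power_mismatch_w: float  # absolute violation of power conservation


def fuse(evidence, mismatch_tolerance_w=1.0):
    """Assumed fusion rule: both sources agree -> attack, one source -> suspicious."""
    physical_alert = evidence.power_mismatch_w > mismatch_tolerance_w
    if evidence.traffic_alert and physical_alert:
        return "attack"
    if evidence.traffic_alert or physical_alert:
        return "suspicious"
    return "normal"


print(fuse(Evidence(traffic_alert=True, power_mismatch_w=1980.0)))
print(fuse(Evidence(traffic_alert=False, power_mismatch_w=1980.0)))
print(fuse(Evidence(traffic_alert=False, power_mismatch_w=0.2)))
```

Requiring agreement between the two sources before declaring an attack reduces false alarms from either module alone, while the intermediate label keeps single-source anomalies visible to an operator.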
Fig. 4 Cyber-physical fusion main framework
5 Conclusion
This research study has reviewed smart grids and their control techniques, monitoring schemes, and security issues in power communication between utility providers and consumers. Various cyber-attacks, malicious behaviors, and fault detection methods in the smart grid have also been analyzed. The conclusion is that when more paths are connected to a smart meter, more faults are introduced into it, and it starts acting maliciously, which produces more inaccurate results. Machine learning techniques are used in the smart grid to improve accuracy and to control cybercrime and physical attacks that arise as a result of integration. Future scope: Nowadays, people are using more electric vehicles, and because of this, there is a scarcity of electrical power. As the number of electrical devices is increasing exponentially, smart grids and smart meters are being adopted in all sectors. These technologies are very useful for controlling and monitoring fault data injection on both the utility side and the consumer side.
References
1. Umashankar ML, Mallikarjunaswamy S (2023) A survey on IoT protocol in real-time applications and its architectures. In: Proceedings of the 3rd international conference on data science, machine learning and applications, pp 141–147. https://doi.org/10.1007/978-981-19-5936-3
2. Mahendra HN, Mallikarjunaswamy S (2022) An efficient classification of hyperspectral remotely sensed data using support vector machine. Int J Electron Telecommun 68(3):609–617. https://doi.org/10.24425/ijet.2022.141280
3. Pasdar AM, Sozer Y, Husain I (2013) Detecting and locating faulty nodes in smart grids based on high frequency signal injection. IEEE Trans Smart Grid 4(2):1067–1075. https://doi.org/10.1109/TSG.2012.2221148
4. Shivaji R, Nataraj KR, Mallikarjunaswamy S, Rekha KR (2022) Implementation of an effective hybrid partial transmit sequence model for peak to average power ratio in MIMO OFDM system. In: ICDSMLA 2020. Lecture notes in electrical engineering, vol 783. Springer, Singapore. https://doi.org/10.1007/978-981-16-3690-5_129
5. Siddiqui IF, Lee SU-J, Abbas A, Bashir AK (2017) Optimizing lifespan and energy consumption by smart meters in green-cloud-based smart grids. IEEE Access 5:20934–20945. https://doi.org/10.1109/ACCESS.2017.2752242
6. Savitha AC, Jayaram MN (2022) Development of energy efficient and secure routing protocol for M2M communication. Int J Perform Eng 18(6):426–433. https://doi.org/10.23940/ijpe.22.06.p5.426-433
7. Wang W, Chen H, Lou B, Jin N, Lou X, Yan K (2018) Data-driven intelligent maintenance planning of smart meter reparations for large-scale smart electric power grid. In: 2018 IEEE SmartWorld, ubiquitous intelligence & computing, advanced & trusted computing, scalable computing & communications, cloud & big data computing, internet of people and smart city innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp 1929–1935. https://doi.org/10.1109/SmartWorld.2018.00323
8. Venkatesh DY, Mallikarjunaiah K, Srikantaswamy M (2022) A comprehensive review of low density parity check encoder techniques. Ingénierie des Systèmes d’Information 27(1):11–20. https://doi.org/10.18280/isi.270102
9. Al-Eryani Y, Baroudi U (2019) An investigation on detecting bad data injection attack in smart grid. In: 2019 International conference on computer and information sciences (ICCIS), pp 1–4. https://doi.org/10.1109/ICCISci.2019.8716414
10. Dayananda P, Srikantaswamy M (2022) Efficient detection of faults and false data injection attacks in smart grid using a reconfigurable Kalman filter. Int J Power Electron Drive Syst (IJPEDS) 13(4):2086–2097. https://doi.org/10.11591/ijpeds.v13.i4
11. Yu Z-H, Chin W-L (2015) Blind false data injection attack using PCA approximation method in smart grid. IEEE Trans Smart Grid 6(3):1219–1226. https://doi.org/10.1109/TSG.2014.2382714
12. Thazeen S, Mallikarjunaswamy S, Saqhib MN, Sharmila N (2022) DOA method with reduced bias and side lobe suppression. In: 2022 International conference on communication, computing and Internet of Things (IC3IoT), pp 1–6. https://doi.org/10.1109/IC3IOT53935.2022.9767996
13. Aziz IT, Jin H, Abdulqadder IH, Imran RM, Flaih FMF (2017) Enhanced PSO for network reconfiguration under different fault locations in smart grids. In: 2017 International conference on smart technologies for smart nation (SmartTechCon), pp 1250–1254. https://doi.org/10.1109/SmartTechCon.2017.8358566
14. Mahendra HN, Mallikarjunaswamy S, Nooli CB, Hrishikesh M, Kruthik N, Vakkalanka HM (2022) Cloud based centralized smart cart and contactless billing system. In: 2022 7th international conference on communication and electronics systems (ICCES), pp 820–826. https://doi.org/10.1109/ICCES54183.2022.9835856
15. Goyal H, Kikuchi A (2022) Faulty feeder identification technology utilizing grid-connected converters for reduced outage zone in smart grids. In: 2022 IEEE power & energy society innovative smart grid technologies conference (ISGT), pp 1–5. https://doi.org/10.1109/ISGT50606.2022.9817553
16. Mallikarjunaswamy S, Sharmila N, Siddesh GK, Nataraj KR, Komala M (2022) A novel architecture for cluster based false data injection attack detection and location identification in smart grid. In: Mahanta P, Kalita P, Paul A, Banerjee A (eds) Advances in thermofluids and renewable energy. Lecture notes in mechanical engineering. Springer, Singapore. https://doi.org/10.1007/978-981-16-3497-0_48
17. Klaer B, Sen Ö, van der Velde D, Hacker I, Andres M, Henze M (2020) Graph-based model of smart grid architectures. In: 2020 International conference on smart energy systems and technologies (SEST), pp 1–6. https://doi.org/10.1109/SEST48500.2020.9203113
18. Manjunath TN, Mallikarjunaswamy S (2021) An efficient hybrid reconfigurable wind gas turbine power management system using MPPT algorithm. Int J Power Electron Drive Syst (IJPEDS) 12(4):2501–2510. https://doi.org/10.11591/ijpeds.v12.i4.pp2501-2510
19. Thazeen S, Mallikarjunaswamy S, Siddesh GK, Sharmila N (2021) Conventional and subspace algorithms for mobile source detection and radiation formation. Traitement du Signal 38(1):135–145. https://doi.org/10.18280/ts.380114
20. Mallikarjunaswamy S, Nataraj KR, Rekha KR (2014) Design of high-speed reconfigurable coprocessor for next-generation communication platform. In: Sridhar V, Sheshadri H, Padma M (eds) Emerging research in electronics, computer science and technology. Lecture notes in electrical engineering, vol 248. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1157-0_7
21. Cheng B-C, Li KS-M, Wang S-J (2012) De Bruijn graph-based communication modeling for fault tolerance in smart grids. In: 2012 IEEE Asia Pacific conference on circuits and systems, pp 623–626. https://doi.org/10.1109/APCCAS.2012.6419112
22. Khushi F, Motakabber SMA, Hamida BA, Azman AW, Bhattacharjee A (2021) Smart microgrid approach for distributed power generation of renewable energy. In: 2021 8th International conference on computer and communication engineering (ICCCE), pp 72–77. https://doi.org/10.1109/ICCCE50029.2021.9467240
23. Al-Abdulwahab AS, Winter KM, Winter N (2011) Reliability assessment of distribution system with innovative smart grid technology implementation. In: 2011 IEEE PES conference on innovative smart grid technologies - Middle East, pp 1–6. https://doi.org/10.1109/ISGT-MidEast.2011.6220780
24. Umashankar ML, Ramakrishna MV, Mallikarjunaswamy S (2019) Design of high speed reconfigurable deployment intelligent genetic algorithm in maximum coverage wireless sensor network. In: 2019 International conference on data science and communication (IconDSC), pp 1–6. https://doi.org/10.1109/IconDSC.2019.8816930
25. Mahendra HN, Mallikarjunaswamy S (2019) Performance analysis of different classifier for remote sensing application. Int J Eng Adv Technol (IJEAT) 9(1):7153–7158. https://doi.org/10.35940/ijeat.A1879.109119
26. Durgvanshi D, Singh BP, Gore MM (2016) Byzantine fault tolerance for real time price in hierarchical smart grid communication infrastructure. In: 2016 IEEE 7th power India international conference (PIICON), pp 1–6. https://doi.org/10.1109/POWERI.2016.8077386
JuLeDI: Jute Leaf Disease Identification Using Convolutional Neural Network
Mohammad Salah Uddin and Md Yeasin Munsi

Abstract Identifying plant diseases is typically difficult without the assistance of professionals. The disease disrupts the natural growth of the plant. Sometimes, the disease partially infects the plant; sometimes, it infects the whole plant. Several researchers around the world have proposed neural network-based solutions for classifying plant diseases. This research paper presents a method for identifying jute leaf disease (JuLeDI) by utilizing a four-layer-based convolutional neural network (CNN). The CNN model consists of four convolutional layers and four max-pooling layers, followed by two fully connected layers. The CNN-based model was trained using a dataset of 4740 jute leaf images collected from various jute fields in Manikganj, Bangladesh. The dataset included 1600 leaf images of yellow mosaic disease, 1540 leaf images of powdery mildew disease, and 1600 images of healthy jute leaves. The images were captured using a mobile camera. Our proposed convolutional neural network (CNN) model was able to achieve an average accuracy of 96% in classifying jute leaves into three classes: two disease classes (yellow mosaic and powdery mildew) and a healthy class. Compared to state-of-the-art methods such as GPDCNN and SVM, the proposed CNN model achieves the highest accuracy in identifying jute leaf diseases.

Keywords JuLeDI · Jute leaf diseases detection · Classification of plant diseases · Classification or identification of image · Convolutional neural network
M. S. Uddin (B) · M. Y. Munsi, Computer Science and Engineering Department, East West University, Dhaka, Bangladesh, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_45

1 Introduction
In Bangladesh, jute is known as golden fiber. It is also known as the main cash crop of the country. Bangladesh exports jute and jute products to the foreign market. The local market also demands jute. Jute is used for developing environmentally friendly products; as a result, the demand for jute is high. Jute is used to make ropes,
bags, mats, cloths, napkins, carpets, baskets, curtains, and much more. Some of them are shown in Fig. 1. Jute is also an edible leafy vegetable belonging to the genus Corchorus. The leaves of jute plants are 6–10 cm in length and 3.5–5 cm in width. The leaves are typically light green in color and have a bitter flavor. Freshly picked jute leaves are delicious and soft; older leaves, on the other hand, tend to be woodier and more fibrous, making them less suitable for consumption. Jute leaves add a distinctive taste to dishes, offer nutritional value, and may be used as a thickener in soups, stews, and sauces. In spite of their bitter taste, jute leaves are an excellent source of nutrients, vitamins, and minerals. A cup (87 g) of jute potherb, boiled without salt, provides 94 µg of Vitamin K, 0.4496 mg of Vitamin B6, 2.73 mg of Iron, 225 µg of Vitamin A, 28.7 mg of Vitamin C, and 0.222 mg of Copper. Jute is also rich in amino acids: the same 87 g of cooked jute leaves contains 0.021 g of Tryptophan, 0.113 g of Threonine, 0.152 g of Isoleucine, 0.2666 g of Leucine, 0.151 g of Lysine, and 0.0444 g of Methionine. More information about the nutritional value of jute leaf is found in Ref. [1]. Bangladesh is the second-largest exporter of jute after India. Jute export earnings reached $116.14 crore in the financial year 2020–2021, breaking all previous records of the last 12 years [2, 3]. Jute and jute-related export earnings of Bangladesh are shown in Fig. 2. In addition to Bangladesh and India, which are the largest producers of jute in the world, several other countries also produce significant amounts of jute, including Nepal, China, South Sudan, Uzbekistan, Egypt, Zimbabwe, Brazil, and Vietnam. Jute grows well in areas with at least 1500 mm of rainfall per year and a minimum of 250 mm in each of March, April, and May. A temperature range of 18 to 33 °C also helps to promote rapid growth and high yields of jute fiber.
Fig. 1 Different types of jute products (generated using Google Photos)
Fig. 2 Export earnings from jute and jute goods of Bangladesh [3]
Jute is typically cultivated during the monsoon
season. Several leaf diseases restrict jute cultivation, including yellow mosaic disease, powdery mildew, and leaf curling [4, 5]. Powdery mildew is a fungal disease that affects the leaves of the jute plant. The leaf surface becomes covered by a fine white powder, which causes leaves, flowers, and fruits to fall. Leaf mosaic disease, also known as yellow mosaic disease, is a viral disease of jute caused by a virus that lives in the plant’s leaves. This disease has a negative impact on both the quantity and quality of jute fiber. Additional information regarding jute leaf diseases is available in Refs. [4, 5]. Plant diseases can have a detrimental impact on overall agricultural yield and productivity. The economic damage caused by plant diseases is estimated at $30 to $50 billion per year [6]. Jute production is severely affected by leaf diseases, which limit jute cultivation [7, 8]. As a result, farmers are losing interest in the cultivation of jute, which may lead to a jute fiber crisis in the future. To minimize the damage, early detection of plant diseases is needed, yet farmers sometimes fail to identify the disease properly. It is crucial to diagnose a disease accurately and take prompt action to prevent its spread and minimize its impact on crops. In many cases, local agriculture agents or agencies are responsible for identifying and diagnosing plant diseases. However, the process can take several days and, in some cases, even weeks. Such a delay may lead to further spread of the disease and increased damage to the crops, ultimately resulting in significant losses for farmers. Therefore, there is a pressing need for efficient and accurate methods for the identification and diagnosis
of plant diseases, which can help farmers take appropriate measures promptly to minimize their losses. Machine learning, a branch of artificial intelligence, utilizes statistical models and algorithms to allow computers or machines to learn from data and make decisions or predictions. In recent years, machine learning has been increasingly applied in various fields, including plant disease detection and diagnosis. One popular approach is to use machine learning algorithms, such as convolutional neural networks (CNNs), to analyze images of plant parts and identify patterns that indicate the presence of a particular disease. Another approach involves analyzing data collected from sensors or devices that measure various plant characteristics. By analyzing this data, machine learning algorithms can identify patterns that may indicate the presence of a plant disease. The use of machine learning in plant pathology has the potential to significantly improve the accuracy and efficiency of disease detection and diagnosis. Among machine learning techniques, CNNs are highly regarded for their ability to dynamically learn the information necessary for identifying images. They have been successfully employed in large-scale image classification tasks, such as handwritten digit recognition, recognition of traffic signs, digit detection, license number recognition, emotion classification, and fruit classification. In this paper, we have proposed an algorithm or model for diagnosing two common diseases of jute leaves (powdery mildew and yellow mosaic disease) based on a CNN training schema that enhances the system’s learning ability. The goal of this system is to detect these diseases effectively, and we believe it will be a valuable tool in achieving this objective. The remaining portions of the paper are organized as follows: Sect. 2 provides a review of related literature on plant disease detection, Sect. 3 presents the materials and methods in detail, Sect. 
4 elaborates the experiments, obtained results, and related discussions along with comparisons, and Sect. 5 concludes the paper. An appendix section has been included, which contains supplementary information for further reference.
2 Review of Related Literature
Several researchers have developed machine learning-based plant disease detection systems, some of which are presented in Refs. [9–11]. These systems use techniques such as image processing, deep learning, and computer vision to automatically identify and diagnose plant diseases. Among these, only a few studies address jute disease. Reza et al. proposed a support vector machine (SVM)-based algorithm for detecting jute plant disease with an accuracy ranging from 80 to 86% [12]. Hasan et al. developed a CNN model using jute leaf images collected from various Internet sources to classify leaf diseases of jute with a 96% classification accuracy [13]. Zhang et al. presented a CNN-based algorithm that effectively identified six common leaf diseases of cucumber [11]. An image feature extraction-based tomato leaf disease classification method was developed by a team of researchers from the Hindustan Institute of Technology and Science, Chennai, India, with an average accuracy of 98.12% [14]. Trivedi et al. proposed a convolutional neural network method for identifying plant leaf disease using the PlantVillage dataset [15]. The dataset used in their study comprises 38 classes of plant leaf images, including both healthy and unhealthy leaves from various plants such as apples, grapes, corn, and cucumber. The PlantVillage dataset contains 54,305 plant leaf images [16]. The methods are summarized in Table 1 for ease of comparison.

Table 1 Brief comparison of plant disease identification studies
| Study | Dataset | Model | Accuracy (%) |
|---|---|---|---|
| A transfer learning-based CNN approach for identifying Soybean plant diseases [9] | Soybean leaf images | CNN | 98.50 |
| Global pooling dilated CNN for identifying the disease of cucumber leaves [11] | Cucumber leaf images | CNN | 94.65 |
| SVM and image processing-based methods for detecting disease of the jute plant [12] | Jute leaf images | Image processing with SVM | 80–86 |
| Classifying jute disease by analyzing leaf images with convolutional neural network [13] | Jute leaf images | CNN | 96.00 |
| Convolutional neural networks for tomato plant disease identification [14] | Tomato leaf images | CNN | 98.12 |
| Leaf disease detection of various plants by applying machine learning model [15] | Various plant leaf images (PlantVillage dataset) | CNN | 95.81 |
| Convolutional neural networks for Cassava leaf disease detection [17] | Cassava leaf images (custom dataset) | CNN | 85.30 |
3 Materials and Methods
CNNs learn and extract features from images by applying convolutional filters that identify patterns such as edges, corners, shapes, and textures. These learned features can then be used for classification or other image recognition tasks. Additionally, pooling layers reduce the spatial dimensionality of the feature maps, making the model more efficient and less prone to overfitting. During feature extraction, the CNN model recognizes the visual features and textures present in the jute leaf images that are important for disease identification. The complete process of building a method for diagnosing jute leaf disease using a CNN is discussed in more detail throughout this section. The block diagram of our proposed model is illustrated in Fig. 3. The input is received by the input layer, followed by the convolutional layers, which are responsible for extracting features. The fully connected layers combine all the extracted features, and finally, the softmax layer is used for classification.
Fig. 3 Architectural view of the system
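The filter-then-pool step described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the 5 × 5 image, the 2 × 2 edge filter, and the function names are made up for illustration, and a single channel is used instead of RGB. As in most CNN libraries, the "convolution" is computed as a sliding-window product without flipping the filter.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (no padding, stride 1) and add `bias`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)) + bias
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2 x 2 window."""
    h = len(feature_map) // 2 * 2
    w = len(feature_map[0]) // 2 * 2
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


# Toy 5 x 5 single-channel image with a vertical edge (made-up values).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2 x 2 filter that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1], [-1, 1]]

features = relu(conv2d_valid(image, kernel))  # 4 x 4 feature map
pooled = max_pool_2x2(features)               # 2 x 2 after pooling
```

The filter responds only where the edge lies, and pooling halves each spatial dimension while keeping that response.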
3.1 Description of Convolutional Neural Network
We designed a convolutional neural network for classifying jute leaf diseases based on leaf images (the architectural view is shown in Fig. 5). It consists of an input layer, four convolutional layers, four pooling layers, and two fully connected layers. The input image, denoted as $X$, has a size of $W \times H \times 3$ (width × height × channels), where 3 represents the RGB channels. An input layer before the CNN layers takes $X$ as input and passes it on to the convolutional layers. The CNN has four convolutional layers, each of which is parameterized by a set of learnable filters $Q$, a bias term $B$, and a nonlinear activation function $f$. The filters are applied to the input image in a sliding window fashion, and the output of the convolutional layer is computed as:

$$Z^{(m)} = f\left(Q^{(m)} * A^{(m-1)} + B^{(m)}\right) \qquad (1)$$
where $A^{(m-1)}$ is the input, $Z^{(m)}$ is the output, $Q^{(m)}$ is the filter, and $B^{(m)}$ is the bias term of the $m$th convolutional layer. The activation function $f$ is applied element-wise to the output of the convolution operation. After each convolutional layer, a pooling layer downsamples the previous layer’s output, reducing the spatial dimensions of the input while keeping the most important features. We use max-pooling, which takes the maximum value from each pooling window. The output of the last pooling layer is flattened into a one-dimensional vector and passed through two fully connected layers. The first fully connected layer has $n_1$ neurons, followed by a rectified linear unit (ReLU) activation function. The second fully connected layer has $n_2$ neurons, followed by a softmax activation function. The softmax function converts the output of the second fully connected layer into a probability distribution over the classes (healthy, yellow mosaic, and powdery mildew):

$$P(Y = j \mid X) = \frac{e^{a_j}}{\sum_{k=1}^{J} e^{a_k}}, \quad j = 1, 2, \ldots, J \qquad (2)$$
where $a_j$ represents the input to the $j$th neuron in the softmax layer, and $J$ is the total number of output classes. The softmax function outputs the probability of the input image belonging to each of the $J$ classes. During training, we implemented dropout to reduce overfitting. The model was trained by minimizing a cross-entropy loss function using the stochastic gradient descent (SGD) optimization algorithm. The cross-entropy loss is defined as:

$$\text{Loss} = -\frac{1}{\text{Num}} \sum_{i=1}^{\text{Num}} y_i \log \hat{y}_i \qquad (3)$$
where Num is the total number of sample images in the training dataset, $y_i$ is the true label of sample $i$, and $\hat{y}_i$ is the predicted probability distribution for sample $i$. The accuracy is calculated as:

$$\text{Accuracy} = \frac{\text{Number of samples classified correctly}}{\text{Number of samples in total}} \qquad (4)$$
In a classification problem with multiple classes, the same quantity can be written per sample. If we have a dataset with $J$ classes and Num samples, where $y_i$ is the true label of the $i$th sample and $f(x_i)$ is its predicted label, the accuracy is:

$$\text{Accuracy} = \frac{1}{\text{Num}} \sum_{i=1}^{\text{Num}} \delta\left(y_i, f(x_i)\right) \qquad (5)$$

where $\delta(y_i, f(x_i))$ is 1 if $y_i = f(x_i)$ and 0 otherwise.
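As a worked illustration of Eqs. 2, 3, and 5, the following sketch evaluates the softmax, the cross-entropy loss, and the accuracy on three made-up score vectors. It assumes one-hot true labels, so Eq. 3 reduces to the mean negative log-probability of the true class. The class indices, scores, and function names are illustrative assumptions and are not taken from the paper.

```python
import math


def softmax(logits):
    """Eq. 2: turn raw scores a_1..a_J into class probabilities."""
    shift = max(logits)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(a - shift) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(true_labels, predicted_probs):
    """Eq. 3 with one-hot targets: mean of -log(probability of the true class)."""
    losses = [-math.log(probs[label])
              for label, probs in zip(true_labels, predicted_probs)]
    return sum(losses) / len(losses)


def accuracy(true_labels, predicted_labels):
    """Eq. 5: fraction of samples whose predicted label equals the true label."""
    hits = sum(1 for y, y_hat in zip(true_labels, predicted_labels) if y == y_hat)
    return hits / len(true_labels)


# Assumed class indices: 0 = healthy, 1 = powdery mildew, 2 = yellow mosaic.
logits_batch = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.9, 0.1, 0.8]]  # made-up scores
true_labels = [0, 1, 2]

probs_batch = [softmax(logits) for logits in logits_batch]
predicted = [probs.index(max(probs)) for probs in probs_batch]  # argmax per sample

loss = cross_entropy(true_labels, probs_batch)
acc = accuracy(true_labels, predicted)
```

In this toy batch the third sample is assigned to class 0 instead of class 2, so two of the three samples are counted as correct.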
3.2 Description of the Dataset
Our jute leaf image dataset consists of images of jute leaves collected from several fields in Manikganj District of Bangladesh with the help of jute experts. The images belong to three classes: healthy, yellow mosaic, and powdery mildew. The data were collected using a mobile camera, and the lighting conditions were taken into consideration during image capture. The jute leaves were selected at random, and care was taken to avoid capturing images of the same leaf multiple times. The healthy class images consist of jute leaves that do not show any visible signs of disease or damage. The yellow mosaic class images contain jute leaves that exhibit yellow mosaic disease symptoms, characterized by yellow patches or spots on the leaves. The powdery mildew class images show jute leaves with white powdery fungal growth on the leaf surface. The dataset contains a total of 4140 leaf images, with 1400 images belonging to the healthy class, 1400 images belonging to the yellow mosaic class, and 1340 images belonging to the powdery mildew class. The images have a resolution of 256 × 256 pixels and are in JPEG format. The images were preprocessed using the ImageMagick tool [18], which was used to erase the background of each image. Some of the collected images are shown in Fig. 4. We used augmentation to increase the size of our dataset, since dataset size affects both the accuracy and the bias of the results. Rotation, brightness, and zoom augmentation were applied, which increased the dataset from 4140 to 4740 images. The statistical summary of our dataset is presented in Table 2.
Fig. 4 Images of jute leaf (from left: healthy, powdery mildew, and yellow mosaic)
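The three augmentation types named above (rotation, brightness, and zoom) can be sketched on a tiny grayscale image stored as a list of rows. The paper does not report the rotation angles, brightness factors, zoom ranges, or the library used, so the 90° rotation, the factor of 1.5, the central crop, and all function names below are assumptions chosen only to keep the example short.

```python
def rotate_90(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def adjust_brightness(image, factor):
    """Scale pixel values by `factor` and clip them to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def zoom_center(image, crop):
    """Crop the central `crop` x `crop` window and upscale it back to the
    original size with nearest-neighbour sampling (assumes a square image)."""
    size = len(image)
    start = (size - crop) // 2
    return [
        [image[start + r * crop // size][start + c * crop // size]
         for c in range(size)]
        for r in range(size)
    ]


# Toy 4 x 4 grayscale "leaf" (made-up pixel values).
leaf = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]

rotated = rotate_90(leaf)
brighter = adjust_brightness(leaf, 1.5)
zoomed = zoom_center(leaf, 2)
```

Each call returns a new image of the same size, so one original photograph can contribute several additional training samples.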
3.3 CNN Model Training
The proposed CNN architecture consists of four convolutional layers and two fully connected layers (shown in Fig. 5). ReLU is used as the activation function in all layers except the output layer, which uses softmax for classification. The convolutional layers also have batch normalization and max-pooling with a pool kernel size of 2 × 2. Dropout is implemented at different levels (25% after the first three convolutional layers, 30% after the fourth convolutional layer, and 30% in the dense layer) to reduce overfitting. The model is trained for 60 epochs using the SGD optimizer. The dataset used for this experiment consisted of 4740 jute leaf images with a size of 256 × 256 and was partitioned into training, validation, and testing sets containing 80%, 12%, and 8% of the images, respectively (Table 2). The CNN was trained on a desktop computer equipped with a 10th-generation Intel i7 processor, 16 GB of RAM, and an 8 GB GPU, which allows the model to take advantage of parallel processing for faster training. The training process took approximately 45 min to complete.
Fig. 5 Architecture of proposed convolutional neural network model for identification of jute leaf diseases

Table 2 Dataset summary
| Class | Training | Validation | Testing | Total |
|---|---|---|---|---|
| Healthy | 1280 | 192 | 128 | 1600 |
| Powdery mildew | 1232 | 184 | 124 | 1540 |
| Yellow mosaic | 1280 | 192 | 128 | 1600 |
| Total | 3792 | 568 | 380 | 4740 |
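The 80/12/8 partition can be checked against the per-class image counts with a few lines of arithmetic. The rounding rule used here (round the training and validation sizes down and give the remainder to the test set) is an assumption, since the paper does not state one; it happens to reproduce the counts reported in Table 2. The function name is illustrative.

```python
def split_counts(n_images, train_pct=80, val_pct=12):
    """Return (train, validation, test) sizes for one class.

    Assumption: training and validation sizes are rounded down and the
    test set takes whatever remains.
    """
    train = n_images * train_pct // 100
    val = n_images * val_pct // 100
    return train, val, n_images - train - val


class_sizes = {"Healthy": 1600, "Powdery mildew": 1540, "Yellow mosaic": 1600}
splits = {name: split_counts(n) for name, n in class_sizes.items()}

# Column totals across the three classes: (training, validation, testing).
totals = tuple(sum(column) for column in zip(*splits.values()))
```

Integer arithmetic is used on purpose so that the result does not depend on floating-point rounding.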
4 Results and Discussions
During the training of the convolutional neural network, the training and validation accuracy were calculated at each epoch to evaluate the model’s performance. Training and validation losses were also monitored to detect any overfitting or underfitting, which helps to determine the appropriate number of epochs and to make the necessary adjustments. The accuracy and loss for both training and validation sets were calculated using Eqs. 5 and 3, respectively, during each epoch, and are plotted in Figs. 6 and 7, respectively. To evaluate the performance of our CNN model, we employed widely used evaluation metrics: accuracy, precision, recall, and F1-score. The detailed performance assessment results of our classification model are presented in Table 3, where HL, PM, and YM denote healthy leaf, powdery mildew, and yellow mosaic, respectively. The appendix provides a detailed calculation of the performance metrics.
Fig. 6 Training and validation accuracies over multiple epochs during the training process of our CNN model. Accuracy is computed using Eq. 5
Fig. 7 Comparison of training loss and validation loss during the training process of our CNN model. Eq. 3 is used for computing the loss

Table 3 Performance assessment results of the CNN
| Metric | HL (Healthy leaf) (%) | PM (Powdery mildew) (%) | YM (Yellow mosaic) (%) | Average (%) |
|---|---|---|---|---|
| Precision | 94.80 | 95.60 | 96.80 | 95.70 |
| Recall | 96.10 | 95.20 | 97.10 | 96.10 |
| F1-Score | 95.40 | 95.40 | 96.90 | 95.90 |
4.1 Discussions
The proposed CNN model was trained and evaluated on an image dataset of 4740 jute leaf images, with 1600 healthy leaf images, 1600 yellow mosaic images, and 1540 powdery mildew images. The model achieved an overall accuracy of 96% on the test set, with average precision, recall, and F1-score of 95.70%, 96.10%, and 95.90%, respectively, demonstrating its effectiveness in detecting jute leaf diseases. The high per-class precision, recall, and F1-score indicate that the model identifies each class accurately. The confusion matrix of the model is shown in Fig. 8. It indicates that the model makes only a small number of errors when distinguishing the yellow mosaic and powdery mildew classes from the healthy class. These errors suggest some similarity in the visual characteristics of the classes, which can be studied further in future research. To improve the model’s accuracy, future studies may concentrate on refining the model architecture, experimenting with diverse data augmentation techniques, and investigating the visual features that help to distinguish healthy leaves from those affected by yellow mosaic and powdery mildew.
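The following sketch shows how accuracy and per-class precision, recall, and F1-score can be read off a three-class confusion matrix. The counts are hypothetical and do not reproduce Fig. 8; only the row sums (128, 124, and 128) follow the test-set sizes of the dataset split. The function and variable names are illustrative.

```python
CLASSES = ["HL", "PM", "YM"]  # healthy leaf, powdery mildew, yellow mosaic

# Rows are true classes, columns are predicted classes.
# All counts are hypothetical; only the row sums follow the test split.
confusion = [
    [123, 2, 3],
    [3, 118, 3],
    [2, 2, 124],
]


def evaluate(matrix, names):
    """Return overall accuracy and per-class (precision, recall, F1)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(size))
    report = {}
    for k, name in enumerate(names):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(size)) - tp  # other classes predicted as k
        fn = sum(matrix[k]) - tp                          # class k predicted as another class
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        report[name] = (precision, recall, f1)
    return correct / total, report


overall_accuracy, per_class = evaluate(confusion, CLASSES)
# Macro average: unweighted mean of the per-class precision values.
macro_precision = sum(p for p, _, _ in per_class.values()) / len(per_class)
```

Averaging the per-class values without weighting, as done for `macro_precision`, is one common way to obtain a single summary figure per metric.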
Fig. 8 Confusion matrix of our trained CNN model with three distinct classes of jute leaves
4.2 Comparisons
Table 4 compares our proposed CNN model with other existing models for the classification of jute leaf diseases in terms of accuracy, precision, recall, and F1-score. The proposed CNN model achieves the highest value on all four metrics. The SVM with HOG features achieved an accuracy of 83.28%, which is significantly lower than the 96.00% of the proposed CNN model. The global pooling dilated CNN (GPDCNN) model achieved an accuracy of 93.12%, higher than the SVM model but lower than the proposed CNN model. It is important to note that the dataset used in each study may differ in terms of size, quality, and variety of jute leaf diseases, which can affect the performance of the models. Nevertheless, the proposed CNN model shows promising results and has the potential to be a useful tool for jute leaf disease identification.

Table 4 Comparative analysis of our proposed CNN model with other existing models for identifying jute leaf diseases
| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| SVM with HOG features [12] | 83.28 | 82.37 | 83.28 | 82.58 |
| GPDCNN [11] | 93.12 | 93.01 | 93.12 | 92.98 |
| Proposed CNN model | 96.00 | 95.70 | 96.10 | 95.90 |
5 Conclusion
Early detection of plant diseases is a critical aspect of crop management and gardening. Timely identification of a plant disease can help to prevent significant damage to crops and reduce the spread of diseases. It also reduces the need for more intensive and costly treatment methods. There are several approaches for automated or computer vision-based plant disease detection and recognition; however, this study area remains underdeveloped. Moreover, there are currently no commercial solutions available on the market that deal with the recognition of plant diseases based on leaf images. This paper explored a CNN method to classify and identify jute plant diseases from leaf images. The CNN model has four convolutional layers, four max-pooling layers, and two fully connected layers. The leaf images were collected from various jute fields located in Bangladesh. The proposed model was able to distinguish between healthy and diseased jute leaves. Several experiments were carried out to evaluate the performance of the model. Our CNN model performed better than the existing methods it was compared with (SVM with HOG features and GPDCNN), which indicates that it is more effective in identifying jute leaf diseases. It has the potential to benefit the agriculture industry by providing a more accurate and efficient means of diagnosing plant leaf diseases.
Acknowledgements Several individuals, farmers, and villagers helped us during the data collection process. Data collection would have been difficult without their cooperation. We sincerely thank them.
Appendix
Leaf diseases of Jute
Jute is an important fiber crop that is grown for the production of burlap, hessian, and twine. However, like any plant, jute is susceptible to a variety of diseases that can significantly reduce crop yield and quality. Two common jute leaf diseases and their remedies are described below.
Yellow mosaic is a viral disease that affects jute plants, as well as other plants in the family Tiliaceae, such as linden. It is caused by the jute yellow mosaic virus (JYMV), which is transmitted by aphids. The symptoms of yellow mosaic of jute include yellowing and mottling of the leaves, stunted growth, and reduced yield. The leaves may also develop necrotic lesions and become curled or distorted. There is no cure for yellow mosaic of jute once a plant is infected. The best way to prevent the disease is to control the aphid vectors that transmit the virus. This can be done through the use of chemical insecticides or by using natural predators such as ladybugs, lacewings, and parasitic wasps. In addition, avoiding planting jute in areas where the disease has been previously reported and practicing good sanitation by removing and destroying infected plants can help to prevent the spread of the disease.
Powdery mildew is a fungal disease that affects the leaves and stems of jute plants. It is characterized by a white or gray powdery growth on the plant’s surface. The fungus that causes powdery mildew is called Oidium juli. Powdery mildew is a common disease of jute, particularly in humid, warm conditions. It can reduce the photosynthetic capacity of the plant and lead to reduced growth and yield. In severe cases, it can cause the death of the plant. To prevent and control powdery mildew of jute, it is important to follow good cultural practices such as proper watering, fertilization, and crop rotation. In addition, it may be necessary to use chemical fungicides to control the disease. Fungicides containing the active ingredients propiconazole or mancozeb are effective at controlling powdery mildew of jute.
Computation of Performance Evaluation Metrics
To evaluate the performance of our CNN model for jute leaf disease identification, we used several assessment metrics: accuracy, precision, recall, and F1-score. By calculating these metrics for each class, we can better understand how well our model performs in identifying each class. The mathematical expressions describing these metrics are presented below.
Accuracy: Accuracy measures the overall proportion of samples that the model classifies correctly. For three-class classification, it is:

$$\text{Accuracy} = \frac{TP_{HL} + TP_{YM} + TP_{PM}}{\text{TotalSamples}} \qquad (6)$$
where TotalSamples represents the total number of sample images. The true positives of the Healthy Leaf, Yellow Mosaic, and Powdery Mildew classes are denoted by $TP_{HL}$, $TP_{YM}$, and $TP_{PM}$, respectively.
Precision: The precision metric measures the proportion of true positive predictions among all positive predictions made by the model for a class. For a three-class classification problem, precision is calculated for each class separately using the following formulas:

$$\text{Precision}_{\text{Healthy Leaf}} = \frac{TP_{HL}}{TP_{HL} + FP_{YM} + FP_{PM}} \qquad (7)$$

$$\text{Precision}_{\text{Powdery Mildew}} = \frac{TP_{PM}}{TP_{PM} + FP_{YM} + FP_{HL}} \qquad (8)$$

$$\text{Precision}_{\text{Yellow Mosaic}} = \frac{TP_{YM}}{TP_{YM} + FP_{HL} + FP_{PM}} \qquad (9)$$

where, in the formula for a given class, $FP_{HL}$, $FP_{PM}$, and $FP_{YM}$ denote the numbers of Healthy Leaf, Powdery Mildew, and Yellow Mosaic samples, respectively, that were incorrectly predicted as that class.
JuLeDI: Jute Leaf Disease Identification Using Convolutional Neural …
Recall: The recall metric evaluates the percentage of the actual positive observations in the dataset that the model correctly predicts as positive. It can be calculated using the following formula:
$$\text{Recall}_{i} = \frac{TP_{i}}{TP_{i} + FN_{i}} \tag{10}$$
where $i \in \{\text{Healthy Leaf}, \text{Yellow Mosaic}, \text{Powdery Mildew}\}$ and $FN_{i}$ denotes the false negatives of the $i$th class.
F1-Score: The F1-score, which combines precision and recall into a single metric, is often used as a summary statistic for evaluating a model. It is calculated using the following formula:
$$\text{F1-score}_{\text{Index}} = \frac{2 \times \text{Precision}_{\text{Index}} \times \text{Recall}_{\text{Index}}}{\text{Precision}_{\text{Index}} + \text{Recall}_{\text{Index}}} \tag{11}$$
where $\text{Index} \in \{\text{Healthy Leaf}, \text{Yellow Mosaic}, \text{Powdery Mildew}\}$. For instance, $\text{Precision}_{\text{Healthy Leaf}}$ represents the precision value of the healthy leaf class. In the context of multi-class classification with the classes “Healthy leaf,” “Powdery Mildew,” and “Yellow Mosaic,” the terms used in these formulas have the following meanings:
1. True positive (TP): A sample that is correctly classified as positive for a particular class. For example, if a leaf sample is infected with powdery mildew and the model correctly predicts that it belongs to the “Powdery Mildew” class, then this is a true positive for the “Powdery Mildew” class.
2. True negative (TN): A sample that is correctly classified as negative for a particular class. For example, if a healthy leaf sample is classified as “Healthy leaf” by the model, then this is a true negative for the “Powdery Mildew” and “Yellow Mosaic” classes.
3. False positive (FP): A sample that is incorrectly classified as positive for a particular class. For example, if a healthy leaf sample is misclassified as “Powdery Mildew” by the model, then this is a false positive for the “Powdery Mildew” class.
4. False negative (FN): A sample that is incorrectly classified as negative for a particular class. For example, if a leaf sample infected with powdery mildew is misclassified as “Healthy leaf” by the model, then this is a false negative for the “Powdery Mildew” class.
These metrics are important for evaluating the performance of a multi-class classification model, as they measure the model’s ability to accurately identify samples from each individual class.
Healthy Leaf Class
From the confusion matrix, it was found that out of the total 128 samples, 123 samples were correctly classified as belonging to the healthy class.
• Accuracy = (123/128) ∗ 100% ≈ 96%
• Precision = 123/(123 + 2 + 3) = 0.948
M. S. Uddin and M. Y. Munsi
• Recall = 123/(123 + 0 + 5) = 0.961
• F1-score = 2 ∗ (0.948 ∗ 0.961)/(0.948 + 0.961) = 0.954
Eqs. 6, 7, 10, and 11 are used for calculating accuracy, precision, recall, and F1-score, respectively.
Powdery Mildew Class
According to the confusion matrix, the powdery mildew class contains 124 samples in total, of which 119 are correctly identified as powdery mildew.
• Accuracy = (119/124) ∗ 100% ≈ 96%
• Precision = 119/(1 + 119 + 4) = 0.956
• Recall = 119/(2 + 119 + 3) = 0.952
• F1-score = 2 ∗ (0.956 ∗ 0.952)/(0.956 + 0.952) = 0.954
Eqs. 6, 8, 10, and 11 are used for calculating accuracy, precision, recall, and F1-score, respectively.
Yellow Mosaic Class
According to the confusion matrix, the yellow mosaic class contains 128 samples in total, of which 123 are correctly identified as yellow mosaic.
• Accuracy = (123/128) ∗ 100% ≈ 96%
• Precision = 124/(1 + 3 + 124) = 0.968
• Recall = 124/(3 + 1 + 124) = 0.971
• F1-score = 2 ∗ (0.968 ∗ 0.971)/(0.968 + 0.971) = 0.969
Eqs. 6, 9, 10, and 11 are used for calculating accuracy, precision, recall, and F1-score, respectively.
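The per-class calculations above follow mechanically from a confusion matrix. The sketch below shows one way to compute them in Python under the standard one-vs-rest definitions (false positives taken from the column of a class, false negatives from its row). The 3 × 3 matrix is a hypothetical placeholder chosen only to make the example runnable; it is not the confusion matrix obtained in this work, and the function name `per_class_metrics` is illustrative.

```python
def per_class_metrics(cm, labels):
    """Return overall accuracy and per-class precision/recall/F1.

    cm[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    # Overall accuracy: sum of the diagonal (true positives) over all samples.
    accuracy = sum(cm[i][i] for i in range(n)) / total
    report = {}
    for i, name in enumerate(labels):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # other classes predicted as i
        fn = sum(cm[i]) - tp                        # class i predicted as others
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


LABELS = ["Healthy Leaf", "Yellow Mosaic", "Powdery Mildew"]
# Hypothetical counts for illustration only (NOT the results reported here).
EXAMPLE_CM = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
accuracy, report = per_class_metrics(EXAMPLE_CM, LABELS)
```

Replacing `EXAMPLE_CM` with the actual confusion matrix would reproduce the per-class figures directly and make any rounding differences visible.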
References
1. Choudhary SB, Sharma HK, Karmakar PG, Kumar AA, Saha AR, Hazra P, Mahapatra BS (2013) Nutritional profile of cultivated and wild jute (‘corchorus’) species. Aust J Crop Sci 7(13):1973–1982
2. The-Business-Standard. Jute exports rise record 31% in fy21. https://www.tbsnews.net/economy/jute-exports-rise-record-31-fy21-291742
3. Export Promotion Bureau, B. Export statistics reports. http://www.epb.gov.bd
4. De RK. Jute diseases: diagnosis and management. https://crijaf.icar.gov.in/pdf/tb_04_2019.pdf
5. Biswas S (2021) Diseases and pests of fibre crops: identification, treatment and management. Taylor & Francis Group
6. Sastry KS (2013) Introduction to plant virus and viroid diseases in the tropics. In: Plant virus and viroid diseases in the tropics. Springer, pp 1–10
7. Ghosh R, Palit P, Paul S, Ghosh SK, Roy A (2012) Detection of corchorus golden mosaic virus associated with yellow mosaic disease of jute (corchorus capsularis). Ind J Virol 23(1):70–74
8. Das S, Khokon MAR, Haque MM, Ashrafuzzaman M (2001) Jute leaf mosaic and its effects on jute production. Pak J Biol Sci (Pakistan)
9. Wallelign S, Polceanu M, Buche C (2018) Soybean plant disease identification using convolutional neural network. In: The thirty-first international flairs conference
10. Wu Q, Chen Y, Meng J (2020) Dcgan-based data augmentation for tomato leaf disease identification. IEEE Access 8:98716–98728
11. Zhang S, Zhang S, Zhang C, Wang X, Shi Y (2019) Cucumber leaf disease identification with global pooling dilated convolutional neural network. Comput Electron Agric 162:422–430
12. Reza ZN, Nuzhat F, Mahsa NA, Ali MH (2016) Detecting jute plant disease using image processing and machine learning. In: 2016 3rd international conference on electrical engineering and information communication technology (ICEEICT), pp 1–6
13. Hasan MZ, Ahamed MS, Rakshit A, Hasan KMZ (2019) Recognition of jute diseases by leaf image classification using convolutional neural network. In: 2019 10th International conference on computing, communication and networking technologies (ICCCNT), pp 1–5
14. Ashok S, Kishore G, Rajesh V, Suchitra S, Sophia SG, Pavithra B (2020) Tomato leaf disease detection using deep learning techniques. In: 2020 5th International conference on communication and electronics systems (ICCES). IEEE, pp 979–983
15. Trivedi J, Shamnani Y, Gajjar R (2020) Plant leaf disease detection using machine learning. In: International conference on emerging technology trends in electronics communication and networking. Springer, pp 267–276
16. Hughes D, Salathé M et al (2015) An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv:1511.08060
17. Surya R, Gautama E (2020) Cassava leaf disease detection using convolutional neural networks. In: 2020 6th International conference on science in information technology (ICSITech), pp 97–102
18. Taylor D (2014) Work the shell: framing images with imagemagick. Linux J 2014(238):7
Utilizing Deep Learning Methodology to Classify Diabetic Retinopathy Vivek Kumar Prasad, Ved Nimavat, Kaushha Trivedi, and Madhuri Bhavsar
Abstract Diabetic retinopathy (DR) is a complication of diabetes. It is one of the most serious eye diseases and can cause the loss of vision in people suffering from diabetes. It is especially dangerous because it frequently goes unnoticed and, if not caught in time, can result in severe damage or even the loss of eyesight. Many advancements in computer science and image processing are effective in detecting DR by classifying retinal images from patients. Such methods typically rely on large and carefully annotated datasets. Hence, we present a comparative analysis of several approaches for classifying and detecting DR with a CNN.
Keywords Diabetic retinopathy · Classification · Image processing · Deep learning · CNN
1 Introduction
Diabetic retinopathy (DR), the most common and deceptive microvascular complication of diabetes, can progress asymptomatically until an abrupt loss of vision occurs. Diabetic retinopathy is currently one of the most dangerous blinding eye diseases, and it has become the leading cause of blindness in persons aged 20 to 74 worldwide. Globally, the number of individuals with diabetic retinopathy and visual impairment is increasing as the frequency of the disease rises [1]. For a person with diabetes, it is regarded as a serious vision-threatening problem. Diabetes can affect the eyes: it harms the blood vessels of the retina, the fragile tissue at the rear of the eye, which can result in blindness. Medical experts currently use fundus photographs of their patients to make the majority of their clinical diagnoses. As automated technology advances, additional methods based on deep learning and machine learning that perform well are being employed to identify diabetic retinopathy [2]. By 2025, there will be almost 592 million diabetic retinopathy victims worldwide, up from 382 million at present. Hence, due to these complications, there are numerous advantages to building an automated system for detecting diabetic retinopathy. Early diabetic retinopathy diagnosis is crucial for successful treatment because later stages of the disease are more challenging to treat and can result in blindness [3]. Diabetic retinopathy detection is extremely important, and it takes experts to operate the machinery and produce reliable results. The goal is to improve and optimize the identification of diabetic retinopathy using current breakthroughs in the technologies involved, for instance, machine learning, deep learning, and artificial intelligence. Figure 1 shows how a retina affected by diabetic retinopathy differs from a normal retina. These differences must be identified by the automated approaches.
Fig. 1 Image shows how diabetic retinopathy is different from the normal retina [4]
V. K. Prasad (B) · V. Nimavat · K. Trivedi · M. Bhavsar
Nirma University, Ahmedabad, Gujarat, India
e-mail: [email protected]
V. Nimavat e-mail: [email protected]
K. Trivedi e-mail: [email protected]
M. Bhavsar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_46
2 Diabetic Retinopathy
Diabetic retinopathy can affect persons with any kind of diabetes, including type 1, type 2, and gestational diabetes (diabetes that occurs during pregnancy). The longer a person has diabetes, the higher the risk. Over time, diabetic retinopathy will affect over half of those who have the disease. The good news is that managing diabetes reduces the chances of developing diabetic retinopathy [5].
2.1 What is Diabetic Retinopathy?
Diabetic retinopathy (DR) is a disorder that damages the blood vessels in the retina and can eventually result in blindness; its two primary causes are high blood sugar and high blood pressure. It can affect a person regardless of the kind of diabetes they have. In accordance with international protocol, the severity of DR is divided into five classes/stages, numbered 0 to 4: no retinopathy (0), mild non-proliferative DR (NPDR) (1), moderate NPDR (2), severe NPDR (3), and proliferative DR (4):
• Stage 0: No diabetic retinopathy. In this class, the patient is not affected by the disease, and the image shows a healthy retina.
• Stage 1: Mild non-proliferative diabetic retinopathy. This is the very beginning of DR. The stage is characterized by microaneurysms, which are microscopic bulges in the retina’s blood vessels. At this point, a tiny amount of fluid may leak into the retina and cause the macula to swell.
• Stage 2: Moderate non-proliferative diabetic retinopathy. Swollen blood vessels block the pathways by which blood reaches the retina, so nourishment cannot reach the retina at this stage. As a result, blood and other fluids accumulate in the macula.
• Stage 3: Severe non-proliferative diabetic retinopathy. The number of blocked blood vessels has increased at this point, greatly decreasing blood flow. New blood vessels begin to develop in the retina.
• Stage 4: Proliferative diabetic retinopathy. Because the retina has grown a substantial number of tiny new blood vessels, this stage is considered the most dangerous. Fluid leakage and vision anomalies, such as blurriness, a reduced field of vision, and even blindness, are all possible at this stage.
2.2 Importance of Diabetic Retinopathy
Diabetic retinopathy is a severe disease that can cause blindness. In people with high blood sugar, fluid can accumulate in the lens, the part of the eye that controls focusing; this changes the curvature of the lens and thus affects vision. There are frequently no symptoms in the early stages of diabetic retinopathy. Some people have visual changes that make it difficult for them to read or perceive faraway objects. These changes may come and go [6]. If moderate DR is not treated early on, it eventually turns into proliferative DR. All diabetic patients are at risk of developing DR, which affects about one-third of those with the disease. Many diabetic patients do not understand DR well enough or lack the requisite knowledge. Although the level of knowledge varies greatly by location, nation, and how long a person has had diabetes, it is generally low on a worldwide scale. Additionally, visual symptoms are minimal (or perhaps nonexistent) in the early stages, which is a significant barrier to awareness. Accurately assessing diabetic retinopathy requires time from medical experts and may be challenging for beginning ophthalmology residents. The best way to prevent complete loss of vision is to identify diabetic lesions as soon as possible [7]. Therefore, it is very important to determine whether a diabetic patient is affected by this disease, because if left untreated, it can threaten the patient’s sight. Through evidence-based treatment, the risk of blindness can be significantly decreased; medical findings indicate that a risk reduction of over 90% is feasible [8].
2.3 Introduction to Deep Learning
Deep learning (DL) is a form of machine learning that uses hierarchical layers of nonlinear processing steps to identify patterns and learn features, including without supervision. Categorization, segmentation, identification, extraction, and image registration are just a few of the applications of deep learning in the analysis of medical images. Deep learning is a subfield of machine learning built on algorithms that learn from data by themselves. Whereas classical machine learning relies on simpler models, deep learning makes use of artificial neural networks (ANNs), which are designed to mimic how people think, learn, and reason. Like the neurons in the human brain, the nodes of a neural network are arranged in layers. Each layer’s nodes are linked to those in the following layer, and the depth of the network is determined by the number of layers. A neuron in the human brain receives several signals from surrounding neurons. Similarly, in an artificial neural network, signals are sent between nodes, and each connection is assigned a weight. A node with a higher weight has a greater impact on the nodes in the next layer. The output is produced by combining the weighted inputs in the last layer. Because deep learning algorithms handle a lot of data and carry out numerous intricate mathematical computations, they demand powerful hardware. Deep learning allows the development of computational models with several processing layers that learn to represent the data at different levels of abstraction. A deep learning neural network designed for analyzing structured arrays of data, such as images, is known as a convolutional neural network or CNN. This feedforward ANN is the one used most frequently in computer vision and in machine learning tasks that involve visual imagery.
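The weighted combination of inputs described above can be made concrete with a few lines of plain Python. This is a minimal sketch of a single fully connected layer with a ReLU activation; the function names, layer size, weights, and inputs are all hypothetical values chosen for illustration and do not come from the model used in this work.

```python
def relu(x):
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return x if x > 0.0 else 0.0


def dense_forward(inputs, weights, biases, activation=relu):
    """Forward pass of one fully connected layer.

    weights[i][j] is the weight on the connection from input i to neuron j,
    so neuron j outputs activation(sum_i inputs[i] * weights[i][j] + biases[j]).
    """
    outputs = []
    for j in range(len(biases)):
        z = sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + biases[j]
        outputs.append(activation(z))
    return outputs


# Hypothetical layer with 3 inputs and 2 neurons (illustrative values only).
inputs = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0],
           [0.0, 0.5],
           [1.0, -1.0]]
biases = [0.0, 1.0]
outputs = dense_forward(inputs, weights, biases)
```

The first neuron receives a large weight from the third input and therefore responds strongly to it, while the second neuron's weighted sum is negative and is clamped to zero by the activation.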
A CNN is built from four kinds of layers: convolutional layers perform convolution, rectified linear unit (ReLU) layers apply an element-wise activation function, pooling layers perform downsampling, and fully connected (FC) layers calculate class scores. Recently, improved deep learning approaches, notably region-based CNN (RCNN), have been used successfully for a number of tasks in machine learning, analytics, and object recognition, including object identification, picture classification, and image segmentation. CNNs are used more often than other techniques in the analysis of medical images, and they are highly effective. Convolution (CONV) layers, pooling layers, and fully connected (FC) layers make up the three basic building blocks of a CNN architecture. Depending on the author’s design, the CNN’s size, number of filters, and layer count can change. In the CNN design, each layer has a distinct function. In the CONV layers, several filters convolve the image to extract its features. The pooling layer usually follows the convolution layer in order to decrease the size of the feature maps. The FC layers combine the extracted features into a description of the full input picture [9]. Several CNN architectures pretrained on the ImageNet dataset, including AlexNet, Inceptionv3, and ResNet, are available. While some studies construct their own CNN from scratch for classification, others apply transfer learning from these pretrained networks to accelerate training [10].
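To illustrate what the convolution, ReLU, and pooling layers each do to an image, the following pure-Python sketch applies them in sequence to a tiny single-channel array. It is only a didactic stand-in for the optimized layers of a deep learning library: the 5 × 5 image, the 2 × 2 kernel, and the function names are hypothetical, and the "convolution" is the unflipped sliding-window product (cross-correlation) that most deep learning libraries implement.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu_map(fmap):
    """Element-wise ReLU over a 2-D feature map."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling (halves height and width)."""
    return [[max(fmap[r][c], fmap[r][c + 1],
                 fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]


# Hypothetical 5x5 grayscale patch and 2x2 filter (illustrative values only).
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 0],
    [2, 1, 0, 1, 1],
    [0, 3, 1, 0, 2],
]
kernel = [[1, 0],
          [0, -1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 map of filter responses
activated = relu_map(feature_map)          # negative responses set to zero
pooled = max_pool2x2(activated)            # 2x2 downsampled map
```

The 5 × 5 input shrinks to 4 × 4 after the convolution and to 2 × 2 after pooling, which is the size reduction that the pooling layer is described as providing.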
2.4 Primary Challenges of DR
• Learning without supervision: Deep learning models are among the most data-hungry machine learning models. To perform at their peak and deliver the quality of service required of them, they need massive amounts of data. Moreover, although a lot of data may be available on a certain topic, this data is typically unlabeled, which prevents it from being used to train supervised learning systems [11].
• Managing data: Data is dynamic; it changes with time, place, and numerous other circumstances. Machine learning models, including deep learning models, are built using a predefined dataset (referred to as the training set). They predict well only when the data used for prediction is drawn from the same distribution as the data used to create the system [12].
• Implementing logic: Some knowledge is rule-based and can be formalized through sequential reasoning and logical processes. Although such scenarios can be coded for explicitly, machine learning algorithms typically do not include sets of rules as part of their knowledge.
• Fewer data and greater efficiency are required: Deep learning owes its success to the ability to add multiple layers to models, which allows a vast number of linear and nonlinear parameter combinations to be tested. However, adding layers increases the complexity of the model and the time required to learn and process data, and it necessitates a substantial amount of data for proper training and functioning of the model.
3 Methodology A CNN is a deep neural network that is widely used for image processing. Its main function is to automatically extract significant features from images through a sequence of convolutional and pooling layers. These extracted features can then be used for the purpose of object identification or classification within the image. CNNs are highly efficient for image processing tasks due to their ability to handle large amounts of complex data, as well as identify patterns and features that may be difficult for humans to discern. Furthermore, CNNs have the capacity to learn and adjust to new types of images, which makes them a valuable resource for a wide range of applications such as medical imaging and self-driving cars.
3.1 Improved Diabetic Retinopathy Detection Through Deep Learning
Deep learning has become increasingly popular recently, showing promise in a variety of applications, most notably in bioinformatics, computer vision, and clinical image analysis. DL algorithms have also made a significant beneficial difference in healthcare by advancing applications for screening, detection, segmentation, prediction, and classification across numerous areas, including abdominal, cardiac, pathology, and retinal imaging. Deep learning approaches, algorithms, and methodologies can provide useful systems that perform segmentation, prediction, and classification, and they can serve as the foundation for decision support software that improves the performance of crucial DR diagnosis activities. In general, the procedure used to detect and classify DR images using DL begins with gathering the dataset and carrying out the preprocessing needed to enhance the images. The DL approach is then used to extract the features and categorize the images from this data. In recent years, DR detection and classification have made extensive use of DL, which can learn the characteristics of the supplied data even when numerous heterogeneous sources are integrated. There are various DL-based techniques, including convolutional neural networks (CNNs), autoencoders, restricted Boltzmann machines (RBMs), and sparse coding. Unlike traditional machine learning approaches, these methods perform better as the amount of training data increases, since the number of learned features increases. Additionally, deep learning techniques do not need manually engineered feature extraction. Among deep learning algorithms, CNN-based techniques have performed best.
3.2 Proposed Approach
Deep learning is used here to improve the detection of diabetic retinopathy. To compare different scenarios, we implemented a CNN model that classifies retinal images into five classes: No_DR (class 0), mild (class 1), moderate (class 2), severe (class 3), and proliferate_DR (class 4). On inspecting the dataset, we found that it is imbalanced: class 0 has 1805 images, class 1 has 370, class 2 has 999, class 3 has 193, and class 4 has 295. Viewed as a binary question of whether or not a person has diabetic retinopathy, no DR accounts for roughly 50% of the samples, compared with roughly 50% for the other four classes taken together. Thus, to obtain an accurate model, our first task is to handle this imbalanced dataset. For that, we used different methods: oversampling, undersampling, and data augmentation using ImageDataGenerator with different parameters. We begin with ImageDataGenerator, a module used for data augmentation, which is one technique for handling imbalanced datasets. For the first scenario, we used ImageDataGenerator with the parameter “rescale = 1./255”, which simply divides each pixel value by 255. After setting up our train_batches, validation_batches, and test_batches, we built our CNN model using layers such as Conv2D, MaxPooling2D, and Dense, and we fit our batches to this model. For this scenario, we obtained an accuracy of 75.09%. The second method we opted for is ImageDataGenerator using the horizontal_flip parameter.
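For readers unfamiliar with these augmentation parameters, the sketch below shows in plain Python what rescaling and flipping do to an image represented as a nested list of pixel values. It is a conceptual illustration only, not the Keras implementation: the function names and the 2 × 2 example image are hypothetical, and the random selection that a generator performs during training is omitted.

```python
def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by `factor`; 1/255 maps 0..255 into 0..1."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right (reverse each row)."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror the image top to bottom (reverse the row order)."""
    return image[::-1]


# Hypothetical 2x2 grayscale image with 8-bit pixel values.
tiny = [[0, 51],
        [102, 255]]

scaled = rescale(tiny)             # values now lie in the range [0, 1]
flipped_lr = horizontal_flip(tiny)
flipped_ud = vertical_flip(tiny)
```

Rescaling changes only the value range of the pixels, whereas the flips create new views of the same image, which is why flips can enlarge the effective training set.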
For this scenario, we also used several other parameters: rotation_range randomly rotates the images within the given range, zoom_range randomly scales the images within the given range, width_shift_range and height_shift_range randomly translate the images within the given range, and horizontal_flip randomly flips the images horizontally. After setting up our train_batches, validation_batches, and test_batches, we built our CNN model using layers such as Dense, and we fit our batches to this model. For this scenario, we obtained an accuracy of 49.27%. For the third technique, we used ImageDataGenerator with the vertical_flip parameter, together with rotation_range, zoom_range, width_shift_range, and height_shift_range as in the second method; vertical_flip randomly flips the images vertically. After setting up our train_batches, validation_batches, and test_batches, we built our CNN model using layers such as Dense, and we fit our batches to this model. For this scenario, we obtained an accuracy of 49.27%. For the last data augmentation method, we used ImageDataGenerator with a different set of parameters: rescale, which divides each pixel value by 255; zoom_range, which randomly scales the images within the given range; and width_shift_range and height_shift_range, which randomly translate the images within the given range. After setting up our train_batches, validation_batches, and test_batches, we built our CNN model using layers such as Dense, and we fit our batches to this model. For this scenario, we obtained an accuracy of 92.54%.
After exploring data augmentation, we explored oversampling, which handles an imbalanced dataset by duplicating samples of the smaller classes. Here, the maximum class count is 1805; thus, we increased the counts of the other classes to match the No_DR class count. After resampling, every class count is equal to 1805; we built our CNN model using layers such as Conv2D, MaxPooling2D, and Dense, and we fit our batches to this model. For this scenario, we obtained an accuracy of 84.87%.
We then explored undersampling, which addresses an imbalanced dataset by discarding samples of the larger classes. Here, the minimum class count is 193; thus, we decreased the counts of the other classes to match the severe class count. After resampling, every class count is equal to 193; we built our CNN model using layers such as Conv2D, MaxPooling2D, and Dense, and we fit our batches to this model. For this scenario, we obtained an accuracy of 65.46%.
After analyzing these different scenarios, we obtained the best accuracy using ImageDataGenerator with different parameters and dense layers.
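The two resampling strategies can be sketched with the class counts reported above. The code below is a minimal illustration using only the Python standard library; it assumes random selection (with replacement when a class must grow), which the text does not specify, and the helper name `resample` and the fixed seed are hypothetical.

```python
import random
from collections import Counter

# Class counts as reported in the text (class label -> number of images).
CLASS_COUNTS = {0: 1805, 1: 370, 2: 999, 3: 193, 4: 295}


def resample(labels, target, rng):
    """Return sample indices such that every class appears `target` times.

    Classes with at least `target` samples are randomly subsampled
    (undersampling); smaller classes keep all samples and are topped up
    with random duplicates (oversampling).
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    chosen = []
    for label, idxs in sorted(by_class.items()):
        if len(idxs) >= target:
            chosen.extend(rng.sample(idxs, target))
        else:
            chosen.extend(idxs + rng.choices(idxs, k=target - len(idxs)))
    rng.shuffle(chosen)
    return chosen


# Stand-in label list with the same class distribution as the dataset.
labels = [c for c, n in CLASS_COUNTS.items() for _ in range(n)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

over = resample(labels, max(CLASS_COUNTS.values()), rng)   # 1805 per class
under = resample(labels, min(CLASS_COUNTS.values()), rng)  # 193 per class
over_counts = Counter(labels[i] for i in over)
under_counts = Counter(labels[i] for i in under)
```

Oversampling to 1805 per class yields 9025 samples, most of them duplicates in the small classes, while undersampling to 193 per class leaves only 965 samples, which is consistent with the lower accuracy reported for undersampling.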
4 Results
This section first describes the dataset and then discusses the implementation of the CNN approach to identify the stage of diabetic retinopathy, classifying each image into one of five classes: no_DR (0), mild (1), moderate (2), severe (3), or proliferate_DR (4).
4.1 Dataset Description We obtained the dataset for our project from the Kaggle Code Competition called APTOS 2019 Blindness Detection. To identify and categorize diabetic retinopathy, we used the Gaussian filtered retina scan images from this dataset. We then reduced the size of these images to 224 by 224 pixels so that they could be used with various pretrained deep learning models. All of the images are sorted into their respective folders based on the severity/stage of diabetic retinopathy, using the train.csv file that was provided. The folders are named 0—no_DR, 1—mild, 2—moderate, 3—severe, and 4—proliferate_DR, and they contain images of the respective stages. Sample images of each stage can be seen in Figs. 2, 3, 4, 5 and 6.
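The sorting step can be sketched as follows. This is a hedged illustration that only plans the destination path of each image from the rows of the CSV file; the column names `id_code` and `diagnosis`, the `.png` extension, the `sorted` root directory, and the exact folder spellings are assumptions made for the example rather than details confirmed by the text.

```python
import csv
import io
from pathlib import PurePosixPath

# Assumed mapping from diagnosis label to folder name.
STAGE_FOLDERS = {
    0: "no_DR",
    1: "mild",
    2: "moderate",
    3: "severe",
    4: "proliferate_DR",
}


def plan_moves(csv_text, root="sorted"):
    """Map each image id to the folder path for its diagnosis label."""
    plan = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        folder = STAGE_FOLDERS[int(row["diagnosis"])]
        filename = row["id_code"] + ".png"  # assumed file extension
        plan[row["id_code"]] = str(PurePosixPath(root) / folder / filename)
    return plan


# Tiny hypothetical excerpt in the assumed CSV layout.
sample_csv = "id_code,diagnosis\nimg_a,0\nimg_b,2\nimg_c,4\n"
plan = plan_moves(sample_csv)
```

Separating the planning step from the actual file copy makes it easy to check the class counts per folder before any files are moved.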
Fig. 2 Sample image of class 0 (no_DR)
Fig. 3 Sample image of class 1 (mild)
Fig. 4 Sample image of class 2 (moderate)
Fig. 5 Sample image of class 3 (severe)
Fig. 6 Sample image of class 4 (proliferate_DR)
4.2 Result Discussions Diabetic retinopathy is a medical condition that impacts the eyes of people with diabetes and can cause vision loss if not diagnosed and treated early. Convolutional neural networks (CNN), a type of deep learning technique, have shown potential in detecting diabetic retinopathy from retinal images. To use CNNs for diabetic retinopathy identification, a general approach involves collecting a large dataset of retinal images of patients with diabetic retinopathy and preprocessing the images to ensure they are of the same quality, size, and format. A CNN architecture can be designed, with popular pretrained models such as VGG, ResNet, or Inception being fine-tuned for the specific task. Here, the dataset which has been used is explained in the previous section, i.e., 4.1. The model is then trained with preprocessed data using a sufficient batch size and epochs, with the training process being monitored for overfitting. The performance of the trained model is evaluated on a separate test set, with metrics such as accuracy to assess its effectiveness. Once the model is deemed effective, it can be deployed in a clinical setting to identify diabetic retinopathy from retinal images. It is important to note that the specific details of each step may vary based on the dataset and model architecture used, and medical professionals should be involved in validating the results and interpreting the model outputs. To identify diabetic retinopathy using CNN, the following steps can be taken: Collect a dataset of retinal images of patients with diabetic retinopathy, which can be split into a training set, a validation set, and a test set.
Table 1 Parameters used for the experiment for the diabetic retinopathy
Parameters        Values
Training ratio    85:15
Batch size        18
Epochs            50
Learning rate     1e−5
Preprocessing: Preprocess the images in the dataset to ensure that they are of the same size, quality, and format by applying a preprocessing function $f$ to $X$, denoted by $Y = f(X)$ (Table 1).
CNN Architecture: Design a CNN architecture for diabetic retinopathy identification with model parameters $\theta$, denoted by $M(Y, \theta)$.
Training: Train the model with the preprocessed data $Y_{\text{train}}$ and corresponding labels $Y_{\text{trainlabels}}$, optimizing the loss function $L$ with $\theta^{*} = \arg\min_{\theta} L(M(Y_{\text{train}}, \theta), Y_{\text{trainlabels}})$.
Validation: Validate the model’s performance on the validation set $Y_{\text{val}}$ and corresponding labels $Y_{\text{vallabels}}$, denoted by $L_{\text{val}} = L(M(Y_{\text{val}}, \theta^{*}), Y_{\text{vallabels}})$.
Testing: Test the model’s performance on the test set $Y_{\text{test}}$ and corresponding labels $Y_{\text{testlabels}}$, denoted by $L_{\text{test}} = L(M(Y_{\text{test}}, \theta^{*}), Y_{\text{testlabels}})$.
Deployment: Deploy the trained model in a clinical setting to identify diabetic retinopathy from retinal images.
Applying the CNN with the various preprocessing techniques resulted in the following accuracy values:
• Approach 1: ImageDataGenerator using rescale parameter. Accuracy equals 75.09%
• Approach 2: ImageDataGenerator using horizontal_flip parameter. Accuracy equals 49.27%
• Approach 3: ImageDataGenerator using vertical_flip parameter. Accuracy equals 49.27%
• Approach 4: ImageDataGenerator using different parameters. Accuracy equals 92.54%
• Approach 5: Oversampling. Accuracy equals 84.87%
• Approach 6: Undersampling. Accuracy equals 65.46%
The same results are represented visually in Fig. 7.
Fig. 7 Accuracy achieved after implementing the CNN algorithm with different preprocessing techniques
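The training, validation, and testing steps formalized in this section share one pattern: parameters are fitted by minimizing a loss on the training data, and the same loss is then evaluated on held-out data. The sketch below illustrates that pattern with a deliberately tiny stand-in, a one-parameter linear model trained by gradient descent on a squared-error loss. It is not the CNN used in this work; the toy data, learning rate, and function names are hypothetical.

```python
def model(x, theta):
    """Toy model M(x, theta): a single multiplicative parameter."""
    return theta * x


def loss(xs, ys, theta):
    """Mean squared error L(M(xs, theta), ys)."""
    return sum((model(x, theta) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, theta=0.0, lr=0.05, epochs=200):
    """Approximate theta* = argmin_theta L by plain gradient descent."""
    for _ in range(epochs):
        grad = sum(2 * (model(x, theta) - y) * x
                   for x, y in zip(xs, ys)) / len(xs)
        theta -= lr * grad
    return theta


# Hypothetical data generated by y = 2x, split into train/validation/test.
x_train, y_train = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
x_val, y_val = [4.0], [8.0]
x_test, y_test = [5.0], [10.0]

theta_star = train(x_train, y_train)     # fitted on training data only
l_val = loss(x_val, y_val, theta_star)   # L_val, used to monitor overfitting
l_test = loss(x_test, y_test, theta_star)  # L_test, final held-out estimate
```

In practice the validation loss is checked during training to detect overfitting, and the test loss is computed once at the end, exactly as in the steps listed above.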
5 Conclusion
Diabetes is an incurable condition that has become a worldwide problem. Timely identification of the disease and preventative efforts to lessen its consequences are the only ways to deal with it. This study recommends automated techniques for detecting diabetic retinopathy from retinal images, with a CNN model as the recommended technique. Images labeled with the various stages of diabetic retinopathy are used so that the condition can be detected at an early or a late stage. Once the stage is identified, suitable measures can be taken to limit its effects. The CNN performed well on this problem, reaching an accuracy of about 93% (92.54% with Approach 4).
Mapping the Literature of Artificial Intelligence in Medical Education: A Scientometric Analysis
Fairuz Iqbal Maulana, Muhammad Yasir Zain, Dian Lestari, Agung Purnomo, and Puput Dani Prasetyo Adi
Abstract This study investigates the use of artificial intelligence in health education over the ten years from 2012 to 2022 using the Scopus database. The researchers use bibliometric analysis, with VOSviewer and RStudio software for quantification and literature analysis. The retrieved records provide data such as year of publication, journal, country, keywords, authors, and citation counts. The general keywords used are artificial intelligence, medical, and education. Limiting the results to the last ten years and to English only yields 1681 articles. According to the study's findings, McGill University is the affiliation with the greatest number of papers, 69, and medicine is the largest subject area, with a proportion of 39.7% (n = 1067). This bibliometric study will be useful for other researchers examining the development of research on artificial intelligence in health education in the last ten years.
Keywords Artificial intelligence · Medical · Education · Bibliometric · Review
F. I. Maulana (B) Computer Science Department, Bina Nusantara University, Jakarta 11480, Indonesia e-mail: [email protected]
M. Y. Zain Informatics Engineering Study Program, University of Madura, Pamekasan, Indonesia e-mail: [email protected]
D. Lestari Faculty of Medicine, Universitas Airlangga, Surabaya 60132, Indonesia e-mail: [email protected]
A. Purnomo Entrepreneurship Department, BINUS Business School Undergraduate Program, Bina Nusantara University Jakarta, Jakarta 11480, Indonesia e-mail: [email protected]
P. D. P. Adi Telecommunication Research Center, National Research and Innovation Agency, Jakarta 11480, Indonesia e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al.
(eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_47
1 Introduction
Artificial intelligence (AI) is intelligence exhibited by a computer that mimics the behavior of the object being imitated, and it is used to aid human work in a variety of disciplines. AI has grown quickly over the last decade and is poised to cause large-scale changes in medicine. Medical education, however, has lagged behind these rapid developments. Simply stated, AI models can discover patterns in massive amounts of data in order to make highly accurate predictions for a variety of tasks. AI has been developed and applied in various fields, from libraries [1], smart buildings [2], climate change [3], industry [4], and e-learning [5] to medical education [6]. The growing complexity and quality of data in the health sciences suggest that AI will be applied more and more widely [7]. Publications on AI in medical imaging, for example for radiology, rose sharply from around 100–150 articles per year in 2007–2008 to 700–800 articles per year in 2016–2017 [8]. The bibliometric method has been used to determine, through scientific articles, the extent to which AI is implemented in nursing management [9]. The roles, applications, and trends of AI in nursing education have also been profiled using PRISMA [10]. Another study identifies the gaps and main themes in the peer-reviewed literature on AI training in undergraduate medical education using the Medline and Embase databases [6]. That study discusses five core themes: the need for a curriculum, recommendations for learning content, suggestions for delivering the curriculum, emphasis on ethical culture, and the challenges and opportunities of implementing AI in undergraduate medical education.
The research conducted for this paper uses the bibliometric method to map the trend of AI implementation in general health education over the past ten years through published articles. The researchers use the Scopus database and limit the publication year to 2012–2022. They want to know how far AI has developed in the health field over the last ten years as reflected in scientific articles, which authors have contributed to this field, which keywords are most widely used, which publications have the most citations, and what research opportunities remain.
2 Method
This research uses data from the Scopus database to conduct a bibliometric mapping analysis, following the science-mapping process suggested by Aria and Cuccurullo [11]. The content analysis, in turn, is presented in accordance with the methods described in [12–14].
2.1 Article Selection Process
The research focuses on bibliometric mapping based on the information retrieved for the chosen keywords from the Scopus database. The article selection process has three stages [15, 16]: (1) literature search and data collection; (2) data extraction, loading, and conversion; and (3) data synthesis through analysis. Fig. 1 illustrates each stage. To collect the data needed for this research, the researchers search for Scopus-indexed publications on artificial intelligence in medical education from the last ten years, 2012 to 2022. The search is limited to English-language documents (Table 1), while the document type is not restricted, because the researchers want to know what kinds of documents Scopus has indexed for the keywords. Several general data collection protocols are applied to the Scopus search. The chosen keywords are “artificial intelligence,” “medical,” and “education,” entered in the topic fields using the advanced search function. After the year and language limits were applied, the search returned 1681 documents related to using AI in health education (Access Date: March 1, 2023). Table 1 contains information about the search protocol, how it is applied in the database, and the results obtained.
Fig. 1 Method used to collect data for bibliometric and text research
Table 1 Data search methods and the quantity of data collected
Database | Description of protocol | Combination of search string | Search outcome
Scopus | Using Boolean operators to apply the quoted search terms to the Scopus title, abstract, and keywords fields | TITLE-ABS-KEY (“Artificial Intelligence” AND “medical” AND “education”) | 2279
Scopus | Applying additional conditions by limiting the time span to 2012–2022 and the language to English only | Publication year limited to 2012–2022; language limited to “English” | 1681
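The two protocol rows in Table 1 can be combined into a single advanced-search string. The sketch below is a hedged illustration: the `TITLE-ABS-KEY` part is taken from Table 1, while the `PUBYEAR` and `LIMIT-TO` clauses follow the general form Scopus uses for year and language limits. The exact string the authors ran is not given in the paper, and the helper name `build_scopus_query` is hypothetical.

```python
def build_scopus_query(terms, year_from, year_to, language):
    """Compose a Scopus-style advanced-search string from terms and limits."""
    quoted = " AND ".join(f'"{term}"' for term in terms)
    return (
        f"TITLE-ABS-KEY({quoted})"
        # PUBYEAR uses strict inequalities, so widen the bounds by one year.
        f" AND PUBYEAR > {year_from - 1} AND PUBYEAR < {year_to + 1}"
        f' AND (LIMIT-TO(LANGUAGE, "{language}"))'
    )


query = build_scopus_query(
    ["artificial intelligence", "medical", "education"], 2012, 2022, "English"
)
print(query)
```

Keeping the terms and limits as parameters makes it easy to rerun the same protocol for a different time span or language.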
The next stage is extraction, loading, and data conversion. This study collects metadata from the Scopus database, downloaded in RIS and CSV formats. The data synthesis is then carried out by analyzing the metadata with VOSviewer and RStudio.
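As a rough illustration of the loading step, the sketch below reads a CSV export, keeps records inside the year range and language, and counts publications per year. The column names `Year` and `Language of Original Document` are assumed to match the Scopus CSV export, the three sample rows are hypothetical, and the authors' own processing was done in VOSviewer and RStudio rather than in Python.

```python
import csv
import io
from collections import Counter

# Hypothetical three-row export; a real Scopus CSV has many more columns.
SAMPLE_CSV = """Authors,Title,Year,Language of Original Document
"Author A.; Author B.",Paper A,2020,English
"Author C.",Paper B,2021,English
"Author D.",Paper C,2011,English
"""


def load_records(handle, year_from=2012, year_to=2022, language="English"):
    """Read exported rows and keep those inside the year range and language."""
    kept = []
    for row in csv.DictReader(handle):
        year = int(row["Year"])
        in_range = year_from <= year <= year_to
        if in_range and row["Language of Original Document"] == language:
            kept.append(row)
    return kept


records = load_records(io.StringIO(SAMPLE_CSV))
per_year = Counter(int(row["Year"]) for row in records)
print(sorted(per_year.items()))
```

With a real export, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, and `per_year` would give the counts behind a publication-growth chart.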
2.2 Data Coding and Analysis
The researchers conducted a bibliometric analysis of the selected articles using VOSviewer and RStudio to visualize the network of the most commonly used keywords and the words used in abstracts, and to carry out a co-citation analysis of the articles. The combined data are studied and discussed during the content analysis process.
3 Results and Discussion
The findings from the bibliometric study using Scopus data are presented here. The purpose of this bibliometric analysis is to shed light on the evolution of research into AI's application in health education over the past decade. These findings also identify the authors, affiliations, and countries that have actively published research on the application of AI in medical education.
3.1 Findings Result from Research Publication Growth of Articles on Artificial Intelligence in Medical Education
Fig. 2 displays the growth in papers over the past decade according to the Scopus database. The largest number of papers on artificial intelligence in healthcare education, 438, was published in 2021, followed by 412 in 2022. The smallest publication volume occurred in 2012, with 21 articles. Output increased markedly from 2016 onward, rising from 35 articles in 2016 to 412 in 2022. This upward trend was possibly accompanied by developments in other fields that affected the topic being studied.
Fig. 2 Research publication growth in the last ten years
3.2 Finding Result from Most Productive Author, Affiliation, and Countries Publishing Articles on Artificial Intelligence in Medical Education
To examine author production over time, the researcher investigates the top ten authors. The results show that most of the top authors published articles on AI in health education through 2022. As shown in Fig. 3, four authors, Del Maestro, R.F., Ledwos, N., Mirchi, N., and Yi, P. H., have the highest output, with seven published articles each. The researcher also mapped the ten most productive affiliations based on the number of documents. The affiliation McGill University had the highest number of articles,
Fig. 3 Most productive authors on artificial intelligence in medical education
Fig. 4 Most productive affiliations on artificial intelligence in medical education
Fig. 5 Three-field plot of active affiliation and country on artificial intelligence in medical education
namely 69 papers, followed by the Mayo Clinic with 64 articles and the University of Toronto with 61 articles. In the last ten years, the three affiliations have published more Scopus-indexed publications than any other affiliation on this research topic (Fig. 4). Our research identified some of the top universities in terms of institutions (author affiliations) and countries leading the way in artificial intelligence in medical education. Some of these universities, as shown in Fig. 5, included the University Hospital Carl Gustav Carus Dresden, the University of British Columbia, and the Huazhong University of Science and Technology.
3.3 Finding Result from Keyword Co-occurrence Pattern of Studies on Artificial Intelligence in Medical Education The keyword co-occurrence pattern focuses on understanding a scientific field’s knowledge components and knowledge structure for the interrelationships between
Fig. 6 Visualized author keyword co-occurrence analysis of papers on artificial intelligence in medical education: This is one of the field’s most frequently occurring terms
keywords in articles published in the same area. Figure 6 presents a visualization of the keywords often used in research on artificial intelligence in medical education. The relative sizes of the nodes reveal that various other AI-related keywords, for example, “machine learning” and “medical education,” are highly associated with “education.” Visualizing the data with VOSviewer, the researchers found 12 clusters, distinguished by node color. The “machine learning” and “deep learning” nodes belong to the scientific field of “artificial intelligence.” The “COVID-19” node appears often because the search period overlaps with the COVID-19 pandemic. Figure 7 visualizes recent developments in the use of artificial intelligence for teaching in the medical field. To determine which problems and policies have received the most attention over the past decade, this study looked at the authors' use of specific terms. A word cloud of the authors' terms is used for this purpose, as it reveals insights into the authors' research interests. The potential of AI-assisted health education initiatives is also examined.
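The counting that underlies a keyword co-occurrence map can be sketched in a few lines. The example below only tallies how often two author keywords appear in the same paper. It does not reproduce VOSviewer's normalization, layout, or clustering, and the three keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count how often each unordered keyword pair appears in the same paper."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Normalize case and drop duplicates so each pair counts once per paper.
        unique = sorted({kw.strip().lower() for kw in keywords if kw.strip()})
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical author-keyword lists for three papers.
papers = [
    ["Artificial intelligence", "Machine learning", "Medical education"],
    ["artificial intelligence", "deep learning"],
    ["Machine learning", "artificial intelligence", "COVID-19"],
]
counts = keyword_cooccurrence(papers)
print(counts.most_common(1))
```

In a network view, each keyword becomes a node and each pair count becomes the weight of the edge between two nodes.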
Fig. 7 AI-related trends in medical education as determined by the writers’ chosen phrases in a word cloud
3.4 Finding Result from Subject Area and Most Citation on Artificial Intelligence in Medical Education
For the 1682 documents in the search results, Fig. 8 shows the subject areas covered by artificial intelligence in medical education over the last ten years. Medicine accounts for 39.7% (n = 1067), followed by computer science with 16.1% (n = 436) and engineering with 9.9% (n = 268). Other subject areas are Social Science (n = 169), Biochemistry, Genetics and Molecular Biology (n = 135), Health Professions (n = 120), Mathematics (n = 117), and others. These counts add up to more than the number of documents because a single document can be assigned to several subject areas. In medicine, the most-cited work is the 2020 paper by Liu et al., with 1112 citations [17]. In computer science, the paper by Tjoa and Guan has the most citations, 270 [18], and in engineering the most-cited work is a book with 180 citations [19]. Of the 1682 articles processed, the researchers collected ten articles with the
Fig. 8 Subject area of AI in medical education in terms of authors keywords
Table 2 Most cited papers in the last ten years
Paper | Author | Total citation | Years
“Dermatologist-level classification of skin cancer with deep neural networks” [20] | Esteva, A., et al. | 6101 | 2017
“Online mental health services in China during the COVID-19 outbreak” [17] | Liu, S., et al. | 1112 | 2020
“Digital pathology and artificial intelligence” [21] | Niazi, M.K.K., et al. | 335 | 2019
“Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine” [8] | Pesapane, F., et al. | 297 | 2018
“Human–Robot Interaction” [22] | Sheridan, T.B. | 290 | 2016
“Introduction to radiomics” [23] | Mayerhoefer, M.E., et al. | 284 | 2020
“A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI” [18] | Tjoa, E., Guan, C. | 270 | 2021
“Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home)” [24] | Kepuska, V., Bohouta, G. | 205 | 2018
“Robot-proof: Higher education in the age of artificial intelligence” [19] | Aoun, J.E. | 180 | 2017
“Medical students' attitude toward artificial intelligence: a multicentre survey” [25] | Pinto dos Santos, D., et al. | 166 | 2019
highest citations based on search topics for artificial intelligence in medical education in the last ten years from 2012 to 2022, which can be seen in Table 2 (retrieved: 5 March 2023).
3.5 Finding Result from Collaboration Network on Artificial Intelligence in Medical Education
Fig. 9 displays the collaboration network of authors, which is divided into seven color-coded clusters. An edge between two nodes indicates that the two authors have collaborated. For example, in the purple cluster the author Fishman, Elliot K. is connected to the author Weisberg, Edmund M. because they co-authored the same article [26].
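The rule that two authors are linked when they appear on the same article can be sketched as a small graph construction. The example below is a simplified illustration: the first author list is that of reference [26], the second is hypothetical, and the connected components it reports are not the same thing as the color clusters in Fig. 9, which come from the visualization tool's own clustering.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(author_lists):
    """Build an undirected graph linking authors who share at least one paper."""
    graph = defaultdict(set)
    for authors in author_lists:
        for name in authors:
            graph.setdefault(name, set())  # keep single-author papers as nodes
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_groups(graph):
    """Return the connected components of the graph as sorted author lists."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(graph[node] - seen)
        groups.append(sorted(group))
    return groups


# First paper: author list of reference [26]; second paper: hypothetical.
papers = [
    ["Fishman EK", "Weisberg EM", "Chu LC", "Rowe SP"],
    ["Author A", "Author B"],
]
graph = coauthor_graph(papers)
groups = connected_groups(graph)
print("Weisberg EM" in graph["Fishman EK"])
print(groups)
```

Edge weights (the number of shared papers) could be added by replacing the neighbor sets with counters, which is closer to what a collaboration map displays as link thickness.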
Fig. 9 Collaboration network of AI in medical education
4 Conclusion
This research offers an overview of scientific papers on artificial intelligence in medical education published in peer-reviewed journals between 2012 and 2022. Bibliometric analysis is used to answer questions about the growth in article production over the last decade, the most prolific authors and affiliations publishing on artificial intelligence in medical education, and the keyword co-occurrence patterns that may guide future research. The research presents several results. The most productive year is 2021, with 438 papers. McGill University is the most productive affiliation, with 69 publications. Visualization with VOSviewer revealed 12 keyword clusters, distinguished by node color. The scientific area of “artificial intelligence” includes the “machine learning” and “deep learning” components. Medicine is the largest subject area, at 39.7% (n = 1067). This research was limited in terms of content analysis, and it will be interesting for future work to examine how artificial intelligence is used in teaching medical courses. Nonetheless, this study contributes to knowledge: its results shed light on how AI research has evolved over the last decade. The researchers conclude that while authors are using AI to advance teaching and learning in the medical field, more effort is needed, particularly from regions, countries, and institutions that are underrepresented in the literature.
References
1. Borgohain DJ, Bhardwaj RK, Verma MK (2022) Mapping the literature on the application of artificial intelligence in libraries (AAIL): a scientometric analysis. Libr Hi Tech. https://doi.org/10.1108/LHT-07-2022-0331
2. Luo J (2022) A bibliometric review on artificial intelligence for smart buildings. Sustainability 14:10230. https://doi.org/10.3390/su141610230
3. Liu ZL, Peng CH, Xiang WH, Tian DL, Deng XW, Zhao MF (2010) Application of artificial neural networks in global climate change and ecological research: an overview. Chin Sci Bull 55:3853–3863. https://doi.org/10.1007/s11434-010-4183-3
4. Johnson M, Jain R, Brennan-Tonetta P, Swartz E, Silver D, Paolini J, Mamonov S, Hill C (2021) Impact of big data and artificial intelligence on industry: developing a workforce roadmap for a data driven economy. Glob J Flex Syst Manag 22:197–217. https://doi.org/10.1007/s40171-021-00272-y
5. Jia K, Wang P, Li Y, Chen Z, Jiang X, Lin C-L, Chin T (2022) Research landscape of artificial intelligence and e-learning: a bibliometric research. Front Psychol 13:1–14. https://doi.org/10.3389/fpsyg.2022.795039
6. Lee J, Wu AS, Li D, Kulasegaram KM (2021) Artificial intelligence in undergraduate medical education: a scoping review. Acad Med 96:S62–S70. https://doi.org/10.1097/ACM.0000000000004291
7. Davenport T, Kalakota R (2019) The potential for artificial intelligence in healthcare. Future Healthcare J 6:94–98. https://doi.org/10.7861/futurehosp.6-2-94
8. Pesapane F, Codari M, Sardanelli F (2018) Artificial intelligence in medical imaging: threat or opportunity? Radiologists again at the forefront of innovation in medicine. Eur Radiol Exp 2. https://doi.org/10.1186/s41747-018-0061-6
9. Chang C-Y, Jen H-J, Su W-S (2022) Trends in artificial intelligence in nursing: impacts on nursing management. J Nurs Manag 30:3644–3653. https://doi.org/10.1111/jonm.13770
10. Hwang G-J, Tang K-Y, Tu Y-F (2022) How artificial intelligence (AI) supports nursing education: profiling the roles, applications, and trends of AI in nursing education research (1993–2020). Interact Learn Environ 0:1–20. https://doi.org/10.1080/10494820.2022.2086579
11. Aria M, Cuccurullo C (2017) Bibliometrix: an R-tool for comprehensive science mapping analysis. J Informetr 11:959–975. https://doi.org/10.1016/j.joi.2017.08.007
12. Maulana FI, Febriantono MA, Raharja DRB, Sofiani IR, Firdaus VAH (2021) A scientometric analysis of game technology on learning media research study in recent 10 years. In: 2021 7th international conference on electrical, electronics and information engineering (ICEEIE), pp 1–6. https://doi.org/10.1109/ICEEIE52663.2021.9616963
13. Purnomo A, Afia N, Prasetyo YT, Rosyidah E, Persada SF, Maulana FI (2022) Meiryani: business model on M-business: a systematic review. Procedia Comput Sci 215:955–962. https://doi.org/10.1016/j.procs.2022.12.098
14. Maulana FI, Audia Agustina I, Gasa FM, Rahmadika S, Ramdania DR (2021) Digital art technology publication in Indonesia: a bibliometric analysis (2011–2020) technology. In: 2021 International conference on information management and technology (ICIMTech), pp 375–379. https://doi.org/10.1109/ICIMTech53080.2021.9535104
15. Agbo FJ, Oyelere SS, Suhonen J, Tukiainen M (2021) Scientific production and thematic breakthroughs in smart learning environments: a bibliometric analysis. Smart Learn Environ 8:1. https://doi.org/10.1186/s40561-020-00145-4
16. Agbo FJ, Sanusi IT, Oyelere SS, Suhonen J (2021) Application of virtual reality in computer science education: a systemic review based on bibliometric and content analysis methods. Educ Sci 11. https://doi.org/10.3390/educsci11030142
17. Liu S, Yang L, Zhang C, Xiang Y-T, Liu Z, Hu S, Zhang B (2020) Online mental health services in China during the COVID-19 outbreak. Lancet Psychiatry 7:e17–e18. https://doi.org/10.1016/S2215-0366(20)30077-8
18. Tjoa E, Guan C (2021) A survey on explainable artificial intelligence (XAI): toward medical XAI. IEEE Trans Neural Netw Learn Syst 32:4793–4813. https://doi.org/10.1109/TNNLS.2020.3027314
19. Aoun JE (2017) Robot-proof: higher education in the age of artificial intelligence. The MIT Press
20. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118. https://doi.org/10.1038/nature21056
21. Niazi MKK, Parwani AV, Gurcan MN (2019) Digital pathology and artificial intelligence. Lancet Oncol 20:e253–e261. https://doi.org/10.1016/S1470-2045(19)30154-8
22. Sheridan TB (2016) Human-robot interaction. Hum Factors 58:525–532. https://doi.org/10.1177/0018720816644364
23. Mayerhoefer ME, Materka A, Langs G, Häggström I, Szczypiński P, Gibbs P, Cook G (2020) Introduction to radiomics. J Nucl Med 61:488–495. https://doi.org/10.2967/JNUMED.118.222893
24. Kepuska V, Bohouta G (2018) Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home). In: SC, HN S (eds) 2018 IEEE 8th annual computing and communication workshop and conference, CCWC 2018. Institute of Electrical and Electronics Engineers Inc., pp 99–103. https://doi.org/10.1109/CCWC.2018.8301638
25. Pinto dos Santos D, Giese D, Brodehl S, Chon SH, Staab W, Kleinert R, Maintz D, Baeßler B (2019) Medical students' attitude towards artificial intelligence: a multicentre survey. Eur Radiol 29:1640–1646. https://doi.org/10.1007/s00330-018-5601-1
26. Fishman EK, Weisberg EM, Chu LC, Rowe SP (2020) Mapping your career in the era of artificial intelligence: it's up to you not google. J Am Coll Radiol 17:1537–1538. https://doi.org/10.1016/j.jacr.2020.03.035
Development of a Blockchain-Based On-Demand Lightweight Commodity Delivery System Bayezid Al Hossain Onee, Kaniz Fatema Antora, Omar Sharif Rajme, and Nafees Mansoor
Abstract The COVID-19 pandemic has caused a surge in the use of online delivery services, which rely on user-generated content to promote collaborative consumption. Although Online Food Delivery (OFD) is a popular delivery system in Bangladesh, it has yet to ensure item authenticity, especially with the increasing demand for lightweight commodity delivery services across the country. The authenticity of products, involvement of multiple parties, and fair exchange are all challenging aspects of coast-to-coast services. Therefore, it is necessary for the three entities involved in the supply chain transaction—seller, carrier, and buyer—to establish at least two peer-to-peer operations to ensure reliability and efficiency. To address these limitations and meet consumer expectations, the study proposes a framework for a nationwide on-demand marketplace for lightweight commodity items and a delivery system. Furthermore, transaction details are stored in a blockchain to ensure the transparency and reliability of the proposed system. Keywords Blockchain applications · Supply chain · Lightweight commodity items · Ethereum · On-demand delivery system
B. A. H. Onee · K. F. Antora · O. S. Rajme · N. Mansoor (B) University of Liberal Arts Bangladesh (ULAB), 688 Beribadh Road, Dhaka 1207, Bangladesh e-mail: [email protected] B. A. H. Onee e-mail: [email protected] K. F. Antora e-mail: [email protected] O. S. Rajme e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_48
1 Introduction
The development of technology is bringing about significant changes across various fields, including marketplaces. However, while Supply Chain Management (SCM) plays a critical role in an organization's overall performance, it has not kept pace with technological advancements. Consequently, the current supply chain struggles to meet people's demands for product quality, authenticity, fair pricing, and relevant information. Moreover, companies with extended global supply chains incur between 80 and 90% of their costs in supply chain management [1], which can lead to product price hikes and a compromise on product quality [2]. Involving more intermediaries in SCM increases system complexity and results in longer delivery durations and higher costs. Each intermediary in the supply chain seeks to profit, which can lead to raw materials being adulterated for additional profit without the manufacturer's permission [3]. To address these issues, this study proposes an on-demand marketplace for lightweight commodity items together with a delivery system. The traditional delivery system has several limitations: it is costly and time-consuming, has limited accessibility in rural areas, lacks proper tracking, and has an environmental impact due to carbon emissions. Additionally, it may not be flexible enough to accommodate last-minute changes or custom customer requests, may offer only limited delivery options, such as same-day or weekend delivery, and may lack proof of the goods' condition during transit. The proposed lightweight commodity delivery system is cost-effective, with lower fuel, maintenance, and operational costs than traditional delivery systems. Such systems transport goods more quickly, adapt more readily to changes in delivery location, offer more flexible delivery options, and can deliver packages directly to a customer's doorstep.
Most importantly, the lightweight commodity delivery system provides more efficiency, lower emissions, and less environmental impact. Crowd-sourcing-based carrier hiring is also being considered to ensure the smooth execution of the platform. This involves hiring registered users as carriers to deliver products from the origin of the requested product to the consumer’s destination. This not only ensures timely delivery but also creates job opportunities for individuals traveling across the country, especially in developing countries where unemployment rates are high. The proposed system guarantees the privacy of user data through a decentralized database, numerous security protocols, and algorithms that authenticate and maintain the unmodified nature of user data. Moreover, the user-friendly design of the platform makes it easily accessible through both smartphone apps and websites, ensuring a hassle-free experience for customers. Additionally, the proposed system ensures a fair exchange of products and enhances the stability of the supply chain. By ensuring the authenticity of products and transparency in the delivery process, the platform can boost the confidence of customers, thereby increasing demand for authentic products. This, in turn, can promote the economic development of developing countries and
ensure the stable, authentic production of products in those countries. The objective of this study is to provide
• A blockchain-based online lightweight commodity delivery system where customers can order fresh foods.
• A cost-effective business model with a higher security protocol that works on both Android and web portals.
• An online delivery system without third-party interference.
The rest of this article is structured as follows: Sect. 2 provides a comprehensive examination of the relevant literature pertaining to the existing on-demand delivery systems. In Sect. 3, the architecture of the proposed system is discussed. Next, the proposed system is explained in Sect. 4, whereas the design and development of the proposed system are presented in Sect. 5. Finally, the article is concluded in Sect. 6 along with the future works.
2 Existing Systems
In Bangladesh, as in other countries, there are quite a few on-demand delivery systems. Some of the popular ones are discussed in this section, highlighting both their strengths and limitations. Foodpanda, headquartered in Berlin, Germany, is the most familiar food delivery system, operating across 50 countries worldwide [4]. Its main function is to let users place orders at nearby restaurants through its website or mobile app. Delivery services also exist for solid products such as electronic gadgets, materials for finished products, and craft products. Services such as Sundarban Courier Service and S.A. Paribahan LTD allow consumers to send products across the country. Their main purpose is to deliver solid goods or consumable products. Although they deliver those types of products, their service quality is not sufficient for everyday use. Pathao Food and Shohoz Food are two Bangladeshi companies offering food delivery services, with Pathao launching its service in January 2018 in Dhaka and Chittagong, and Shohoz Food starting in October 2018 [5, 6]. Pathao Food operates similarly to Foodpanda but does not have a restaurant agreement. ShipChain is a logistics utility ecosystem that leverages blockchain technology to provide an integrated system for logistics companies, allowing for benefits such as trustless contract execution, historical data immutability, and no single point of failure [7]. Triwer is a Norwegian company aiming to eliminate inefficiencies in cargo delivery by introducing a crowd delivery marketplace and recording intermediate transactions, parcel tracking, and pricing on its custom side-chain while running the delivery smart contracts on the Ethereum blockchain.
Triwer plans to charge a commission of 5–15% on all delivery revenue [8].
B. A. H. Onee et al.
The Israeli PAKET Project aims to establish a decentralized delivery network that is accessible to all, transparent, and does not charge any commission. The project is developing a protocol that enables physical parcel deliveries among network participants, regardless of their location. The PAKET protocol ensures the safe delivery of parcels through collateral deposits, relays, and storage in hubs [8].
The project partners offer a blockchain-secured parcel service model that complements the logistics system. This model ensures reliable tracking of package delivery using a smart contract triggered by an off-chain event. The delivery agent serves as a reliable “oracle” to confirm delivery to the buyer or self-service package station, and a multisignature procedure adds an extra layer of fraud protection. The parcel service also records the sender’s ID and ensures that personal data is not stored on the blockchain but is cryptographically secured. In case of privacy constraints, the seller’s identity can be entirely hidden from the logistics operator [9].
ShopUp operates in Bangladesh and facilitates connections between producers and wholesalers. It addresses issues such as product unavailability, lack of transparent pricing, and inefficient delivery systems that hinder small entrepreneurs in their daily business operations. The company has developed a B2B platform to enable rapid connections between producers, wholesalers, retailers, and consumers. ShopUp has successfully migrated a significant portion of the market to digital platforms and additionally provides services such as last-mile logistics, digital credit, and business management solutions [10].
All of these systems are delivery services. Their working principles differ in some respects, but they mainly work in the same way. Some serve only a local area, such as Pathao Food, Shohoz Food, and Foodpanda.
They can only deliver products within a particular city, which limits product availability across the country. Other services charge a registration fee: Foodpanda, for example, charges $100–150 per restaurant to register and then takes a 15–20% commission on every food order the restaurant receives [11]. There is also a delivery charge that depends on the distance. According to some research, deliveries averaged 25,000 orders per day in 2019 [12]. One estimate puts the overall market size for food delivery at $10 million, and it could grow to over $5 billion by 2025 [13]. Another major issue is that delivery services operating across the country, such as Sundarban Courier Service, lack real-time tracking. Consumers cannot track their products in real time and must depend on an agent or ask the customer service center every day about their product. Consumers also need to know where a product comes from, what it is made of, who maintains the product cycle, and so on. After analyzing these problems, we identified a major issue: each product needs an authorized document from its producer. Without one, a consumer buys a product that relies only on a vendor guarantee, which has little value. Moreover, such a product can harm consumers, since it may be adulterated or may not last long because of artificial manufacturing materials.
3 System Architecture
This section presents the conceptual framework and architecture of the proposed system, elaborating on the design elements of the developed product delivery system and the specific intentions behind its design (Fig. 4). The system comprises various technologies, including a private blockchain server, a decentralized database for authenticated users, a mobile and web application for supply chain management, and a monitoring system for tracking product carriers. The application also incorporates a vendor verification status for each product. The system supports three distinct roles: consumer, carrier, and producer, with different activities for each. Consumers place orders through either the mobile application or the web platform. The decentralized architecture spreads tasks among multiple nodes, which decreases the risk of execution failure at a single node. Because no single node is solely responsible for the entire system, a high level of reliability is possible. Consumers can set an anticipated delivery time, ensuring flexibility and transparency in each delivery cycle. Once an order is placed, the system processes it, adds it to the request list, and notifies nearby carriers. This order request is visible to both carriers and vendors. If a carrier receives a request and can meet the expected delivery time, they accept the request, purchase the product from the vendor, and deliver it. Any registered user traveling from the product’s origin to the order’s destination can assume the role of a carrier. Vendors authorize specific requests to guarantee product authenticity during the purchase process. Subsequently, carriers proceed to the consumer’s location. Consumers can track carriers and verify product authenticity through the application. Upon delivery, carriers receive reviews and ratings from consumers.
Completed order information is securely stored in the blockchain as encrypted hash data, preventing tampering or modification. The system’s delivery fees depend on the distance between consumers and carriers, calculated by a backend algorithm. The following sections provide an in-depth explanation of the individual technologies and their functions.
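The distance-based fee mentioned above can be illustrated with a minimal sketch. The paper does not specify the backend algorithm, so the haversine great-circle distance and the linear fee rule below are assumptions, and `base_fee` and `rate_per_km` are hypothetical placeholder values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def delivery_fee(distance_km, base_fee=30.0, rate_per_km=5.0):
    """Linear fee rule (assumed); base_fee and rate_per_km are hypothetical."""
    return round(base_fee + rate_per_km * distance_km, 2)
```

A backend could call `haversine_km` with the consumer's and carrier's coordinates and pass the result to `delivery_fee`; a road-distance service would likely be more accurate than a straight-line estimate.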
3.1 System Components
The system and design have been precisely crafted to include blockchain technology and authentication, taking extensive study into account. The developed product delivery system consists of several precise technologies, including a private blockchain server that functions as a decentralized database for authenticated users, an application compatible with both mobile and web platforms for supply chain management, and a monitoring system that permits the tracking of the location of the product carrier. In addition, the application has a status for vendor verification for each product. There are three different use cases for the system: consumer, carrier,
and producer. Each user participates in unique activities based on their assigned position. A blockchain server is responsible for storing digital evidence and managing transactions; a smart contract system validates and generates digital contracts with hash data; a trusted directory service oversees certificates and facilitates their verification; a real-time server facilitates robust communication between the database and system; and a user-friendly application manages the system. In this part, the components of the framework are outlined, along with an explanation of digital evidence, data management certifications, and data management conditions.
3.1.1 Blockchain Server
As stated earlier, multiple data storage methods exist, such as MySQL, NoSQL, and SQLite. After significant study and deliberation, it was determined that blockchain is the appropriate solution for data gathering and storage because of its capacity to guarantee data security, manage decentralized databases, and provide data transparency. Notably, a centralized database is highly dependent on the network connection and is hence subject to failure. The distributed ledger process is shown in Fig. 1. The use of a specialized blockchain server ensures the security and high availability of data via a decentralized database stored on a private network, despite the limited options for data backup. In addition, the validity of the product is assured since the product’s provider authenticates it.
Fig. 1 Distributed ledger process
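The tamper evidence that motivates the choice of a blockchain can be sketched in a few lines. This is a simplified, single-process illustration of a hash-chained ledger, assuming SHA-256 and JSON serialization; it is not the private Ethereum server used by the system, and the function names are hypothetical.

```python
import hashlib
import json


def block_hash(index, prev_hash, payload):
    """Hash the block contents together with the previous block's hash."""
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a new block that commits to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {"index": index, "prev": prev_hash, "payload": payload,
             "hash": block_hash(index, prev_hash, payload)}
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and link; any edited block breaks the check."""
    expected_prev = "0" * 64
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"],
                                       block["payload"]):
            return False
        expected_prev = block["hash"]
    return True
```

Because each block stores the hash of its predecessor, changing an earlier record invalidates that block's own hash and every link after it, which is the property the section relies on.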
3.1.2 Smart Contract
The suggested method combines on-demand commodity delivery with blockchain technology to improve transparency, security, and efficiency. Smart contracts authenticate goods, track their whereabouts, and permit automatic payments, thereby minimizing the need for intermediaries and the likelihood of fraud. The immutable blockchain record permits movement tracking, enhanced transparency, and theft prevention. Decentralization increases security and reduces the likelihood of hacking. Transactions that are expedited and cost-effective make distribution more accessible for small businesses and consumers. Smart contracts organize information logically, allowing for automatic order classification and precise delivery. Upon delivery completion, they generate digital bills containing addresses, dates, and costs, which are recorded on the blockchain, producing a transparent and unalterable transaction record.
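The payment-on-delivery behavior described above can be sketched as follows. This is an off-chain Python stand-in for the contract logic, not the deployed smart contract; the class name, field names, and the rule that only the consumer confirms delivery are assumptions made for illustration.

```python
import hashlib
import json


class DeliveryEscrow:
    """Off-chain stand-in for the payment logic of a delivery contract."""

    def __init__(self, order_id, consumer, carrier, cost):
        self.order_id = order_id
        self.consumer = consumer
        self.carrier = carrier
        self.cost = cost
        self.paid = False

    def confirm_delivery(self, caller, address, delivered_on):
        """Release payment once and return a digital bill with its digest."""
        if caller != self.consumer:
            raise PermissionError("only the consumer can confirm delivery")
        if self.paid:
            raise ValueError("payment already released")
        self.paid = True
        bill = {
            "order_id": self.order_id,
            "address": address,
            "date": delivered_on,
            "cost": self.cost,
            "paid_to": self.carrier,
        }
        # The digest is what would be recorded on chain in this sketch.
        canonical = json.dumps(bill, sort_keys=True)
        bill["digest"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        return bill
```

The bill carries the address, date, and cost named in the text; storing only its digest on the ledger would keep the record verifiable without exposing the details.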
3.2 Procedure
The system ensures data protection through decentralization, Proof-of-Stake consensus, cryptography, smart contract architecture, and immutable records. These characteristics prevent attackers from altering data or validating transactions themselves. Smart contracts prevent unauthorized access, thereby decreasing fraud, errors, and criminal behavior. A permanent, tamper-proof record of transactions is preserved to ensure accountability and transparency. The full blockchain server procedure is as follows:
1. A new account is established for the user in a particular node according to their role (consumer, carrier, or producer), and upon registration they are granted authorization to act as a sealer (for consumers and producers).
2. Upon receiving a new order, the carrier sends the transaction to Node2.
3. Once the product for a particular order has been handed over to the carrier, the authorized producer finalizes the transaction.
4. Upon delivery of the merchandise, the carrier submits the delivery transaction to the blockchain server.
5. The customer finalizes the transaction after receiving the merchandise.
This is the sequence in which blocks are added to the blockchain. Block creation in the blockchain is shown in Fig. 2.
Fig. 2 Block creation in blockchain
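Steps 2 to 5 of the procedure can be read as a fixed sequence of role-bound transactions. The sketch below enforces that order in plain Python; it assumes that an out-of-order submission is simply rejected, and the action labels are hypothetical names rather than identifiers from the system.

```python
# (role, action) pairs in the order the procedure lists them (steps 2-5).
# The action names are hypothetical labels chosen for this sketch.
EXPECTED_STEPS = [
    ("carrier", "order_accepted"),
    ("producer", "handover_finalized"),
    ("carrier", "delivery_submitted"),
    ("consumer", "receipt_finalized"),
]


class OrderLedger:
    """Accepts transactions for one order only in the expected sequence."""

    def __init__(self):
        self.entries = []

    def submit(self, role, action):
        step = len(self.entries)
        if step >= len(EXPECTED_STEPS):
            raise ValueError("order is already finalized")
        if (role, action) != EXPECTED_STEPS[step]:
            raise ValueError(
                f"expected {EXPECTED_STEPS[step]}, got {(role, action)}")
        self.entries.append((role, action))

    @property
    def finalized(self):
        return len(self.entries) == len(EXPECTED_STEPS)
```

Each accepted entry corresponds to one block in the sequence above, so an order counts as finalized only after the consumer's confirmation has been recorded.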
3.3 Environment
The application is built using Flutter for Android and iOS, with a Node.js server API that connects the application, the blockchain, and Firestore for storing data. The application facilitates communication between vendors, carriers, and consumers, with a focus on ensuring product authenticity throughout the product life cycle. Carrier tracking uses GPS technology to monitor products and transportation, with the aim of reducing the risk of loss and enhancing security during transportation. The system relies on high-quality data, combining real-time positioning with easy-to-use reporting applications to give organizations the confidence they need. NoSQL is used for storing general data in Firestore because of its low latency, large data volume, and flexible data models. A blockchain transaction is created for each server request. An Ethereum transaction with Node.js is shown in Fig. 3.
4 Proposed System
The system has been developed with different frameworks and languages for the web and Android. Flutter is used for developing the mobile applications and the Laravel 7 framework for the web application. Flutter is based on the Dart programming language, and Laravel 7 follows the Model–View–Controller (MVC) architecture. The frontend has been developed with VueJS and Bootstrap, whereas the backend has been developed on Node.js, which controls all data flow through an API. The Node.js backend server, chosen for its stability and security, is responsible for communicating with the web frontend, the Android app, the blockchain, and MongoDB to store data. NoSQL has been used for storing the general data in Firestore for low latency, large data volume, and flexible data models. Four user perspectives are available in the system: users, guest regular users, registered carriers, and admin. Users are registered regular users who can create an account in the system by providing the necessary information, order products, track
Fig. 3 Ethereum transaction with Node.js
Fig. 4 Workflow diagram
their order via app or web page, view previous order history, edit their profile, give ratings to producers and carriers, and more. A regular user can contact the carrier about the product and shipping status. Producers or sellers are guest users who can only see orders. The website provides customers with features such as browsing and searching for products, adding products to their cart or wish list, and checking out by filling in billing details. Customers can also view their order history and track real-time order status through the user panel. Carriers can see available orders based on their location. The system has two types of administrators: admin and super admin. Both can control the system through the admin panel, which includes features such as adding coupons, updating the items list, and editing user information. Super admins have additional privileges such as adding and removing admin users. Admins can view order placement quantity, order status, and the item availability list. They can also add, edit, or remove products from the system. The system verifies users’ information before approving transactions and sends a confirmation email to customers for each order placed. Carriers can only accept orders within a 10 km radius and can manage their delivery summary, profile, and order status. An admin is responsible for maintaining and managing the system. The system uses a RESTful API and stores sensitive information in a blockchain to prevent tampering. Ratings and reviews undergo cross-checking to ensure their authenticity. Orders have an expiry time, and shipping costs are automatically calculated based on distance and total units ordered. From the website’s homepage, a customer can access many features, such as browsing products on the products page; customers can also search for products manually. Selecting an individual product redirects to its specific page.
The system makes product information such as price, rating, details, and location visible to consumers, who can directly add products to their cart and update product quantity. Consumers can also keep products on their wish list. At checkout, consumers need to fill in billing details along with the necessary information. After placing an order, consumers can check order collection dates and duration, price, items, and other relevant information. Additionally, consumers can see their previous order history. The user panel is available on both the website and the mobile apps and offers real-time tracking of product orders. Users can update their profile information, change passwords, and so on. In the proposed system, carriers see all the available orders they can accept based on their nearest location. The system has two types of administrator: admin and super admin. The admin panel is accessible only through the website. Both types of admin can control the entire system through this panel. The admin panel offers a view of the system’s dashboard and features such as adding coupons, updating the items list, and changing system settings. Admins can edit user information from this page if necessary. Only the super admin can add new admin users, remove admin users, and perform similar tasks. The admin can see order placement quantity, order status, and the item availability list. The order status can include the shipment period, the period in which the carrier received the order, and other relevant information. Admins can also change order details as required. Admins of the proposed system can add new items and edit existing product details. Besides,
they can remove products from the system if items are not offered in-store. The workflow diagram is shown in Fig. 4.
The transaction part of the system includes initiation, validation via a consensus method, and activation of the smart contract to carry out the delivery procedure autonomously. Throughout the distribution process, a system coupled with the blockchain network is used for real-time tracking. At delivery completion, the smart contract automatically facilitates payment, and the invoice is updated within the blockchain, creating an immutable record of the transaction. This solution provides a secure and open approach to transaction management.
The system uses a RESTful API to send data to and receive data from the server. Blockchain technology is used to store sensitive information to prevent tampering with personal user data in the system. The ratings and reviews of sellers and carriers go through a cross-check facility before confirmation to ensure that they are legitimate. Every order has a customer-defined expiry time. As a result, the customer needs to reorder the product if no one accepts the order within that period. In this situation, the shipping cost is automatically generated based on the distance and the total units of the ordered product.
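The expiry rule and the cost rule at the end of this section can be sketched as two small functions. The text does not give the pricing formula, so the additive rule and the values of `rate_per_km` and `rate_per_unit` below are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def is_expired(placed_at, expiry_minutes, now):
    """True once the customer-defined expiry window has elapsed."""
    return now >= placed_at + timedelta(minutes=expiry_minutes)


def shipping_cost(distance_km, total_units, rate_per_km=5.0, rate_per_unit=2.0):
    """Assumed additive rule: a distance component plus a per-unit component."""
    return round(distance_km * rate_per_km + total_units * rate_per_unit, 2)
```

Under this sketch, an order that no carrier accepts before `is_expired` returns true would be closed, and the customer's reorder would be priced again with `shipping_cost`.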
5 Design and Development of the Proposed System
This section details the design and development of the proposed system, from its conception to its realization. The techniques and technologies employed are described to give a comprehensive understanding of the system’s architecture and performance capabilities.
5.1 Use-Case Diagram
The use-case diagram (Fig. 5) shows the interaction between users and the proposed system. It has three types of entities: admin, consumer, and carrier. Log In and Log Out are compulsory for all entities; the other functionalities differ by entity. Carrier and consumer users have to create their own accounts, whereas only the super admin creates admin accounts. The admin is responsible for managing the whole system and can manage their own profile. Consumers can access seven functionalities: Profile Management, Log In, Log Out, Create Account, Order, Contract, and Search Items, that is, all functionalities other than System Management. Carriers can access the same functionalities except for Search Items.
Fig. 5 Use-case diagram
5.2 User Interface
The system has been developed with different frameworks and languages for the mobile platform. Flutter has been used to build mobile, web, and desktop applications from a single codebase. Node.js has been used to build the API that controls the whole system’s data flow. The Node.js backend server is responsible for communicating with the web frontend and the Android app. In addition, the Node.js runtime environment allows the applications to interact with the blockchain. The cloud-based NoSQL document database Firestore is used to store data because of its low latency, large data volume, and flexible data models.
[Figure: mobile application screens, panels (a)–(c)]
The developed system is cross-platform and works smoothly as both a mobile and a web application. The following images show some user interface designs of the mobile application of the proposed system. Figure (a) is the homepage of the customer account, from which users can access many actions. Consumers can search and view all products from the product page. A search redirects them to the search result page, and selecting an individual item redirects to the selected product details page. The individual product page contains the product image, price, rating, details, and location. Consumers can either buy the product directly or add the desired quantity to their cart (Fig. (b)). Consumers can save items in their cart to buy later. Additionally, they can update an item’s quantity. The checkout page requires billing details and other necessary information to complete the checkout (Fig. (c)).
[Figure: mobile application screens, panels (d)–(f)]
The order status page (Fig. (d)) shows a consumer’s previous order history, including pickup dates, prices, and product items. Real-time order tracking lets customers follow their order items in real time through the mobile apps (Fig. (e)). The Request List page (Fig. (f)) is only accessible to carrier users. Carriers can see all the available orders they can accept based on their location.
6 Performance Analysis
A customer can order multiple products from different places at the same time through the system. The system can produce errors in this situation unless the exception is handled properly. The system therefore divides a single order, treats every item as a separate order according to the location of the product, and shows each order request to carriers based on the product shop location. A carrier’s current location must be within a radius of about 10.5 km for the carrier to accept the order request. Hence, carriers must keep their current location up to date. To address this, the proposed system calls a Google Map Controller to update the location every 30 s. Besides that, multiple carriers might get the same delivery request based on their location. In this situation, the system resolves the conflicting requests and assigns the order to a carrier based on the carrier’s rating and service record. Because this product is a prototype, its features have some limitations, and only some of the basic features have been implemented so far. The analysis shows that order tracking and status maintenance are vital tasks. Data transfer from the app or web app to the main server is a crucial part of the system because improper handling of phone numbers can disrupt
the whole delivery chain and orders. To address this, the system uses a contact number that helps track orders. Interacting with the blockchain is also critical: a tiny data misconfiguration can shut down the blockchain server. To avoid this situation, every data flow and the authenticity of each data origin must be re-checked.
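The order splitting and conflict handling described in this section can be sketched as follows. The 10.5 km radius comes from the text; the input shapes, the function names, and the tie-break in favor of the nearer carrier are assumptions, and the service record mentioned in the text is not modeled.

```python
from collections import defaultdict

MAX_RADIUS_KM = 10.5  # acceptance radius quoted in the text


def split_by_shop(items):
    """Group (shop_id, product) pairs into one sub-order per shop."""
    sub_orders = defaultdict(list)
    for shop_id, product in items:
        sub_orders[shop_id].append(product)
    return dict(sub_orders)


def pick_carrier(candidates):
    """Choose the best-rated carrier inside the radius, or None if none fits.

    candidates: list of dicts with "name", "distance_km", and "rating".
    Ties on rating go to the nearer carrier (an assumption of this sketch).
    """
    eligible = [c for c in candidates if c["distance_km"] <= MAX_RADIUS_KM]
    if not eligible:
        return None
    best = max(eligible, key=lambda c: (c["rating"], -c["distance_km"]))
    return best["name"]
```

Splitting first and assigning per shop keeps each sub-order tied to a single pickup location, which matches how requests are shown to carriers.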
7 Future Work and Conclusion
The study aims to build a transparent online platform for ordering and delivering authentic products across Bangladesh using blockchain, other security protocols, and algorithms, which together ensure the transparency and security of the system. The current estimated food delivery market size is 10 million USD, which could grow to over 14 million by the year 2025 [13]. Demand in the authentic product delivery market is high. The system is currently at the development stage; its blockchain server is going to be deployed on the public Ethereum network. The addition of some high-level algorithms and security protocols will make the system more robust and secure. Users will need to buy a license to use the system. The Firestore storage plan will additionally help in developing the required settings and new attributes in future development. A reward option will let users and carriers earn reward points based on their service, and users can redeem these points for discounts on products. In the current food delivery system, it is hard to tell whether customers are getting authentic products. The proposed solution, however, offers transparency and authenticity in the delivery process. Entering the current food delivery market early with genuine products would be the wisest course. As a result, the whole system is expected to bring significant change.
References
1. Stadtler H (2015) Supply chain management: an overview. Supply chain management and advanced planning: concepts, models, software, and case studies, 3–28
2. Abbasi WA, Wang Z, Alsakarneh A (2018) Overcoming SMEs financing and supply chain obstacles by introducing supply chain finance. HOLISTICA–J Bus Public Admin 9(1):7–22
3. Li Z, Li Z, Zhao D, Wen F, Jiang J, Xu D (2017) Smartphone-based visualized microarray detection for multiplexed harmful substances in milk. Biosens Bioelectron 87:874–880
4. Yeo SF, Tan CL, Teo SL, Tan KH (2021) The role of food apps servitization on repurchase intention: a study of FoodPanda. Int J Prod Econ 234:108063
5. Ullah GW, Islam A (2017) A case study on Pathao: technology-based solution to Dhaka’s traffic congestion problem. Case Stud Bus Manag 4(2):100–108
6. Saad AT (2021) Factors affecting online food delivery service in Bangladesh: an empirical study. Br Food J 123(2):535–550
7. Shipchain. https://docs.shipchain.io/docs/intro.html. Last accessed 4 Oct 2020
8. Hribernik M, Zero K, Kummer S, Herold DM (2020) City logistics: towards a blockchain decision framework for collaborative parcel deliveries in micro-hubs. Transp Res Interdisciplinary Perspectives 8:100274
9. Badzar A (2016) Blockchain for securing sustainable transport contracts and supply chain transparency: an explorative study of blockchain technology in logistics
10. Toma NZ (2021) An analysis on the use of operation and information technology management in a start-up logistic company
11. Hwang J, Lambert CU (2008) The interaction of major resources and their influence on waiting times in a multi-stage restaurant. Int J Hosp Manag 27(4):541–551
12. Muntasir B (2019) Meteoric rise of online food business. Dhaka Tribune
13. Kader R (2020) The state of online food delivery in Bangladesh at the beginning of 2020: subsidies make true demand hard to gauge. Future Start Up
Flower Disease Detection Using CNN
Vemparala Vigna Sri, Gullapalli Angel, and Yalamanchili Manasa Chowdary
Abstract The continuous change in the environment is harmful to crops and drives farmers toward debt and suicide. Most science students aim to provide solutions for farmers involved in major crop production, neglecting small-scale farmers. This project aims to develop a framework for classifying the diseases that can be seen in marigold flowers. The combination of growing global mobile phone use and recent advances in computer vision made possible by deep learning has paved the way for disease detection. The model achieves 97% accuracy using a convolutional neural network in conjunction with a fully connected layer. This project also outlines the need for an application that provides background on a disease, its symptoms, its etiology, and its treatment.
Keywords Image preprocessing · Convolution neural networks · Dataset · Training model · Deep learning · Pooling · Fully connected layer
V. V. Sri · G. Angel · Y. M. Chowdary (B)
Department of CSE, VR Siddhartha Engineering College, Vijayawada, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_49
1 Introduction
Flower diseases and pests are a kind of natural disaster that affects the normal growth of plants and can even cause plant death at any point in the growth process, from the earliest stage of development to the final stage of growth. For machine vision tasks, plant diseases and pests tend to be concepts of human experience rather than purely mathematical definitions. Flower diseases are a serious problem because they substantially reduce the yield and quality of floriculture products. An automated plant-abnormality diagnosis model offers a clear gain in the monitoring of large fields, as it is the most effective way to identify diseases at an early stage. The solution consists of a model that takes a photograph and detects the disease. The model detects diseased flowers and performs classification. The inspection results may be presented in various ways. Marigolds are a significant crop in Andhra Pradesh, and the crop is susceptible to seasonal diseases [1]. Figure 1 shows an image of a diseased marigold.
Fig. 1 Diseased marigold crop [2]
1.1 CNN
The most well-known deep learning algorithm is the convolutional neural network (ConvNet), which is mostly used for detection and classification [3–5]. A deep neural network has numerous nonlinear processing layers and uses simple elements that work in parallel, inspired by biological nervous systems. Like other layered neural networks, a convolutional neural network consists of an input layer, multiple hidden layers, and an output layer. Each layer uses the previous layer’s output as its input, and the layers are connected via nodes or neurons. The hidden layers of a convolutional neural network include layers that perform convolutions. Typically, such a layer computes a dot product of the convolution kernel with the layer’s input matrix. This product is usually the Frobenius inner product, and its activation function is commonly ReLU. As the convolution kernel slides along the layer’s input matrix, the convolution operation produces a feature map, which in turn is provided as input to the subsequent layer. Other layers, such as pooling layers, fully connected layers, and normalization layers, follow. Applications include image and video recognition, recommendation systems, image classification, image segmentation, medical image analysis, natural language processing, brain–computer interfaces, and financial time series.
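The convolution, activation, and pooling steps described above can be made concrete with a small pure-Python sketch. It is an illustration only, not the model used in this paper; it assumes a single-channel input, "valid" padding, stride 1, no kernel flip (as is common in deep learning libraries), and 2 × 2 max pooling with stride 2.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output is a Frobenius inner product."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of the kernel and the image patch, summed.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Keep the largest value in each non-overlapping 2 x 2 window."""
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, len(feature_map[0]) - 1, 2)]
            for i in range(0, len(feature_map) - 1, 2)]
```

Chaining `max_pool2x2(relu(conv2d_valid(image, kernel)))` reproduces one convolution block; in a full network, the pooled feature maps would then be flattened and passed to a fully connected layer for classification.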
Flower Disease Detection Using CNN
1.2 Deep Learning Frameworks TensorFlow is an open-source platform and arguably the most popular tool for machine learning and deep learning. It is used mainly through its Python API (a JavaScript version, TensorFlow.js, is also available) and comes with a wide range of tools and community resources that facilitate training and deploying ML/DL models. Apart from running and deploying models on powerful computing clusters, TensorFlow can also run models on mobile platforms. TensorFlow requires extensive coding, and its 1.x releases operate with a static computation graph: the graph must be defined first, and the calculations are run afterwards. If the model architecture changes, the model needs to be retrained. TensorFlow is a useful tool for creating and monitoring deep learning models. It can also be used for data integration, such as combining graphs, SQL tables, and images.
1.3 Libraries
1.3.1 Matplotlib
Matplotlib is a comprehensive visualization library for creating static, animated, and interactive visualizations in Python. The library can produce publication-quality plots and export or embed them in a number of file formats and interactive environments. It is an open-source plotting library for Python that was introduced in 2003. It was designed so that most of the plotting functions available in MATLAB can be used in Python. It provides a number of plot types, such as line plots, bar plots, scatter plots, and histograms, that can be used to analyze a variety of data.
1.3.2 OpenCV
OpenCV is a popular open-source computer vision library that is focused on real-time applications. The library has a modular structure and includes several hundred computer vision algorithms. OpenCV consists of modules for image processing, video analysis, the 2D features framework, and object detection, among others. The OpenCV library makes it possible to carry out the following tasks:
• Read and write images.
• Capture and save videos.
• Process images.
• Perform feature detection.
• Detect specific objects such as faces, eyes, and cars in videos and images.
• Analyze video: estimate the motion in it, subtract the background, and track objects in it.
1.3.3 TensorFlow
TensorFlow is an open-source library for deep learning developed by Google. It also supports traditional machine learning. TensorFlow was first developed to handle large numerical computations, without deep learning specifically in mind. However, it proved to be very useful for deep learning development as well, and it was consequently open-sourced. TensorFlow accepts data in the form of multi-dimensional arrays called tensors. Multi-dimensional arrays are very convenient for handling large quantities of data. TensorFlow works on the basis of data flow graphs that have nodes and edges. Because execution takes the form of graphs, it is much easier to run TensorFlow code in a distributed way across a cluster of computers while using GPUs.
1.3.4 Keras
Keras is a high-level deep learning API developed by Google for implementing neural networks. It is written in Python and makes neural network development simple. Furthermore, it supports several backend engines for neural network computation. Keras is relatively easy to learn and use because it provides a Python frontend with a high level of abstraction while offering a choice of back ends for computation. This makes Keras slower than some other deep learning frameworks, but considerably more beginner-friendly. Keras also allows switching between different back ends.
1.4 Motivation The main motivation behind choosing this project is to support an underprivileged community, small-scale farmers, by using modern techniques from machine learning and computer vision. When access to inputs and conditions are equal, smaller farms tend to be more productive per hectare than much larger farms. Traditional tools and methods are not very helpful because they take a lot of time and manual work. It is very difficult for farmers to diagnose the various diseases in flowers, but with the help of a disease detection system, this difficulty can be overcome.
1.5 Problem Statement Plant diseases manifest themselves in various parts of the plant, such as the leaves, stems, and flowers, and even in the soil in which the plants are grown. Undiagnosed disease can lead to further plant losses, so proper disease diagnosis is important. The aim is to make effective use of image processing through a Web application to speed up farmers’ ability to identify diseases and to recommend the right plant fertilizers.
1.6 Scope This project aims to help farmers with the problems caused by various flower diseases. The prototype is intended for farmers, fertilizer shopkeepers, and students of agriculture. The system can only be used for detecting marigold diseases.
1.7 Objectives
• To develop a system that can detect and identify the type of disease.
• To implement a method for preventing diseases and for managing them so as to reduce the losses and damage they cause.
• To identify various diseases and provide remedies.
1.8 Advantages
• It helps farmers obtain a higher crop yield by identifying the disease at an early stage.
• The Web app scans the uploaded image and determines whether the flower is healthy or not. If it is not healthy, the app shows the disease name and the corresponding remedies.
• The remedies for the diseases are available in both organic and chemical forms. Organic remedies are effective in controlling diseases. Chemical remedies help in relieving the symptoms of the diseases. Digital technology has helped farmers reduce the use of chemical fertilizers and pesticides.
• Farmers do not have to spend extra time and money on finding a treatment for the disease, as the Web app also gives the remedies.
1.9 Applications
• Image recognition: In the field of machine vision, image recognition refers to the ability of software to identify objects, locations, people, text, and actions in images.
• Classification: The system classifies images with a precision of 90%.
• Plant disease prediction and fertilizer suggestion.
1.10 Organization Section 2 surveys the literature on neural networks. Section 3 describes the proposed framework. Section 4 presents the results and analysis.
2 Related Works In [6], the authors developed a flower recognition process based on image enhancement, image segmentation, feature extraction, and image classification, used together with artificial neural networks (ANNs). They collected images from photographers and from the Oxford 102 Flowers dataset, enhanced them, segmented them according to their requirements, extracted the features, and classified the images into different categories using supervised learning. They achieved an accuracy of 81.19%.
The article [7] uses convolutional neural networks (CNNs) to recognize flowers and leaves for plant identification. The study investigates the overall effectiveness of identifying plants with a CNN from images of flowers, leaves, and their combinations. The flower recognition dataset and the Folio leaf dataset, both publicly available, were used for training and classification. The authors used standard CNN layers, i.e., convolution, pooling, ReLU, and fully connected layers. The attributes they considered are the color, texture, and shape of the flower.
In [8], the authors used a monitoring system with a sensor placed in the soil. They captured details such as humidity, temperature, soil moisture, wind direction, and UV index, and performed exploratory data analysis (EDA) on them. They split the obtained dataset into a training dataset and a testing dataset. They then tried to identify plant diseases from the behavior of the soil.
In [9], a sensor was placed in the soil to measure the water level; if the soil is not irrigated, a motor connected to the system turns on automatically. The farmer is also notified with the help of a Wi-Fi module. The paper includes a second module, which is used for disease identification. The
sub-modules used are image acquisition, image conversion, clustering (K-means clustering), and classification.
In [10], an apple leaf dataset was collected from the Internet, and modules such as image augmentation and image annotation were applied to determine whether a leaf is affected by a disease. Image augmentation is used to observe the leaves and draw a pattern of diseases. Image annotation records the diseased area of the leaf in an XML file using the bounding boxes of the lesions. The authors proposed the INAR-SSD model for detecting leaf diseases.
The authors of [11] collected a set of images from the Internet and from marigold fields. They selected a high-quality image using the FLANN algorithm. The background of the selected image was removed using the GrabCut algorithm. They then used the HSV color model to highlight the affected part of the flower: white for the affected part and gray for the rest of the flower. The authors aimed to determine whether or not a marigold flower is affected by a disease using a VGG16-based CNN.
The paper [12] gives an overview of the different methods published by various authors that are currently used in plant disease detection. These include DNA structure analysis, K-means clustering, the gray-level co-occurrence matrix (GLCM), and the SVM classifier. It concludes that the SVM classifier performs better than the other methods.
In [13], the images are loaded in MATLAB with the help of the read command. The original image is converted into a thumbnail to reduce its size in pixels. The attributes used in disease detection include color, texture, and morphology. The resulting image is passed to an AI system created by the authors, which detects the disease in the leaf.
3 Proposed System This section describes the architecture and methodology of the proposed system. Figure 2 shows the proposed system diagram of our model. The model takes user input: an image of the flower to be examined. The trained model then categorizes the flower image as healthy or unhealthy. The output includes the disease type and appropriate remedies.
3.1 Architecture The architecture of the proposed system is shown in Fig. 3. The three main activities in the architecture are preprocessing, feature extraction, and classification. The CNN model performs these tasks. Convolution, ReLU, pooling, and fully connected layers are among the layers of the deployed CNN model. These layers classify images by extracting useful features from them over several stages. The proposed system outlines the entire sequence of events that takes place in the model, from beginning to end.
Fig. 2 Proposed system model for flower disease detection
3.2 Methodology
3.2.1 Data Acquisition Module
The dataset consists of marigold images. A total of 150 images were used to develop the model, divided in a 90:10 ratio, with 90% of the images used for training and 10% used for testing the model. The dataset comprises both healthy and unhealthy images. The images are resized to 24 × 24 pixels to match the target size of the input layer used in model development.
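The 90:10 split and the resizing step can be sketched as follows. This is a minimal illustration under stated assumptions: the file identifiers are hypothetical, the shuffle seed is arbitrary, and the NumPy nearest-neighbor resize is only a stand-in for whatever resizing routine was actually used.

```python
import random
import numpy as np

def split_dataset(items, train_fraction=0.9, seed=0):
    """Shuffle the items and split them into training and testing subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed only for repeatability
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

def resize_nearest(image, size=(24, 24)):
    """Nearest-neighbor resize of an H x W x C array (stand-in for a library call)."""
    rows = np.arange(size[0]) * image.shape[0] // size[0]
    cols = np.arange(size[1]) * image.shape[1] // size[1]
    return image[rows][:, cols]

# Hypothetical identifiers for the 150 marigold images
image_ids = [f"marigold_{i:03d}" for i in range(150)]
train_ids, test_ids = split_dataset(image_ids)   # 135 training, 15 testing

# A dummy 100 x 80 RGB image resized to the 24 x 24 input size
resized = resize_nearest(np.zeros((100, 80, 3), dtype=np.uint8))
```

With 150 images, a 90:10 split yields 135 training images and 15 testing images.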
Fig. 3 Architecture of the proposed system
3.2.2 Training and Testing Module
This module contains several layers. Convolutional Layer This layer is the most important part of a convolutional neural network and is usually at least its first layer. Its purpose is to detect a fixed set of features in the images received as input.
A. ReLU The rectified linear unit makes training faster and more effective by preserving positive values and setting negative values in an activation map to zero. Applying an activation function to the feature maps makes the network more nonlinear, which suits the strongly nonlinear nature of images. The three operations (convolution, ReLU, and pooling) are carried out over tens or hundreds of layers, and each layer learns to identify different features.
B. Pooling Layer This layer typically follows a convolutional layer: it receives several feature maps and pools each one of them. The pooling operation reduces the size of the images while preserving their core features.
C. Fully Connected Layer This layer is normally the final layer in a neural network, whether convolutional or not; it is therefore not specific to CNNs. This layer produces a new output vector
from an input vector. To do so, it applies a linear combination and then, optionally, an activation function to the input values. The final fully connected layer classifies the image given as input to the network.
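The ReLU, pooling, and fully connected operations described in this module can be sketched in NumPy as follows. The feature-map values, the 2 × 2 pooling window, and the weights are hypothetical and serve only to show what each operation computes; they are not parameters of the trained model.

```python
import numpy as np

def relu(x):
    """Keep positive values and set negative values to zero."""
    return np.maximum(x, 0.0)

def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 (window size is an assumption)."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

def dense(x, weights, bias):
    """Fully connected layer: linear combination of the inputs plus a bias."""
    return weights @ x + bias

# Hypothetical 4 x 4 feature map
fmap = np.array([[ 1., -2.,  3.,  0.],
                 [-1.,  5., -3.,  2.],
                 [ 0.,  1., -4., -1.],
                 [ 2., -6.,  7.,  3.]])
activated = relu(fmap)              # negative entries become 0
pooled = max_pool_2x2(activated)    # 4 x 4 -> 2 x 2

# Flatten and feed a two-output dense layer (hypothetical weights and bias)
weights = np.ones((2, 4))
bias = np.array([0.0, -17.0])
scores = dense(pooled.ravel(), weights, bias)
```

Pooling halves each spatial dimension here while keeping the strongest response in every window, which is the sense in which the core features are preserved.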
3.2.3 Disease Prediction Module
A. Feature Extraction The feature extraction network contains many pairs of convolutional and pooling layers. The convolutional layer includes a set of digital filters that perform the convolution operation on the input data.
B. Classification Convolutional neural networks (CNNs) are the cornerstone of image classification, a deep learning task in which a label or category is assigned to an image in order to categorize and identify it. Image classification using CNNs forms a significant part of machine learning experiments. The input image is preprocessed, its features are extracted, and these are passed to the trained CNN model to determine whether the image is healthy or unhealthy. An unhealthy result indicates that the flower is affected by one of the following diseases: damping off, leaf spot, powdery mildew, or spider mite. Based on the predicted disease, related remedies are presented to lessen the disease’s effects on the flowers.
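The classification step can be sketched as a softmax over class scores followed by a label lookup. The class order, the single five-way output, and the score values below are assumptions made for illustration; the paper does not state how the output layer is organized.

```python
import numpy as np

# Assumed class order; the paper lists these categories but not their encoding
CLASSES = ["healthy", "damping off", "leaf spot", "powdery mildew", "spider mite"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def predict_label(logits):
    """Return the most probable class name and its probability."""
    probs = softmax(np.asarray(logits, dtype=float))
    idx = int(np.argmax(probs))
    return CLASSES[idx], float(probs[idx])

# Hypothetical scores from the last fully connected layer
label, confidence = predict_label([0.2, 0.1, 0.3, 2.5, 0.4])
status = "healthy" if label == "healthy" else "unhealthy"
```

In an application, the predicted label would then be used to look up the corresponding remedies.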
3.3 Algorithm
CNN Algorithm
1: Select a dataset.
2: Arrange the dataset for training.
3: Prepare training data.
4: Rearrange the dataset.
5: Allocate features and labels.
6: Convert the labels into categorical data and normalize X.
7: Split X and Y for use in the CNN.
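Step 6 of the CNN algorithm above (label conversion and normalization of X) can be sketched as follows. The division by 255 and the two-class encoding are assumptions, since the paper does not specify the normalization constant or the label scheme, and the random array merely stands in for real images.

```python
import numpy as np

def normalize_images(x):
    """Scale 8-bit pixel values to the range [0, 1] (assumed normalization)."""
    return x.astype("float32") / 255.0

def one_hot(labels, num_classes):
    """Convert integer labels into categorical (one-hot) vectors."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Stand-in data: six random 24 x 24 RGB images; 0 = healthy, 1 = unhealthy
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(6, 24, 24, 3), dtype=np.uint8)
y = np.array([0, 1, 1, 0, 1, 0])

X_norm = normalize_images(X)
Y = one_hot(y, num_classes=2)
```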
Training and Testing
Step 1: Start
Step 2: Create database (healthy/unhealthy)
Step 3: Preprocessing: reduce (size of 64 * 64)
Step 4: Training CNN
Step 5: Take image from cam/gallery
Step 6: Preprocessing (size = 24 * 24)
Step 7: Testing
Step 8: If the probability of healthy > probability of unhealthy, display a healthy result; otherwise, display an unhealthy result
Step 9: Go to Step 4
Step 10: End
4 Results and Analysis The system’s outcomes are shown below. The graph in Fig. 4 illustrates the accuracy and loss of the model for each epoch when the images are used as input for training. The model also performed well in classifying a new set of images that were not used for training. The loss function is what the model uses to learn. Accuracy is much easier to interpret: it assesses how well the model predicts by comparing the model’s predictions to the actual values in percentage terms. The loss decreases gradually as the number of epochs increases, which indicates that the model is learning well. Our model achieved an accuracy of 97%. Results are shown in Figs. 5, 6, and 7.
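The two quantities plotted in Fig. 4 can be computed as in the sketch below. The labels and predicted probabilities are made-up values, and binary cross-entropy is assumed as the loss because the paper does not name the loss function it used.

```python
import numpy as np

def accuracy_percent(y_true, y_pred):
    """Share of predictions that match the actual labels, in percent."""
    return 100.0 * float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy (assumed loss; not confirmed by the paper)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Hypothetical labels (1 = unhealthy) and predicted probabilities
y_true = [1, 0, 1, 1, 0]
p_pred = [0.9, 0.2, 0.4, 0.8, 0.1]
y_pred = [int(p >= 0.5) for p in p_pred]   # threshold at 0.5

acc = accuracy_percent(y_true, y_pred)      # four of five predictions are correct
loss = binary_cross_entropy(y_true, p_pred)
```

Accuracy only counts correct decisions, whereas the loss also penalizes correct but unconfident predictions, which is why the loss is the quantity used for learning.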
Fig. 4 Accuracy plot
Fig. 5 Powdery mildew
Fig. 6 Spider mite
Fig. 7 Leaf spot
5 Conclusion and Future Work In many nations, marigold flowers are widely used in a variety of areas, including religion, health, and cattle. Farmers may, however, find infections on their marigold flowers prior to harvesting, resulting in damage to the farm. Using a CNN architecture, deep learning technologies, and image processing, this study demonstrated a method for determining whether marigold flowers are diseased or not. For future
research, we intend to build a technique and application based on Internet of things technology for identifying each illness on the flower, which might aid farmers via closed circuit television (CCTV) in the real world.
References
1. Gurjar P, Meena L, Verma AK (2021) Plant disease detection using convolutional neural network. Int J Adv Sci Res Manag (IJASRM), ISSN 2455-6378
2. Hugh (2012). https://www.whatgrowsthere.com/grow/2012/04/10/botrytis-disease-in-marigolds-can-be-avoided/
3. Boulent J, Foucher S, Theau J, St-Charles P-L (2019) Convolution neural networks for the automatic identification of plant diseases. Front Plant Sci
4. Srivastava P, Mishra K, Awasthi V, Sahu VK, Pal PK (2021) Plant disease detection using convolutional neural network. Int J Adv Res (IJAR), ISSN 2320-5407
5. Kolli J, Vamsi DM, Manikandan VM (2021) Plant disease detection using convolutional neural network. In: IEEE Bombay section signature conference (IBSSC), Gwalior, India, pp 1–6
6. Almogdady H, Manaseer S, Hiary H (2018) A flower recognition system based on image processing and neural networks. Int J Sci Technol Res 7(11), ISSN 2277-8616
7. FatihahSahidan N, Juha AK, Mohammad N, Ibrahim Z (2019) Flower and leaf recognition for plant identification using convolutional neural network. Indonesian J Electr Eng Comput Sci 16(2):737–743
8. Kumar M, Kumar A, Palaparthy VS (2021) Soil sensors-based prediction system for plant diseases using exploratory data analysis and machine learning. IEEE Sens J 21(16):17455–17468
9. Priya LR, Ignisha Rajathi G, Vedhapriyavadhana R (2019) Crop disease detection and monitoring system. Int J Recent Technol Eng (IJRTE) 8(4):3050–3053, ISSN 2277-3878
10. Jiang P, Chen Y, Liu B, He D, Liang C (2019) Real-time detection apple leaf diseases using deep learning approach based on improved convolutional neural networks. IEEE Access, pp 59069–59080
11. Chopvitayakun S, Nuanmeesri S, Poomhiran L, Kadmateekarun P (2021) Marigold flower disease prediction through deep-neural network with multi model image. Int J Eng Trends Technol, pp 174–180
12. Kiran SM, Chandrappa DN (2019) Current trends in plant disease detection. Int J Sci Technol Res 8, ISSN 2277-8616
13. Sujawat GS, Chouhan JS (2021) Application of artificial intelligence in detection of diseases in plants. Turk J Comput Math, pp 3301–3305
Detection of Natural Disasters Using Machine Learning and Computer Vision by Replacing the Need of Sensors Jacob Bosco, Lavanya Yavagal, Lohith T. Srinivas, Manoj Kumar Katabatthina, and Nivedita Kasturi
Abstract Natural disasters not only cause death and property destruction around the world but the severity of these disasters is also reflected in many other forms like the economic loss, loss of lives, and the ability of populations to rebuild. This paper focuses on using machine learning, deep learning, and computer vision to detect natural disasters. Previously, many attempts have been made to detect natural disasters using different techniques including the use of sensors, satellites, drones, crowdsourcing, and basic ML models and reducing the severity of these disasters. But natural disaster detection still faces challenges due to the limitations of the above techniques like high cost of equipment and training time, lower accuracy, etc. Therefore, in order to combat these issues, transfer learning models are being used, namely a few architectures from VGG, ResNet, and EfficientNet. Many different values for epochs and dataset sizes were experimented with, as well as different configurations for the hidden layers and optimizers to improve the classification accuracy. From the results obtained after training the models, ResNet50 gave the best results with an accuracy of 96.35%. Keywords Transfer learning · Natural disaster detection · Image classification · ResNet
J. Bosco · L. Yavagal · L. T. Srinivas (B) · M. K. Katabatthina · N. Kasturi PES University, Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_50
1 Introduction Natural disasters are phenomena caused by various conditions and situations on Earth. These events are extremely hard to predict, as numerous factors contribute to their occurrence. All natural disasters cause some form of loss. The number of deaths caused is also influenced by population density. Because rapid population growth places an increasing burden on global resources, people and
their infrastructure are becoming more vulnerable to the ever-present natural disasters. This creates a dynamic balance between these forces, and advances in science and technology play an important role. Frequent earthquakes, floods, landslides, and wildfires need to be investigated with today’s superior techniques to identify effective preventative measures. Non-profit organizations interested in helping victims are called NGOs. During disasters, NGOs play an important role in disaster relief, mitigation, and recovery. Emergency food aid, temporary shelter, medical aid, debris removal, habitat restoration, etc., are all part of the activities of the NGOs. Natural disasters are unavoidable, but they may be detected. Sensors are used to monitor natural catastrophes all around the world. To monitor earthquakes, seismic sensors and vibration sensors are utilized (and downstream tsunamis). Radar charts have been used to detect a tornado’s characteristic ‘hook echo’. Flood sensors are used to monitor flood levels, whereas water level sensors are used to monitor the height of the water alongside rivers, streams, and other bodies of water. Wildfire sensors are still in their infancy. Each one of these sensors is tailored to a certain calamity. They aid in the early detection of problems. However, the price of acquiring and maintaining hardware/sensors is quite expensive and unaffordable. Additionally, it is limited to the control and communication range of the drones, so it cannot alert people to other afflicted locations. It is sometimes quite difficult to capture photographs with clear sky for damage assessment. Following sensors, machine learning has emerged as one of the most significant computer technologies of today, with expanding applications in everyday life and numerous industrial domains. ML uses algorithms to create additional predictions that impact the qualities of given data. 
As the amount of data increases, the effectiveness of ML algorithms tends to improve. Established deep learning-based ML models, on the other hand, need a substantial quantity of sample data for training, which is typically difficult to obtain during an emergency response. As a result, transfer learning has often been used to address the issue of inadequate training data. It is an ML approach that uses a model developed for one task as the basis for a model developed for a different task. Given the enormous computing and time resources required to create neural network models for these problems, pre-trained models are commonly used as a starting point. Social media can expose users to unwanted crises. Not everyone in a region can be informed about a disaster, as the target users are always a particular subpopulation. There is also no guarantee of the credibility of information shared through social media. Social media data itself is subject to various uncertainties, such as the inaccuracy of GPS coordinates, reliance on sources, underrepresented communities, analytical methods, and the trustworthiness of volunteers and of the companies that run social media networks. To date, many people have suffered greatly from disasters: the disaster could not be predicted accurately, the victims could not evacuate the disaster area in time, and after the disaster, people were not offered mitigation measures. Keeping in mind the above limitations, this research focuses on finding a suitable transfer learning model that can quickly and accurately detect the presence of a natural disaster. The approaches described in this paper eliminate the requirement for expensive and specialized equipment. This, in turn, makes the system
more accessible, minimizing the time required for reaction and the initiation of relief operations. Another advantage of the proposed procedure is that the ImageNet model serves as a solid foundation for the model that has been proposed, reducing training time. Once the disaster is successfully detected, in order to inform the public about the disaster, a mobile application will also be developed. This will help the people evacuate in time. The coordinates of the people in the disaster region will also be shared to the NGO relief teams to help locate the people in need. The NGO location data will also be available to the people in the vicinity. The upcoming sections are organized as follows: ‘related work’ section, which talks about the previous works and background information related to the paper; ‘proposed methodology’ section, which introduces the architecture of the method that is being followed in this paper, the different transfer learning models used and its implementation. ‘Results and discussion’ section provides the results of the models executed and analyzes it; ‘conclusion and future scope’ section summarizes the research done and discusses potential future research work.
2 Related Work Analyzing the severity of natural disasters has received considerable attention in the current decade. Current works in this field rely on satellite information to predict and detect the presence of natural disasters. Erdelj and Natalizio [1] placed sensors that monitor temperature and humidity at various places near water bodies; the sensors sent data to servers in order to combat the effects of natural disasters caused by the flooding of rivers and other water bodies. If a flood is detected, UAV drones are deployed to survey the affected area. However, the cost of purchasing and maintaining the hardware and sensors is very high and not affordable. The system can also monitor only areas within the drones’ control and transmission range, so it cannot report on other affected areas. Obtaining images with clear skies for damage assessment is also often a severe problem. Many social media networks, such as Instagram and Twitter, have also been used as sources for the detection of disasters. Havas et al. [2] discussed the techniques and benefits of using social media and targeted crowdsourcing to help the affected people and improve the information product. Another study [3] provides evidence that the scientific techniques and operational aims have been accomplished by sketching the proposed system architecture and describing appropriate situations. However, there is a possibility of unwanted crisis exposure. There are also a number of difficulties in the social media data itself, including inaccurate GPS positions, the reliability of the source, underrepresented populations, analytic techniques, and the reliability of volunteers or of the companies behind the social media networks. Zhang et al. [4] focused on the rapid and accurate identification and classification of media content uploaded to social networking sites, using the Paillier homomorphic encryption scheme and the concept of federated transfer learning for this classification.
Their study focuses on timely adoption of disaster response strategies and reducing disaster losses and timely planning and carrying out relief operations. The study by Mythili and Shalini [5] sheds light and provides a comparative study of various mobile applications for disaster management and their efficacy and efficiency. They give a thorough analysis of current emergency applications by considering their functions, advantages, and drawbacks. But the applications discussed in this study serve as ‘proof of concept’, and many of these applications are tools for Web cooperation which might be a drawback. Another study [6] proposes a crisis management system for real-time emergency notification of users through smart watches and mobile applications. A user-friendly Web portal is created so that government organizations may quickly and effectively alert people in risk. Many other solutions have been proposed to combat specific types of disasters such as establishing towers in the remote forests to detect wildfires. These solutions rely on highly specialized hardware that are very expensive and require special maintenance. In the study made by Sahin and Ince [7], the method makes use of a radio-acoustic sounding (RASS) technology to assess distant temperatures and perform thermal sensing on a specific forest area. It can immediately and constantly measure the profiles of the air temperature. The results from the simulations also showed that this technique is not useful for group fires. Since RASS, Lidar, and other fire detection systems need the use of emerging technologies as well as certain additional equipment and instruments, they are more expensive than fire detection systems based on observation towers. Figure 1 depicts the outline of a 5-layer dense block in a dense convolutional network with a growth rate of 4 where each layer receives a combined knowledge from all the preceding layers with respect to a particular layer. 
The paper [9] demonstrates image classification using the pre-trained deep neural network model VGG16, which is trained on images from the ImageNet dataset. A new deep neural network model, based on a fully connected network, is built on top of the convolutional base model for image classification. Another study by Milan Tripathi [10] sheds light on how convolutional neural networks can be used to classify and hence identify fruits in a grocery store scenario. Previous works have also used basic machine learning models such as LSTM, RNN, VGGOR, and Naive Bayes, which have limited accuracy, focus only on individual disasters and the intensity of the damage caused, and take a long time to train. The proposed method, in contrast, focuses on detecting multiple types of disasters. Following this review of related work, the next section explains the methodology used in this paper. It also describes the architecture of the model, the different transfer learning models used, and their implementation.
Fig. 1 5-layer dense block with a growth rate of k = 4. Each layer takes all preceding feature maps as input [8]
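The connectivity pattern in Fig. 1 implies that the number of input feature maps grows linearly with depth: each layer adds k new maps, so a layer preceded by i other layers in the block receives k0 + k·i maps, where k0 is the number of maps entering the block. The sketch below computes these counts; the value k0 = 16 and the function name are assumptions for illustration only.

```python
def dense_block_input_channels(k0, growth_rate, num_layers):
    """Input feature maps seen by each layer of a dense block.

    Layer i (0-based) receives the block input (k0 maps) plus the outputs of
    all i preceding layers, each of which contributes growth_rate maps.
    """
    return [k0 + growth_rate * i for i in range(num_layers)]

# Growth rate k = 4 as in Fig. 1; k0 = 16 is a hypothetical block input size
channels = dense_block_input_channels(k0=16, growth_rate=4, num_layers=5)
```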
3 Proposed Methodology The method used entails training the models considered as part of the research undertaken with the image dataset and choosing the best performing model based on the performance metrics obtained while testing the models individually. The loss function and accuracy are the metrics considered to choose the best performing model for the nature/type of images considered for classification.
3.1 Dataset and Preprocessing
The dataset used in the research consists of a total of 4495 images from 4 classes: earthquakes, floods, cyclones, and fires, with 1350, 1073, 928, and 1144 images, respectively. Three of the classes (earthquakes, cyclones, and floods) were assembled from the PyImageSearch Reader, and the fire images were gathered from Kaggle. The dataset was preprocessed using OpenCV, an open-source library for computer vision and image processing. Each image is normalized into a NumPy array of shape (3, 224, 224), where the size of the
J. Bosco et al.
Fig. 2 Preprocessing steps
images is 224 × 224 and there are 3 channels, one for each RGB color. The Keras ImageDataGenerator class is used to augment the images in order to obtain better accuracy. The augmentation includes rotating the images randomly by up to 30° in either direction and zooming them within the range [0.85, 1.15]. The ‘width_shift_range’ and ‘height_shift_range’ parameters shift the images right/left and up/down, respectively, with shift values chosen randomly within the range [−20%, 20%]. The images are also sheared, that is, slanted along an axis and stretched slightly, at an angle of 0.15°. While training the models, ‘horizontal_flip’ is set to ‘true’ so that the images are randomly flipped horizontally. All these values have been chosen carefully to train the models thoroughly and obtain better results. Figure 2 summarizes the above-mentioned process.
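The preprocessing and augmentation settings described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: scaling pixel values to [0, 1] is an assumed form of normalization, the helper names are hypothetical, and the dictionary only mirrors the parameter names of the Keras ImageDataGenerator class (to which it could be passed as keyword arguments). Resizing to 224 × 224, which the paper performs with OpenCV, is assumed to have been done already.

```python
import numpy as np

# Augmentation settings from the text, keyed by ImageDataGenerator parameter names.
AUGMENTATION_KWARGS = {
    "rotation_range": 30,        # random rotation within [-30, 30] degrees
    "zoom_range": 0.15,          # random zoom within [0.85, 1.15]
    "width_shift_range": 0.2,    # horizontal shift of up to 20% of the width
    "height_shift_range": 0.2,   # vertical shift of up to 20% of the height
    "shear_range": 0.15,         # shear intensity
    "horizontal_flip": True,     # random left-right flip
}


def to_model_input(image_hwc):
    """Scale a 224 x 224 RGB uint8 image to [0, 1] and move channels first."""
    if image_hwc.shape != (224, 224, 3):
        raise ValueError("expected an image already resized to 224 x 224 RGB")
    scaled = image_hwc.astype(np.float32) / 255.0   # assumed normalization
    return np.transpose(scaled, (2, 0, 1))          # (224, 224, 3) -> (3, 224, 224)


def random_horizontal_flip(image_chw, rng):
    """Flip a channels-first image along its width axis with probability 0.5."""
    return image_chw[:, :, ::-1] if rng.random() < 0.5 else image_chw


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)  # stand-in image
x = random_horizontal_flip(to_model_input(image), rng)
print(x.shape)  # (3, 224, 224)
```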
3.2 Transfer Learning Models
The proposed method employs several convolutional neural networks (CNNs) that have been pre-trained on a large dataset in order to improve classification accuracy for the four natural disasters:
1. VGG16 and VGG19: These models use only 3 × 3 convolutional layers stacked on top of one another to increase depth. Max pooling reduces the volume size by creating a downsampled feature map. The numbers in the names stand for the number of weight layers in each network. The macroarchitecture of VGG16 is shown in Fig. 3.
2. ResNet: ResNet relies on microarchitecture modules, also called a ‘network-in-network’ architecture. Its residual modules make it possible to train very deep networks with standard SGD. A later update to the residual module leads to better accuracy. The residual module in ResNet as originally proposed by He et al. in 2015 is shown in Fig. 4.
Fig. 3 Macroarchitecture of VGG16 [11]
Fig. 4 Residual module in ResNet as originally proposed by He et al. in 2015 [12]
3. EfficientNet: The EfficientNet model offers higher efficiency and accuracy than existing alternatives and also reduces the number of parameters and floating-point operations (FLOPs) by roughly an order of magnitude. The block diagram of the EfficientNet model is shown in Fig. 5.
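The residual module mentioned for ResNet in the list above can be illustrated with a small numerical sketch. It is a simplification under stated assumptions: two dense transforms stand in for the convolutional stack, and the function names are hypothetical. The point it shows is that the identity shortcut lets the module fall back to passing its input through unchanged when the learned part contributes nothing.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_module(x, w1, w2):
    """Compute relu(x + F(x)), where F is a two-layer transform (conv stand-in)."""
    residual = relu(x @ w1) @ w2   # F(x): the part the network has to learn
    return relu(x + residual)      # identity shortcut, then the final activation


rng = np.random.default_rng(0)
x = rng.random((2, 8))             # non-negative toy features, batch of 2
zeros = np.zeros((8, 8))

# With zero weights F(x) = 0, so the module reduces to the identity mapping.
print(np.allclose(residual_module(x, zeros, zeros), x))  # True
```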
Fig. 5 Block diagram of EfficientNet model [13]
3.3 Implementation
The proposed method uses transfer learning, an ML technique in which a model developed for one task serves as the base for a model for another task. Established deep learning-based ML models need a substantial quantity of sample data for training, and transfer learning addresses this issue of inadequate training data: it allows developers to avoid the need for large amounts of new data, and it is less expensive and faster than the techniques used previously. Given the enormous computing and time resources required to create neural network models, deep learning for computer vision and natural language processing applications commonly uses pre-trained models as a starting point. The models were trained with varying input batch sizes and numbers of epochs. The models considered in this research use weights pre-trained on ImageNet as their base. The pre-trained networks are loaded without their top layer, so that the remaining layers are not retrained and their weights are used as they are. A separate head is built and placed on top of the base model, and this head model is the part that is actually trained. To prevent updates during the initial training process, the base model’s layers are looped over and frozen. The head consists of a few layers that allow the model to learn deeper information about the images and their features. The first is the ‘flatten’ layer, which converts the multi-dimensional feature maps into a 1D vector. It is followed by a ‘dense’ layer with 512 neurons, a ‘dropout’ layer, and a second ‘dense’ layer that forms the output layer. The ‘dropout’ rate is 0.5, which means that during training the output of each neuron is set to 0 with a probability of 50%. The model is then compiled with ‘adam’ as the ‘optimizer’, which adapts the step size of each parameter during training, with ‘categorical_crossentropy’ as the ‘loss’ function, and with ‘accuracy’ as the ‘metrics’ entry. The loss function and the accuracy metric are used to find the best performing model among the models considered in this research.
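The head described above (flatten, dense with 512 neurons, dropout of 0.5, dense output) can be sketched as a plain NumPy forward pass. This is an illustration under stated assumptions rather than the authors' implementation: the ReLU and softmax activations, the small 7 × 7 × 32 feature map, the random weights, and all function and variable names are assumptions; in the paper the layers are Keras layers placed on a frozen pre-trained base.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def head_forward(features, params, rng=None, dropout_rate=0.5):
    """Flatten -> dense (512) -> dropout -> dense output with softmax.

    `features` stands for the frozen base model's output, shape (batch, h, w, c).
    Dropout is applied only when `rng` is given, that is, in training mode.
    """
    x = features.reshape(features.shape[0], -1)            # 'flatten'
    x = np.maximum(x @ params["w1"] + params["b1"], 0.0)   # 'dense', ReLU assumed
    if rng is not None:                                    # 'dropout' (inverted)
        keep = rng.random(x.shape) >= dropout_rate
        x = x * keep / (1.0 - dropout_rate)
    return softmax(x @ params["w2"] + params["b2"])        # one output per class


rng = np.random.default_rng(0)
h, w, c, hidden, n_classes = 7, 7, 32, 512, 4   # small feature map for illustration
params = {
    "w1": rng.normal(0.0, 0.01, (h * w * c, hidden)),
    "b1": np.zeros(hidden),
    "w2": rng.normal(0.0, 0.01, (hidden, n_classes)),
    "b2": np.zeros(n_classes),
}
probs = head_forward(rng.random((2, h, w, c)), params, rng=rng)
print(probs.shape)  # (2, 4)
```

Only `params` would be updated during training; the base network that produces `features` stays fixed, which is what freezing its layers achieves.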
3.4 Architecture Diagram of Model
Figure 6 shows an overall outline of the architecture of the system. The strategy for training the model in this implementation is to use a network with ImageNet pre-trained weights as the base and then add further layers on top of it to improve performance and accuracy, as mentioned above. The results of the models are analyzed in the following section.
4 Results and Discussion
This section describes and analyzes the results obtained from the experiments on each model considered in this research, using different parameters and different sets of images. Table 1 gives a brief description of the results of the experiments. The research focuses on five transfer learning models, which are trained with different configurations, for example, different numbers of images fed to the models. The models are evaluated using loss, accuracy, validation loss, and validation accuracy, as seen in [8, Eq. (1)] and [8, Eq. (2)]. The dataset is split into validation data and training data for each epoch. If $\hat{y}_i$ is the predicted value of the ith sample and $y_i$ is the corresponding true value, then the fraction of correct predictions over $n_{\mathrm{samples}}$ samples is defined as
Fig. 6 Model architecture
$$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n_{\mathrm{samples}}} \sum_{i=0}^{n_{\mathrm{samples}}-1} \mathbf{1}(\hat{y}_i = y_i) \tag{1}$$
If $t_i$ is the truth label and $p_i$ is the Softmax probability for the ith class, then the cross-entropy loss function for n classes is defined as
$$L_{CE} = -\sum_{i=1}^{n} t_i \log(p_i) \tag{2}$$
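The two metrics in Eqs. (1) and (2) can be written out directly. The sketch below is a minimal pure-Python version for illustration; the function names and the small clipping constant `eps`, which guards against log(0), are assumptions and not part of the paper.

```python
import math


def accuracy(y_true, y_pred):
    """Eq. (1): fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def cross_entropy(t, p, eps=1e-12):
    """Eq. (2) for one sample: -sum_i t_i * log(p_i) over the n classes."""
    return -sum(t_i * math.log(max(p_i, eps)) for t_i, p_i in zip(t, p))


print(accuracy([0, 1, 2, 3], [0, 1, 2, 1]))                # 0.75
print(cross_entropy([0, 0, 1, 0], [0.1, 0.1, 0.7, 0.1]))   # about 0.3567
```

With a one-hot truth label, the loss reduces to the negative log of the probability assigned to the correct class.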
Figure 7 depicts the variation of training accuracy with an increasing number of training images for each model. The top two training accuracy values are both achieved by ResNet50, under the 2000-image and 2500-image configurations. Almost all the models show a trend of increasing training accuracy as the number of training images increases. Figure 8 depicts the variation of validation accuracy with an increasing number of training images for each model. The highest validation accuracy value of 0.9624 is attained by both ResNet50 and EfficientNetV2L under the 2000-image configuration. However, EfficientNetV2L shows a steeper drop in validation accuracy than ResNet50 when trained on a larger number of images. Figure 9 depicts the variation of validation loss with an increasing number of training images for each model.
Table 1 Accuracy and loss measures for different models under different configurations
Models | Evaluation criteria | 1 K images | 1.5 K images | 2 K images | 2.5 K images
VGG16 | Loss | 1.7527 | 2.6778 | 3.7854 | 3.5838
 | Accuracy | 0.9199 | 0.9358 | 0.9362 | 0.9159
 | val_loss | 4.3153 | 4.3502 | 5.0641 | 6.1331
 | val_ac | 0.8995 | 0.9033 | 0.9348 | 0.8838
VGG19 | Loss | 1.6728 | 2.4449 | 3.1661 | 2.3933
 | Accuracy | 0.9262 | 0.9325 | 0.9256 | 0.9284
 | val_loss | 4.3044 | 4.7449 | 4.9594 | 6.1190
 | val_ac | 0.8500 | 0.9097 | 0.9713 | 0.8697
ResNet50 | Loss | 0.9290 | 0.3693 | 0.1472 | 0.1189
 | Accuracy | 0.9387 | 0.9533 | 0.9600 | 0.9635
 | val_loss | 2.4712 | 0.4962 | 0.1323 | 0.1807
 | val_ac | 0.8950 | 0.9365 | 0.9624 | 0.9459
ResNet152V2 | Loss | 1264.8455 | 550.4902 | 758.6967 | 276.7216
 | Accuracy | 0.6175 | 0.6427 | 0.6037 | 0.6958
 | val_loss | 1138.0365 | 774.1076 | 700.8853 | 229.0619
 | val_ac | 0.5700 | 0.5552 | 0.6266 | 0.6894
EfficientNetV2L | Loss | 0.8696 | 0.5212 | 0.3831 | 0.6885
 | Accuracy | 0.9100 | 0.9316 | 0.9325 | 0.9370
 | val_loss | 0.8103 | 0.3840 | 0.1343 | 0.3100
 | val_ac | 0.8700 | 0.8930 | 0.9624 | 0.9379
ResNet50 (non-pre-trained model) | Loss | 9.7225 | 2.5121 | 1.123 | 1.0136
 | Accuracy | 0.4162 | 0.5658 | 0.7010 | 0.7279
 | val_loss | 7.0960 | 1.4419 | 1.2023 | 1.1011
 | val_ac | 0.3970 | 0.6400 | 0.7201 | 0.7743
ResNet50 and EfficientNet gave the highest accuracy with 7 epochs, with ResNet50 giving the lower loss. For most of the models considered in this research, a higher number of epochs resulted in overfitting, owing to the limited number of images and the simple nature of the images. Based on the results, the accuracy and validation accuracy of the ResNet50 model are high and stable, irrespective of the number of images on which the model is trained. ResNet50 was therefore selected as the best performing model on the chosen image dataset after taking the following into consideration:
1. Training accuracy/validation accuracy
2. Model size
3. Complexity of the data
4. Complexity of the model.
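The accuracy-based part of this comparison can be sketched as a simple ranking. The rule used here, highest validation accuracy first with lower validation loss as the tie-breaker, is a simplifying assumption for illustration and ignores model size and complexity; the numbers are the validation figures reported for the 2.5 K-image configuration in Table 1, and the function name is hypothetical.

```python
# (validation accuracy, validation loss) for the 2.5 K-image configuration of Table 1.
RESULTS_2500 = {
    "VGG16": (0.8838, 6.1331),
    "VGG19": (0.8697, 6.1190),
    "ResNet50": (0.9459, 0.1807),
    "ResNet152V2": (0.6894, 229.0619),
    "EfficientNetV2L": (0.9379, 0.3100),
}


def rank_models(results):
    """Sort names by validation accuracy (descending), then validation loss (ascending)."""
    return sorted(results, key=lambda name: (-results[name][0], results[name][1]))


ranking = rank_models(RESULTS_2500)
print(ranking[0])  # ResNet50
```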
Fig. 7 Plot of number of images versus accuracy for all models
Fig. 8 Plot of number of images versus validation accuracy for all models
4.1 Limitations
The model can classify only four types of disasters: earthquakes, cyclones, floods, and fires. In this research, only five transfer learning models are considered (chosen for the simplicity of their architecture and their training time), and the best performing model is selected from these five models (VGG16, VGG19, ResNet50, ResNet152V2, EfficientNetV2L) trained on the image dataset.
Fig. 9 Plot of number of images versus validation loss for all models
The other limitation of the study is that only five transfer learning models are trained, and on a dataset with a relatively small number of images, owing to the unavailability of powerful computing resources.
5 Conclusions and Future Scope
In this paper, natural disasters are detected by conducting a comparative study of different transfer learning models. These models were trained on the dataset, and ResNet50 gives the best accuracy for 7 epochs when the model is compiled with the Adam optimizer. The use of these methods removes the need for highly expensive and specialized hardware. This in turn makes the system easily accessible, thus reducing the time needed to react and to start relief operations. Another benefit of the proposed method is that the ImageNet pre-trained model provides a solid base on which to build, which reduces the training time. One suggestion for future research is to consider more complex transfer learning models than those studied here. Another is to train the transfer learning models on an image dataset with a much larger number of images, provided that the necessary computing resources are available. This research is significant in that it paves a path for future research in terms of the methods to be followed, the models to be considered, and the potential goals to be achieved.
References
1. Sushma L, Lakshmi KP (2020) An analysis of convolution neural networks for image classification using different models. Int J Eng Res Technol (IJERT) 9(10)
2. Havas C, Resch B, Francalanci C, Pernici B, Scalia G, Fernandez-Marquez JL, Van Achte T, Zeug G, Mondardini MR, Grandoni D, Kirsch B (2017) E2mc: improving emergency management service practice through social media and crowdsourcing analysis in near real time. Sensors 17(12):2766
3. National Institutes of Health (2017) National Library of Medicine, National Center for Biotechnology Information, The United States of America. Accessed 2 Nov 2022. https://www.ncbi.nlm.nih.gov
4. Zhang Z, He N, Li D, Gao H, Gao T, Zhou C (2022) Federated transfer learning for disaster classification in social computing networks. J Saf Sci Resilience 3(1):15–23
5. Mythili S, Shalini E (2016) A comparative study of smart phone emergency applications for disaster management. Int Res J Eng Technol (IRJET) 3(11):392–395
6. Ghazal M, Ali S, Al Halabi M, Ali N, Al Khalil Y (2016) Smart mobile-based emergency management and notification system. In: 2016 IEEE 4th International conference on future Internet of Things and cloud workshops (FiCloudW). IEEE, pp 282–287
7. Sahin YG, Ince T (2009) Early forest fire detection using radio-acoustic sounding system. Sensors 9(3):1485–1498
8. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
9. Desai C (2021) Image classification using transfer learning and deep learning. Int J Eng Comput Sci 10(9)
10. Tripathi M (2021) Analysis of convolutional neural network based image classification techniques. J Innov Image Process (JIIP) 3(02):100–117
11. Lin Q, Ci T, Wang L, Mondal SK, Yin H, Wang Y (2022) Transfer learning for improving seismic building damage assessment. Remote Sens 14(1):201
12. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
13. Mahmudul Hassan SK, Maji AK, Jasinski M, Leonowicz Z, Jasinska E (2021) Identification of plant-leaf diseases using CNN and transfer-learning approach. Electronics 10(12):1388
Prediction of Blood Pressure and Diabetes with AI Techniques—A Review G. R. Ashisha and X. Anitha Mary
Abstract Diabetes and hypertension are the most prevalent and rapidly spreading non-communicable diseases in many nations today. Researchers are striving to predict and prevent diabetes and high blood pressure earlier. Various artificial intelligence models are being used to predict these diseases. The majority of those who have these diseases also have higher rates of complications such as heart disease, renal failure, stroke, foot injury, retinopathy, neuropathy, and pregnancy-related problems. In this work, a comprehensive study has been performed that focuses on the prediction of blood pressure and diabetes using machine learning approaches. We highlight how artificial intelligence can help diabetes and hypertension prediction by utilizing medical data from different sources. This work will serve as a guideline for future studies that will enable the accurate prediction of diabetes and blood pressure.
Keywords Machine learning · Hypertension · Blood pressure · Diabetes · Artificial intelligence
1 Introduction
Diabetes and hypertension are major health conditions and are becoming more prevalent, especially in lower income and developing nations. The main cause of death in diabetes is heart disease, which is aggravated by high blood pressure. Machine learning (ML)-based blood pressure and diabetes prediction has been increasingly developed due to its ability to build efficient prediction models. In this work, a study of ML technique-based diabetes and hypertension prediction is presented. ML is a subfield of artificial intelligence (AI). It is a continually advancing technology in many fields that allows computers to gain knowledge automatically from past data. ML uses a variety of algorithms for developing prediction methods; recently, this approach has been increasingly used in areas such as image processing, object detection, speech recognition, video games, robotics, accounting, and many others.
G. R. Ashisha (B), Department of Electronics and Instrumentation Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India
X. Anitha Mary, Department of Robotics Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_51
1.1 Diabetes
Diabetes is now a major disease [1] that can be brought on by obesity, age, lack of physical activity, heredity, lifestyle, poor diet, high blood pressure, and other factors. Diabetes is a condition in which one’s blood sugar levels are abnormally high. After we eat, some of the food is converted into glucose, a type of sugar that flows via the bloodstream to all body cells. Diabetes is characterized by an excess quantity of glucose in the blood and faulty beta cell function in the pancreas, and it affects various body parts. It arises either because insufficient insulin is produced or because the body’s cells do not react to insulin as intended. Common types of diabetes are:
Type 1 Diabetes (T1D): In T1D (insulin-dependent diabetes), cells of the immune system kill the pancreatic beta cells that produce insulin over the course of months or years without causing any symptoms. Without insulin, the blood glucose level rises, which can damage blood vessels and cause heart disease, renal failure, and blindness. People with T1D take insulin injections multiple times per day [2].
Type 2 Diabetes (T2D): T2D is also called “non-insulin-dependent diabetes.” In T2D, the pancreas is unable to release the higher levels of insulin required to prompt the insulin-resistant cells to absorb sugar from the bloodstream. Additionally, T2D frequently affects elderly or obese people [2].
Gestational Diabetes: This is a form of diabetes that is detected during pregnancy. During pregnancy, a temporary organ called the placenta develops, attached to both the fetus and the mother. The placenta provides nutrition and oxygen to the fetus and produces several hormones that support pregnancy. Some of these hormones impair insulin’s ability to work, rendering it inefficient. This type of diabetes can result in complications such as a big baby, low glucose levels in the baby, high BP in the mother, and, later, diabetes in both the mother and the baby [2].
The most common symptoms of diabetes include excessive urination, intense thirst, severe hunger, fatigue, vision problems, sudden weight loss (usually T1D), obesity (usually T2D), and skin itchiness. Diabetes-related problems include kidney damage, heart disease, nerve issues, stroke, knee injury, cerebrovascular problems, and so on. Diabetes can be controlled with the help of prescribed medication, a well-planned diet, and regular physical activity.
2 Diagnosis of Diabetes
A1C Test: The glycated hemoglobin test (A1C test) shows the average level of blood glucose over the previous 2–3 months. It measures the amount of blood glucose attached to hemoglobin, the molecule that carries oxygen in red blood cells [3, 4].
Fasting Glucose Test (FGT): The fasting blood sugar test (fasting plasma glucose test) measures glucose levels after a fast of at least 8 h (Table 1).
Glucose Tolerance Test (GTT): After the fasting measurement, the patient consumes a sugary drink, and blood glucose is rechecked 2 h later.
2.1 Hypertension
Hypertension is one of the most serious and widespread diseases in the world. It arises when BP rises above normal, and it is necessary to diagnose hypertension at an early stage, as it can increase the risk of kidney damage, brain damage, heart disease, and other conditions. According to the World Health Organization, one in four males and one in five females have this disease, and it is considered one of the main causes of premature death globally [5]. BP is the force of the blood pushing against the walls of the arteries. Arteries are the blood vessels that supply blood from the heart to all organs. A BP cuff (sphygmomanometer) is the instrument used to measure BP. A BP measurement displays two numbers, namely systolic BP (SBP) and diastolic BP (DBP). SBP is the pressure the blood applies to the artery walls while the heart beats. DBP is the pressure the blood applies to the artery walls between heartbeats (Table 2).
Table 1 Range of diagnosis of diabetes
Status of diabetes | A1C Test | FGT | GTT
Normal | < 5.7% | ≤ 99 mg/dl | ≤ 139 mg/dl
Prediabetes | 5.7% to 6.4% | 100 mg/dl to 125 mg/dl | 140 mg/dl to 199 mg/dl
Diabetes | ≥ 6.5% | ≥ 126 mg/dl | ≥ 200 mg/dl
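The diagnostic ranges for the three tests can be expressed as simple threshold checks. The sketch below assumes the commonly used cut-offs, in which values below 5.7% (A1C), 100 mg/dl (FGT), and 140 mg/dl (GTT) are normal and values from 6.5%, 126 mg/dl, and 200 mg/dl upward indicate diabetes; the function names are hypothetical, and the code illustrates the ranges only and is not a diagnostic tool.

```python
def classify_a1c(a1c_percent):
    """Diabetes status from an A1C value given in percent."""
    if a1c_percent < 5.7:
        return "normal"
    if a1c_percent < 6.5:
        return "prediabetes"
    return "diabetes"


def classify_fasting_glucose(mg_dl):
    """Diabetes status from a fasting glucose value in mg/dl."""
    if mg_dl < 100:
        return "normal"
    if mg_dl < 126:
        return "prediabetes"
    return "diabetes"


def classify_glucose_tolerance(mg_dl):
    """Diabetes status from a 2 h glucose tolerance value in mg/dl."""
    if mg_dl < 140:
        return "normal"
    if mg_dl < 200:
        return "prediabetes"
    return "diabetes"


print(classify_a1c(6.0), classify_fasting_glucose(130), classify_glucose_tolerance(120))
# prediabetes diabetes normal
```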
Table 2 Categories of BP
Categories of BP | SBP | DBP
Normal BP | Below 120 mmHg | Below 80 mmHg
Prehypertension | 120 mmHg to 139 mmHg | 80 mmHg to 89 mmHg
Hypertension (Stage 1) | 140 mmHg to 159 mmHg | 90 mmHg to 99 mmHg
Hypertension (Stage 2) | 160 mmHg or above | 100 mmHg or above
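The BP categories can likewise be written as a small lookup. The sketch makes two assumptions that the table leaves open: a reading of exactly 120 mmHg SBP or 80 mmHg DBP is treated as prehypertension, and when SBP and DBP fall into different categories the more severe one is reported. The function name is hypothetical.

```python
def bp_category(sbp, dbp):
    """Return the BP category for systolic and diastolic readings in mmHg."""
    if sbp >= 160 or dbp >= 100:
        return "hypertension stage 2"
    if sbp >= 140 or dbp >= 90:
        return "hypertension stage 1"
    if sbp >= 120 or dbp >= 80:
        return "prehypertension"
    return "normal"


print(bp_category(118, 76))   # normal
print(bp_category(135, 92))   # hypertension stage 1
```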
Hypertension is more likely in people who are overweight, and it is also associated with inherited factors, high cholesterol, high water-salt ratios, hormone levels, excessive salt intake, and long-term stress. If untreated, hypertension can result in heart attacks, irregular heartbeats, kidney failure, stroke, microvascular disease, and prolonged chest pain, which can increase the risk of premature death. Hypertension can be controlled with medicine, a good diet, and exercise. Diagnostic methods for hypertension include the echocardiogram, ECG, urinalysis, and ambulatory BP monitoring (ABPM). Few surveys are available on diabetes and BP prediction and their associations. In [6], the types of diabetes and their associated complications (Fig. 1) are studied. From the literature, it is observed that high BP is one of the major causes of diabetic retinopathy, and that diabetic nephropathy may cause high BP. Kee et al. [7] focused on recent developments in ML-based prediction of cardiovascular disease among T2D patients. The study shows that a neural network algorithm has good accuracy in predicting cardiovascular disease, with 76.6% precision. Brunstrom et al. [8] investigated the consequences of antihypertensive treatment in diabetic persons. This review reports that people with diabetes whose SBP was greater than 140 mmHg had a lower risk of mortality and cardiovascular morbidity while taking antihypertensive treatment. However, continued antihypertensive medication increases the chance of cardiovascular death if SBP is less than 140 mmHg. Zhang et al. [9] reviewed the relationship between hyperglycemia in pregnancy (HIP) and BP in offspring. According to this review, gestational diabetes may cause higher SBP and DBP in the offspring through the cardiovascular complications brought on by HIP. Gholizadeh et al. [10] evaluated the impact of low sodium and high sodium diets on BP in diabetic people. The study confirms that a low sodium diet had a good impact on BP reduction in diabetic subjects, but no effect was found on mean arterial pressure. In our paper, a study has been performed that concentrates on the prediction of BP and diabetes using ML algorithms.
Fig. 1 Complications of diabetes (cerebrovascular disease, diabetic retinopathy, coronary artery disease, kidney damage, diabetic neuropathy, diabetic foot infection)
3 Methodology
3.1 Search Strategy
The main objective of this work was a literature survey, which involved searching for the pertinent published research articles. The search covered research articles published in ScienceDirect, Springer, and Scopus up to February 2023. The keywords used for the search were “diabetes,” “blood glucose,” “blood pressure,” “machine learning,” “prediction,” and “artificial intelligence.” A total of 200 papers were collected using all the keywords, and 29 of them were ultimately chosen. This strategy is intended to provide a complete outline of the published research articles based on ML and AI. Figure 2 shows the flowchart of the research article search strategy.
Fig. 2 Flowchart of the literature search strategy: initial research article selection (n = 300); research articles after removal of replicates (n = 270); research articles screened for relevance (n = 200); full-text articles checked (n = 97); articles not matching the inclusion standards eliminated (n = 68); research articles included (n = 29)
3.2 Inclusion Standard
The inclusion standard comprised: (1) date of publication: the chosen papers were published between 2003 and 2023; (2) type of publication: journal articles and conference articles were taken into consideration; (3) relevance: the titles and abstracts of the published research papers were considered.
3.3 Exclusion Standard
Some research papers were found in multiple sources, so these duplicate articles were eliminated. Many publications were found to be irrelevant when their titles and abstracts were scrutinized. A few articles were not available as full papers and were therefore removed. In the end, 29 articles were selected for the systematic survey process.
4 Literature Survey
This survey presents a thorough literature review on the prediction of BP and diabetes using AI techniques, together with its advantages and challenges. In recent years, researchers have been working on predicting BP and diabetes using various ML algorithms. AI is a technology that allows a system to imitate human behavior. ML is a branch of AI that enables a system to learn automatically from previous data without being explicitly programmed. With such algorithms, models that would otherwise be exceedingly complex or expensive can be developed simply. ML is classified into supervised, unsupervised, and reinforcement learning. All the research papers presented here use either unsupervised or supervised learning. Figure 3 shows the general framework of an AI-based diabetes and hypertension prediction model.
Fig. 3 General ML-based diabetes and hypertension prediction model (blocks: Diabetes Dataset, Data Preprocessing, Feature Extraction, Data Normalization, Train Data, Test Data, Prediction Model, Comparison of model performance)
Teng and Zhang [11] examined the prediction of BP using solely PPG waveforms. In this work, a linear regression system was used to assess the association between PPG characteristics and arterial BP. BP and PPG signals were collected from 15 males between the ages of 24 and 30. PPG features such as amplitude, width, systolic upstroke duration, and diastolic duration were chosen for the work. It was observed that the diastolic duration has a more significant association with SBP and DBP than the other attributes. In 2019, Tanveer et al. [12] introduced an ANN-LSTM technique for BP prediction using ECG and PPG waveforms. This study used data from 39 patients collected from PhysioNet’s MIMIC database. The mean absolute error (MAE) and root mean square error (RMSE) of the suggested network are 1.10 and 1.56 mmHg for SBP prediction and 0.58 and 0.85 mmHg for DBP prediction, respectively. This model meets the AAMI criteria for BP measurement. Additionally, according to the BHS standard, the BP measurement quality for both SBP and DBP obtained an A grade. This technique is expected to significantly help currently available digital healthcare devices in accurate continuous BP measurement. Hassan et al. [13] developed a regression system for SBP estimation using the PTT technique. In this investigation, PPG and ECG data were taken from 10 normal individuals. The error rate of this model satisfies the AAMI standards, but the model was tested on very limited data from only ten subjects, while the AAMI standard requires at least 85 subjects to be evaluated. Satomi and Koji [14] proposed a technique for SBP estimation using a PPG sensor. In this research, data were gathered from 368 individuals. An error-correcting output coding technique was applied to a combination of AdaBoost classifiers. By comparing the estimated SBP with the standard SBP, the mean difference (MD) and standard deviation (SD) were calculated as −1.2 mmHg and 11.7 mmHg, respectively. The performance of the work did not meet the AAMI standards and can only offer sporadic SBP estimation. In 2013, Ruiz Rodriguez et al. [15] presented a deep neural network-based BP estimation. For this study, 572 data records were collected from a university teaching hospital. A deep belief network restricted Boltzmann machine (DBN-RBM) technique was used for estimating mean arterial pressure, SBP, and DBP. However, the prediction results were extremely variable, causing the SD to exceed the 8 mmHg limit imposed by the AAMI criteria. The study’s findings may be enhanced by adding a feedback connection from previous phases to the input layer to compensate for the periodic relations in the PPG characteristics. Datta et al. [16] proposed a combination of ML models for BP estimation from PPG waveforms. This work includes noise reduction methods to eliminate noise from the PPG waveforms and mean subtraction to normalize the signals. A linear regression technique was used for BP estimation, and the error of the model is below 10%. Pradeep Kandhasamy et al. [17] used a variety of ML approaches, including random forest (RF), support vector machine (SVM), K-nearest neighbor (KNN), and J48 decision tree. The four algorithms were used to predict diabetes under two scenarios: data with noise and data without noise. Compared to the other three classifiers, the J48 decision tree classifier obtains better accuracy for noisy data. On the noiseless dataset, the RF and KNN techniques achieve better sensitivity, specificity, and accuracy. Ahmed et al. [18] proposed an ML technique to develop a prediction model for diabetes. Three classifier algorithms were used in this model, namely RF, multi-layer perceptron (MLP), and radial basis function network (RBFN). MLP and RBFN achieve the maximum specificity for analyzing diabetic data. After tenfold cross-validation, RBFN provided the highest accuracy. Deepti et al. [31] introduced a classification approach including decision trees, SVM, and Naive Bayes (NB) to predict diabetes. The NB classifier performed best on a variety of performance measures, including recall, precision, F-measure, and accuracy. The outcomes were also verified using the receiver operating characteristic curve. Hani et al. [32] introduced an artificial neural network (ANN) algorithm to estimate BP and diabetes. The MLP algorithm obtained higher accuracy for both the diabetes dataset and the hypertension dataset. Table 3 shows the summary of AI algorithms on diabetes and hypertension prediction.
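Several of the BP studies above are judged by their error statistics and against the AAMI criteria. The sketch below shows how MD, SD, MAE, and RMSE can be computed from paired readings and how the criteria can be checked; it assumes the criteria in their commonly stated form (absolute mean difference of at most 5 mmHg, SD of at most 8 mmHg, and at least 85 subjects), and the function names and sample readings are illustrative.

```python
import math
import statistics


def bp_error_summary(reference, estimated):
    """Error statistics (mmHg) of estimated BP values against reference values."""
    errors = [e - r for r, e in zip(reference, estimated)]
    return {
        "md": statistics.fmean(errors),                            # mean difference
        "sd": statistics.stdev(errors),                            # sample SD of errors
        "mae": statistics.fmean([abs(err) for err in errors]),
        "rmse": math.sqrt(statistics.fmean([err * err for err in errors])),
    }


def meets_aami(md, sd, n_subjects):
    """Check |MD| <= 5 mmHg, SD <= 8 mmHg, and at least 85 subjects."""
    return abs(md) <= 5.0 and sd <= 8.0 and n_subjects >= 85


summary = bp_error_summary([120, 130, 110], [122, 128, 113])  # made-up readings
print(summary["md"])                                          # 1.0

# MD and SD reported for the PPG-based SBP estimation on 368 individuals [14].
print(meets_aami(md=-1.2, sd=11.7, n_subjects=368))           # False
```

The second call fails only on the SD limit, which matches the observation that the method in [14] did not meet the AAMI standards.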
5 Discussion and Conclusion
According to the literature, T2D is connected to a higher incidence of hypertension and vice versa. The key findings from this analysis are listed below:
• PPG and ECG waveforms are the most popular waveforms for hypertension prediction.
• The PIMA Indian Diabetes Dataset (PIDD) is a popularly used dataset for diabetes prediction.
• The most frequently used approaches for the prediction of hypertension and diabetes are RF and SVM (Fig. 4).
• According to recent studies, those who have diabetes have a higher risk of developing high BP.
• More research is needed to establish the association between BP and blood glucose.
• From the literature, it was observed that the pulse rate is not a good predictor of BP levels.
The major gaps identified, on which scientific research investigations must focus, are as follows:
Prediction of Blood Pressure and Diabetes with AI Techniques—A Review
Table 3 Summary of AI algorithms on hypertension and diabetes prediction

| Study | Dataset/data | AI techniques | Result (%) | Limitations |
|---|---|---|---|---|
| [19] (2017) | PIMA | LDA, NB, QDA, GP | GP accuracy = 81.97 | – |
| [20] (2018) | Hospital data | Diabetes prediction using RF, J48, Neural Network | RF accuracy = 80.84; Sensitivity = 84.95; Specificity = 76.73; MCC = 61.89 | Required more indexes in dataset |
| [21] (2018) | NHANES dataset | Hypertension prediction using logistic regression | AUC = 73; Sensitivity = 77; Specificity = 68; PPV = 32 | Generalizing the model to other races may not be possible |
| [22] (2018) | 338 data | Diagnosis of diabetes using RF | Accuracy = 89.63; Specificity = 96.87; Sensitivity = 98.8 | |
| [23] (2019) | 39 ICU data | BP estimation using ANN-LSTM | SBP MAP = 1.10; SBP RMSE = 1.56; DBP MAP = 0.58; DBP RMSE = 0.85 | Examining the model with large dataset is needed |
| [24] (2019) | Collected diabetes dataset | Diabetes estimation using AdaBoost, gradient boost, RF, logistic regression, extra trees, LDA | Logistic regression accuracy = 98.8 | – |
| [25] (2020) | Healthcare Dataset of California | RNN, LSTM, RNN GRU | Accuracy of RNN GRU = 73–83 | – |
| [26] (2020) | Data records from QBB | Hypertension prediction using DT, RF, logistics regression | RF accuracy = 82.1; PPV = 81.4; Sensitivity = 82.1; F-measure = 81.6 | Algorithms were used to small sample size |
| [27] (2021) | Prevalidated virtual dataset | Hypertension prediction using hybrid ML approach | Results demonstrate that the hybrid approach can enhance the performance by up to 27% | Not tested using large dataset contains unhealthy individuals data |
| [28] (2021) | Physical examination dataset | Hypertension prediction using RF, CatBoost, MLP, and LR | RF accuracy = 82; AUC = 92; Sensitivity = 83; Specificity = 81 | Generalization of RF algorithm to other population needs several additional research studies |
| [29] (2022) | PIMA | Diabetes prediction using RF, KNN, logistic regression, DT, AdaBoost, voting classifier | Voting classifier accuracy = 80.95 | – |
| [30] (2022) | DHS survey data | Hypertension prediction using XGB, GBM, LR, LDA, DT, and RF | GBM accuracy = 90; Precision = 90; Recall = 100; F1-score = 95; Log loss = 3.33 | Only two risk factors of hypertension were identified |
Fig. 4 Prediction of BP and diabetes (bar chart; horizontal axis: RF and SVM; series: Diabetes Prediction and BP Prediction; data labels: 87.4, 86.8, 85.2, 84.5)
• Creating a reliable prediction system for quantifying BP and diabetes in real-world applications.
• Designing a user-friendly, adaptable, and, above all, durable multisensor device built from PPG and ECG sensors that can collect accurate and consistent data.
• Enhancing the effectiveness and performance of BP and diabetes prediction by using deep learning techniques.
The quality of ML-based BP prediction depends on the data quality, the performance of the ML algorithm, the choice of significant attributes, and a suitable validation method. In the majority of published research papers, more than one biological signal was used to predict BP. The original data were preprocessed by removing artifacts and unwanted signal components with filters, followed by feature selection and extraction. The work presented here summarizes crucial information on past studies, the ML approaches they employed, and their benefits, shortcomings, and challenges. This study will assist future researchers in selecting the best sensors and ML algorithms for hypertension and diabetes prediction.
References
1. The top 10 causes of death. https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death. Accessed 31 Jan 2023
2. Diabetes. https://www.who.int/news-room/fact-sheets/detail/diabetes. Accessed 06 Feb 2023
3. Use of glycated haemoglobin (HbA1c) in diagnosis of diabetes mellitus. https://www.who.int/publications/i/item/use-of-glycated-haemoglobin-(-hba1c)-in-diagnosis-of-diabetes-mellitus. Accessed 07 Feb 2023
4. Classification and Diagnosis of Diabetes (2022) Standards of medical care in diabetes. Diab Care 45:S17–S38
5. Hypertension. https://www.who.int/health-topics/hypertension#tab=tab_1. Accessed 08 Feb 2023
6. Bereda G (2022) A review of the hybrid description of diabetes mellitus. BOHR Int J Curr Res Diab Prev Med 1(2):35–38
7. Kee OT (2023) Cardiovascular complications in a diabetes prediction model using machine learning: a systematic review. Cardiovasc Diabetol 22(1)
8. Brunström M, Carlberg B (2016) Effect of antihypertensive treatment at different blood pressure levels in patients with diabetes mellitus: systematic review and meta-analyses. BMJ 352
9. Zhang X (2023) Hyperglycaemia in pregnancy and offspring blood pressure: a systematic review and meta-analysis. Diabetol Metab Syndr 15
10. Gholizadeh-Moghaddam M, Shahdadian F, Shirani F, Hadi A, Clark CCT, Rouhani MH (2023) The effect of a low versus high sodium diet on blood pressure in diabetic patients: a systematic review and meta-analysis of clinical trials. Food Sci Nutr
11. Teng XF, Zhang YT (2023) Continuous and noninvasive estimation of arterial blood pressure using a photoplethysmographic approach. In: Annual international conference of the IEEE engineering in medicine and biology—proceedings, 4, pp 3153–3156
12. Tanveer MS, Hasan MK (2019) Cuffless blood pressure estimation from electrocardiogram and photoplethysmogram using waveform based ANN-LSTM network. Biomed Signal Process Control 51:382–392
13. Hassan MKBA, Mashor MY, Nasir NFM, Mohamed S (2008) Measuring blood pressure using a photoplethysmography approach. IFMBE Proc 21:591–594
14. Minn (2009) IEEE Engineering in Medicine and Biology Society. Annual conference (31st : 2009 : Minneapolis et al., EMBC 2009: proceedings of the 31st annual international conference of the IEEE engineering in medicine and biology society: engineering the future of biomedicine, 2–6
15. Ruiz-Rodríguez JC (2013) Innovative continuous non-invasive cuffless blood pressure monitoring based on photoplethysmography technology. Intensive Care Med 39(9)
16. Datta Institute of Electrical and Electronics Engineers (2016) 2016 IEEE international conference on communications, pp 22–27
17. Kandhasamy JP, Balamurali S (2015) Performance analysis of classifier models to predict diabetes mellitus. Procedia Comput Sci 47:45–51
18. Refat RAA (2021) A comparative analysis of early stage diabetes prediction using machine learning and deep learning approach. In: International conference on signal processing computing and control
19. Maniruzzaman M (2017) Comparative approaches for classification of diabetes mellitus data: machine learning paradigm. Comput Methods Programs Biomed 152:23–34
G. R. Ashisha and X. Anitha Mary
20. Zou Q, Qu K, Luo Y, Yin D, Ju Y, Tang H (2018) Predicting diabetes mellitus with machine learning techniques. Front Genet 9
21. López-Martínez F, Schwarcz MDA, Núñez-Valdez ER, García-Díaz V (2018) Machine learning classification analysis for a hypertensive population as a function of several risk factors. Expert Syst Appl 110:206–215
22. Samant P, Agarwal R (2018) Machine learning techniques for medical diagnosis of diabetes using iris images. Comput Methods Programs Biomed 157:121–128
23. Tanveer MS, Hasan MK (2019) Cuffless blood pressure estimation from electrocardiogram and photoplethysmogram using waveform based ANN-LSTM network. Biomed Signal Process Control 51:382–392
24. Mujumdar A, Vaidehi V (2019) Diabetes prediction using machine learning algorithms. Procedia Comput Sci 165:292–299
25. Ljubic B (2020) Predicting complications of diabetes mellitus using advanced machine learning algorithms. J Am Med Inform Assoc 27(9):1343–1351
26. AlKaabi LA, Ahmed LS, Al Attiyah MF, Abdel-Rahman ME (2020) Predicting hypertension using machine learning: findings from Qatar Biobank Study. PLoS ONE 15(10)
27. Magbool A, Bahloul MA, Ballal T, Al-Naffouri TY, Laleg-Kirati TM (2021) Aortic blood pressure estimation: a hybrid machine-learning and cross-relation approach. Biomed Signal Process Control 68
28. Zhao H (2021) Predicting the risk of hypertension based on several easy-to-collect risk factors: a machine learning method. Front Public Health 9
29. Dhande B, Bamble K, Chavan S, Maktum T (2022) Diabetes & heart disease prediction using machine learning. ITM Web Conf 44:03057
30. Islam SMS (2022) Machine learning approaches for predicting hypertension and its associated factors using population-level data from three South Asian countries. Front Cardiovasc Med 9
31. Sisodia D, Sisodia DS (2018) Prediction of diabetes using classification algorithms. Procedia Comput Sci 132:1578–1585
32. Bani-Salameh H (2021) Prediction of diabetes and hypertension using multi-layer perceptron neural networks. Int J Model, Simul, Sci Comput 12:2
Comparative Analysis of Various Machine Learning Algorithms to Detect Cyberbullying on Twitter Dataset
Milind Shah, Avani Vasant, and Kinjal A. Patel
Abstract The advent of the digital era has seen the rise of social media as an alternative mode of communication, and its use for contact between individuals has become widespread. As a direct result, digital modes of communication have largely replaced conventional ones. The growing prevalence of cyberbullying on social media platforms is a significant problem that must be addressed, because these platforms give bullies many ways to harass and threaten individuals in their communities. Early identification and alerts have been proposed as a way to locate and protect victims of cyberbullying. Methods from the field of machine learning (ML) have seen widespread use in the search for language patterns that bullies use to harm their victims. This research paper analyzes standard supervised learning and ensemble machine learning algorithms. The ensemble methods are the random forest (RF) and AdaBoost classifiers, whereas the standard supervised methods are Gaussian Naive Bayes (GNB), logistic regression (LR), and decision tree (DT). We extract features from tweets using term frequency-inverse document frequency (TF-IDF) and use a dataset downloaded from Kaggle to train and evaluate binary classification models that label abusive language as bullying or non-bullying. In the analysis, the ensemble algorithms outperformed the standard supervised algorithms. On this dataset, the random forest classifier performed best with 92% accuracy, while the Naive Bayes classifier performed worst with 62% accuracy.
M. Shah (B) · A. Vasant Department of Computer Science and Engineering, Krishna School of Emerging Technology and Applied Research (KSET), Drs. Kiran and Pallavi Patel Global University (KPGU), Vadodara, Gujarat, India e-mail: [email protected] K. A. Patel Faculty of Computer Applications and Information Technology, Gujarat Law Society University, Ahmedabad, Gujarat, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_52
M. Shah et al.
Keywords Machine learning · Supervised learning · Ensemble machine learning · Cyberbullying · Term frequency-inverse document frequency · Analysis · Detection · Twitter
1 Introduction
Cyberbullying is a prevalent practice that may have negative psychological, behavioral, and physiological effects on its victims. According to research, cyberbullying happens worldwide, throughout the developmental lifespan, and among both genders. It is the use of the Internet, mobile phones, and other electronic devices to intentionally harm or harass people, and it is an increasingly serious problem in society. Cyberbullying is increasing in prevalence because of the recent popularity and expansion of social media platforms such as Facebook and Twitter. The American Psychological Association and the White House have identified it as a significant threat to public health. The National Crime Prevention Council also found that over 40% of US youths have been bullied on social media. To keep online social media platforms healthy and secure, automated systems for detecting these events have become vital [1]. Figure 1 shows the types of cyberbullying. According to Willard (2004), there are eight different forms of cyberbullying, including impersonation, defamation, and harassment. It has been close to two decades since social media sites were first introduced, yet few successful measures have been taken to combat social bullying, although it has emerged as one of the most concerning problems in recent years. Recent research has also focused on automatic cyberbullying detection mechanisms that tap into individuals’ psychological characteristics such as personality, sentiment, and emotions. We present a model for detecting cyberbullying content based on two families of supervised machine learning methods, standard and ensemble. Naive Bayes, logistic regression, decision tree, random forest, and AdaBoost classifiers were used.
In this research, the Gaussian Naive Bayes classifier performed the worst, whereas the random forest classifier performed best across all metrics. To evaluate the performance of all classifiers, we used several metrics, including accuracy, precision, recall, F1-score, and ROC area, which are discussed in detail in Sect. 8. The evaluation showed that the ensemble algorithms outperformed the standard supervised algorithms. The rest of this paper is organized as follows: Sect. 1 is the introduction, which discusses cyberbullying and its current research opportunities. Section 2 describes the impact of cyberbullying. Section 3 states the research questions. Section 4 discusses related work. Section 5 describes existing approaches and their limitations, comparing the methods or algorithms used with their future scope or limitations. Section 6 presents the proposed methodology. Section 7 covers the novelty of the research. Section 8 describes the results and analysis. Section 9 describes the algorithm of the proposed work, and Sect. 10 concludes the paper and outlines future work.
Fig. 1 Types of cyberbullying
• Flaming: online fights with angry language
• Harassment: repeatedly sending mean or insulting messages
• Denigration: sending gossip and rumours
• Outing: sharing secrets or embarrassing information
• Trickery: tricking someone into sharing secrets
• Impersonation: pretending to be someone else
• Exclusion: cruelly excluding someone
• Cyberstalking: intense harassment
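The accuracy, precision, recall, and F1-score named in the introduction can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch under stated assumptions: the function name, the dictionary layout, and the convention of returning 0.0 when a denominator is zero are choices made here, not details given in the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    `positive` marks the class of interest (here assumed to be "bullying").
    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Reporting precision and recall alongside accuracy matters when bullying posts are a minority class, because a classifier that always predicts non-bullying can still reach a high accuracy.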
2 Impact of Cyberbullying
Although there is no physical interaction between the bully and the victim during cyberbullying, it may nevertheless harm the victim’s mental health. Of the people who were bullied online, 8% considered ending their own lives, and many teenage victims stopped going to school or college. Physical effects can include headaches, insomnia, stomach discomfort, loss of appetite, and nausea. Effects on mental health can include feelings of anxiety, isolation, and sadness, and a diminished sense of self-worth. Because people with low self-esteem are an easy target for bullies online, low self-esteem may also be a starting point for cyberbullying.
3 Research Questions
This research paper addresses the following questions.
Q1: What are the most essential features for detecting cyberbullying using standard and ensemble models?
Q2: How effective are decision tree, logistic regression, and Naive Bayes models in detecting cyberbullying?
Q3: In detecting cyberbullying, how does the performance of the random forest and AdaBoost ensemble models differ from that of the standard models?
4 Related Work
Much research has been done to find possible solutions to cyber-attacks on social networking sites.
Tommy K. H. Chan et al. [2] integrate previously acquired knowledge through a thorough review of the relevant literature. They first go through the background, research trends, and theoretical foundations of the problem. They then use social cognition theory to summarize what is known and identify what remains to be discovered, focusing on the mutual interactions between offenders, victims, and bystanders. The authors note that the research has a few limitations that should be kept in mind when evaluating and using its results.
Umaa Ramakrishnan et al. [3] analyze each current approach in depth and then offer a new solution to the issue identified. They find statistical methods to be the most successful techniques and propose a statistical strategy to address the bag-of-words (BOW) problem. The proposed method aims to fix the difficulty in sentiment analysis that arises when users apply an excessive number of filters at the same time. It lets the user apply filters and produces an overall polarity result for the tweets generated in a certain area. The algorithm gave correct results, but its execution takes too long; future work focuses on improving the time efficiency and robustness of the system.
Vimala Balakrishnan et al. [4] address automated cyberbullying detection by tapping into the psychological characteristics of Twitter users, such as their personalities, sentiments, and emotions. The Big Five and Dark Triad models identify user personalities, and Naive Bayes, random forest, and J48 classify tweets into vulgar, aggressor, spammer, and normal categories. The 5453 tweets analyzed were collected using the #Gamergate hashtag and manually annotated by human subject matter experts. As a baseline, the algorithm used a selection of Twitter features, including text, user, and network-based information. The findings indicate that cyberbullying detection improved when personalities and sentiments were taken into consideration; however, the same effect was not shown for emotion. Future research might use a more evenly distributed cyberbullying dataset or construct synthetic examples with methods such as the synthetic minority oversampling technique.
Wahyu Adi Prabowo et al. [5] analyze the sentiment of comments made on social media platforms. The comment data pass through a pre-processing stage, term frequency-inverse document frequency (TF-IDF) weighting, and SVM classification. The data are 1500 comments crawled from the Web with Python, split into 80% for training and 20% for testing. The evaluation showed 93% accuracy, 95% precision, and 97% recall. The research also develops a system model that lets a browser connect to the system and open a user page for comment classification.
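The TF-IDF weighting and the 80%/20% train/test split described for [5] can be sketched as follows. This is an illustration under stated assumptions rather than the cited authors' implementation: it uses the plain variant tf = count / document length and idf = ln(N / df) (libraries often apply smoothed variants), and the function names and the seed are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed variant:
    tf = term count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({term: (c / len(doc)) * idf[term]
                        for term, c in counts.items()})
    return vectors


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With this variant, a term that occurs in every document receives weight zero, which is why very common words contribute little to the classifier's input. In practice the idf values would be fitted on the training part only and reused for the test part.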
Sara Bastiaensens et al. [6] conducted an experimental scenario study of bullying on Facebook to evaluate how contextual factors, such as the severity of the incident and the identity and behavior of other bystanders, affect bystanders’ behavioral intentions to help the victim or reinforce the bully. The study included 453 secondary school students from Flanders. More severe incidents made bystanders more likely to help the victim. Bystanders’ intentions to help the victim were further shaped by an interaction between incident severity and the identity of other bystanders. Behavioral intentions to join in the bullying were higher when the other bystanders were close friends rather than co-workers, irrespective of whether the bully was known to the bystanders who saw the incident. In addition, an interaction between the identity of other bystanders and their behavior influenced individuals’ behavioral intentions to join in the bullying. There were also gender differences in the factors behind helping and reinforcing intentions. The study has a few limitations. First, it did not assess whether the participating youngsters actually considered the harassing situations used to be cyberbullying; research has shown that, when deciding whether an activity constitutes cyberbullying, youngsters primarily consider the intentionality of the behavior and the power imbalance involved. Second, the participants read hypothetical scenarios in a questionnaire at school, a setting quite unlike those in which young people usually encounter harassment on social networking sites.
Mifta Sintaha et al. [7] propose a system that monitors Twitter for evidence of criminal activity such as fraud, extortion, spam, and impersonation. Such research helps people identify potential criminal activities and early threats, as well as the kinds of accounts to watch out for in real time, leading to safer use of social media. The authors use Naive Bayes, support vector machine, and convolutional neural networks (CNNs) with sentiment analysis to predict social media bullying and threats, and they compare the three to see which gives the highest accuracy. Because of time constraints, they were unable to implement several strategies they had devised to make the output of their algorithms more precise.
The primary limitations of the experiment are listed as sentence construction, distinguishing relevant from irrelevant items, grouping of synonyms, categorization by opinion orientation, the hidden meaning that sentences often carry, and privacy concerns.
Samar Almutiry et al. [8] implement Arabic sentiment analysis (ASA) by training ML models, specifically SVM, on a dataset created automatically with ArabiTools and the Twitter API. To ensure that the effectiveness of cyberbullying tweet detection is not compromised, the dataset is labeled using both automated and manual methods. Cyberbullying is defined as using technology, such as the Internet, to bully another person through aggressive and offensive language. The tweets are labeled automatically by content: a tweet is considered cyberbullying if it includes one or more cyberbullying terms, and non-bullying if it contains no words that may be seen as aggressive. Pre-processing includes normalization, tokenization, light stemming, ArabicStemmerKhoja, and TF-IDF term weighting. After pre-processing, the SVM is applied in WEKA and Python. The first experiment used the light stemmer in WEKA, the second used ArabicStemmerKhoja in WEKA, and the third used Python. WEKA classifies text more accurately, but Python builds the model faster: WEKA with the light stemmer reached 85.49% in 352.51 s, WEKA with ArabicStemmerKhoja reached 85.38% in 212.12 s, and Python reached 84.03% in 142.68 s. The study aimed to identify cyberbullying in Arabic tweets using SA and ML. Arabic poses particular difficulties: its vocabulary and structure are complex, words are built from roots, which makes it hard to remove their prefixes, suffixes, and clitics, punctuation varies, and the many synonyms complicate processing. The authors also cite research on Arabic SA as a major issue.
Iain Coyne et al. [9] analyzed how bullying mode (offline versus online), type (personal versus work-related), and target closeness (friend versus work colleague) affected bystanders’ intentions to help and sympathize with the victim (defender role), support the perpetrator (prosecutor role), or remain neutral (commuter role). The study used a scenario-based design. The findings revealed that the mode and type of the event influenced bystanders’ intentions. Bystanders supported the perpetrator more than the victim in cyberbullying and work-related cases. Target closeness also appeared to matter: although the effect sizes were small, bystanders were more willing to defend the victim and less likely to support the perpetrator when the target was a friend. The authors highlight implications for future research and bystander education.
Sayanta Paul et al. [10] propose a novel use of BERT for cyberbullying detection. A basic BERT-based classification model achieves state-of-the-art results on three real-world corpora (Twitter, Formspring, and Wikipedia) of roughly 12,000 to 100,000 posts. The experiments show considerable gains over previous efforts, especially in comparison with attention-based or slot-gated deep neural network models.
Incorporating other modalities of information, such as pictures, videos, and audio data, is left to future work.
Jalal Omer Atoum et al. [11] propose an SA model for recognizing bullying messages on Twitter, using the supervised classifiers SVM and Naive Bayes. Compared with similar earlier research, a wider n-gram language model produced promising results on such texts. The findings also showed that SVM classifiers outperformed NB classifiers on tweets of this kind. As future work, the authors want to investigate more machine learning approaches, such as neural networks and deep learning, on larger collections of tweets, which would also require proven techniques for automated annotation.
Qing Li et al. [12] examine the beliefs and behaviors of high school students related to cyberbullying from four points of view: (a) What happens after students are cyberbullied? (b) What do students do when they see others being bullied online? (c) Why do victims choose not to report the incidents? (d) What are students’ opinions about cyberbullying? Data were collected from 269 Canadian students in grades 7–12 at five schools. Several significant themes emerged from the analysis. One finding is that more than 40% would take no action if they were cyberbullied, and only about one in ten would tell an adult. Students gave various reasons for their reluctance to report cyberbullying to school officials.
Harsh Dani et al. [1] present an optimization framework that uses sentiment information to identify cyberbullying on social media. Experiments on two real-world social media datasets showed the superiority of the proposed framework, and further analysis illustrates the usefulness of sentiment information for cyberbullying detection. Several directions remain open. Most work on cyberbullying detection so far has addressed English, so systems that handle other languages are needed. In some instances, bullying language is spread over multiple posts. Investigating the effect of sarcasm in posts is another approach that might improve detection.
Gianluca Gini et al. [13] evaluate whether bystanders’ responses to various forms of bullying affect students’ perceptions of bullying, attitudes toward victims, and feelings of safety at school. In Study 1, 217 middle school students read a hypothetical scenario describing an instance of direct bullying.
In Study 2, 376 students in elementary school and 390 children in middle school were given situations that described both direct and indirect forms of bullying. The direct form of bullying was referred to as “teasing.” In each instance, both the responses of the spectators to the bullying and gender of the person who was bullied were fabricated. Participants gave their approval to the prosocial activity that was beneficial to the victims, but they did not give their approval to the harassing behavior. In addition to this, they considered responses of indifference to the bullying to be undesirable conduct. Positive views toward victims were shown by the participants, with these sentiments being much more prevalent at younger-grade levels and among females. Both the participants’ views of the victims and their feeling of how safe the school was to them were impacted by the actions of bystanders. We address the implications of this finding for anti-bullying strategies that take the group ecology into account. Furthermore, future research that analyzes the group variables that are involved in the phenomenon of bullying and the characteristics of the ecological system in which bullying emerges and continues to exist will require the use of a variety of methods and instruments. This is because the phenomenon of bullying arises from and persists within an ecological system. In [14] Vinita Nahar et al., to provide a suggestion for an efficient method that can identify and order the most significant people (predators and victims). Through the use of a detection graph model, it makes the issue of network communication more manageable. According to the findings of the experiment, this method has a very high degree of precision. In our future research, we want to continue the in-depth
Comparative Analysis of Various Machine Learning Algorithms …
investigation of indirect bullying and its evolving patterns, with the goal of assisting in the detection and prevention of cyberbullying. In [15], Ulfa Khaira et al. note that abuse may take several forms, one of which is cyberbullying, and that this kind of bullying occurs on most social media platforms. Twitter enables users to share and communicate, and cyberbullying on Twitter has increased over the years. Analyzing a tweet’s sentiment may be used to determine whether it contains bullying behavior. Tweets are assigned to one of three categories: bullying, non-bullying, and neutral. Classifying cyberbullying requires three steps: acquiring a dataset, preparing the data, and applying the classification method. This research used the SentiStrength algorithm, a lexicon-based approach whose lexicon stores a sentiment-strength weight for each entry. An evaluation on 454 tweets yielded 87 neutral tweets (19.1%), 161 non-bullying tweets (35.4%), and 206 bullying tweets (45.4%). The reported accuracy is 60.5%. In [16], Semiu Salawu et al. present a comprehensive analysis of published research on cyberbullying detection methods, sourced from the Scopus, ACM, and IEEE Xplore databases. Their survey classified the existing approaches into four primary classes: supervised learning, lexicon-based (linguistic), rule-based, and mixed-initiative (blended) methods. To build prediction models for identifying cyberbullying, supervised learning-based techniques often use classifiers such as SVM and Naive Bayes. Lexicon-based systems use word lists and decide whether cyberbullying has occurred based on the presence of terms from those lists.
Mixed-initiative systems combine human-based reasoning with rule-based techniques to detect bullying. Two major obstacles currently hinder cyberbullying detection research: a lack of high-quality, representative, labeled datasets, and the absence of a holistic approach to cyberbullying when detection algorithms are designed. The survey analyzes the state of cyberbullying detection research and helps academics decide where to focus future work. The future directions it advocates include determining a victim’s emotional state after a cyberbullying incident, detecting non-textual cyberbullying, expanding cyberbullying role detection beyond victims and bullies, learning word representations for identifying cyberbullying, and evaluating annotations.
5 Existing Approach Limitations
Twitter dataset analysis is shown in Table 1.
M. Shah et al.
Table 1 Twitter dataset analysis

| Author name | Publication with year | Methods/algorithms used | Dataset used | Accuracy | Limitations |
| --- | --- | --- | --- | --- | --- |
| Umaa Ramakrishnan et al. [3] | IJAER, 2015 | Natural language processing algorithm | Twitter streaming API | 87% | The algorithm has provided correct finding; however, the execution of those results takes too long. In the coming years, we want to improve the time efficiency of the system and make it operate in a more robust manner |
| Samar Almutiry et al. [8] | Egyptian Society of Language, 2021 | SVM Algorithm | AraBully Tweets | 85.49% | Python requires less time to construct the classification model, whereas WEKA requires more time |
| Iain Coyne et al. [9] | Crossmark, 2017 | – | – | – | When analyzing the relationship between the manner and kind of bystander intervention intention, future researches should take into consideration trait empathy and moral identity |
| Jalal Omer Atoum et al. [11] | IEEE, 2020 | Naïve Bayes and SVM algorithm | | | With the goal of advancing our research on the identification of cyberbullying, we would want to investigate further machine learning approaches, such as neural networks and deep learning, using more extensive Twitter datasets. To manage such a vast collection of tweets, it is also necessary to implement certain tried and true techniques for an automated annotation process |
6 Proposed Methodology
This section describes our proposed methodology for detecting cyberbullying, the Twitter dataset, and the standard and ensemble algorithms used. Cyberbullying text in a Twitter dataset can be detected with a combination of machine learning algorithms. Detection involves several steps, including data pre-processing, feature extraction, and data resampling. Our system comprises unprocessed datasets, natural language processing (NLP), machine learning models, and results analysis. Figure 2 illustrates the proposed methodology. The data stage involves collection, pre-processing, and feature extraction.
6.1 Datasets
For the final result, we used the Dataturks Twitter dataset for cyber-troll detection, obtained from Kaggle [17]. Given the severity of the problem we want to address, it is important to select a dataset that is comprehensive, credible, appropriate, and up to date. We evaluated a large number of datasets, but most of them lacked features, were of low quality, or proved irrelevant during manual review. Therefore, after investigating several open-source datasets, we chose [17], which seemed to meet all the necessary criteria. The dataset contains the following fields:
• Content: Text of the tweet.
• Annotation: The label of the tweet, either “normal” or “cyberbullying/cyberaggression.”
• Extras: Additional information about the tweet, such as the ID and date.
Fig. 2 Proposed framework for cyberbullying detection
Table 2 Twitter dataset

| Twitter | Count |
| --- | --- |
| Total instances | 20,001 |
| Cyberbullying instances | 7822 |
| Non-cyberbullying instances | 12,179 |
The dataset was collected using the Twitter search API, and the tweets were filtered to include only those in English. Human annotators then labeled the tweets through a crowdsourcing process on the Dataturks platform. The tweets cover a wide range of topics and include potentially offensive language and content. In summary: (1) the dataset is partially labeled manually; (2) it contains 20,001 instances; and (3) it has two attributes, tweet and label (0 represents No and 1 represents Yes). The dataset should be used with caution because of the potentially offensive language and content it contains.
6.2 Data Collection
The dataset was in JSON format. Because the dataset description is relatively simple, the set of columns in the description attribute was eliminated and replaced with label values to simplify the following step. The number of samples for each class is shown in Table 2.
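The flattening step described in Sect. 6.2 might look roughly like the sketch below. It is only an illustration: the two sample records are invented, and the assumption that each record is one JSON object per line with the label nested under `annotation` → `label` is ours, not something the paper states. The helper name `load_records` is hypothetical.

```python
import json

# Two invented records in a JSON-lines layout. The field names follow the
# dataset description (content, annotation, extras); the nesting of the
# label inside "annotation" is an assumption made for this sketch.
SAMPLE = """{"content": "have a great day", "annotation": {"label": ["0"]}, "extras": null}
{"content": "you are pathetic", "annotation": {"label": ["1"]}, "extras": null}"""


def load_records(text):
    """Return (tweet, label) pairs, flattening the nested annotation to 0/1."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        label = int(record["annotation"]["label"][0])
        pairs.append((record["content"], label))
    return pairs


pairs = load_records(SAMPLE)

# Count the samples per class, as summarized in Table 2 for the real data.
class_counts = {0: 0, 1: 0}
for _, label in pairs:
    class_counts[label] += 1
```

On the real file, the same loop would be fed the file contents instead of `SAMPLE`, and the per-class counts could be compared with Table 2.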
6.3 Data Pre-processing
Pre-processing is done with the NLTK library and regular expressions as follows:
1. Word Tokenization: A token is a single unit, typically a word, from which sentences and paragraphs are built. Word tokenization splits our text into a list of individual words.
2. Stop Words Elimination: English stop words are retrieved with nltk.corpus.stopwords.words("english") and removed. Stop words are common terms such as “the,” “a,” “an,” and “in” that carry little meaning and do not affect the analysis. To remove them, the text is split into individual words, and each word that appears in the NLTK stop word list is dropped.
3. Punctuation Elimination: Each character is checked against string.punctuation, and only the non-punctuation characters are kept.
4. Stemming: A form of normalization that reduces words to their base form. Tokens are stemmed with nltk.stem.porter.PorterStemmer; for example, “connect” replaces “connection,” “connected,” and “connecting.”
5. Digit Elimination: We also removed any numeric data, since it does not contribute to cyberbullying detection.
6. Feature Extraction: Features for the ML algorithms were extracted with TF-IDF, using the TF-IDF transformer in Python’s sklearn module. TF-IDF is a statistical measure of word importance: the frequency of a word in a document is multiplied by the inverse of the frequency of documents that contain it. TF-IDF therefore down-weights words that occur in many documents, since they are less useful for telling documents apart. CountVectorizer calculates the word frequencies. The output matrix has one row per document and one column per word, and each entry is a weight (tf × idf). A document has a high TF-IDF score for a term when the term is frequent in that document and rare in the others; such signature words are what the classifier needs. To support manual inspection of the attributes, the 25 terms with the highest estimated TF-IDF scores were displayed. Hate, fuck, damn, suck, ass, that, lol, im, like, you, it, get, what, no, would, and bitch were among the highest-ranking terms in the dataset.
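The pre-processing and feature extraction steps above can be sketched with the standard library alone. This is a simplified illustration, not the authors' code: the short `STOP_WORDS` set and the `crude_stem` helper are hypothetical stand-ins for the NLTK stop word list and the Porter stemmer, the step order is slightly rearranged for brevity, and `tf_idf` uses the plain textbook weighting tf × log(N/df), whereas scikit-learn applies a smoothed and normalized variant.

```python
import math
import re
import string
from collections import Counter

# Tiny stand-in for NLTK's English stop word list (assumption for the sketch).
STOP_WORDS = {"the", "a", "an", "in", "is", "are", "you", "to", "and"}


def crude_stem(token):
    """Strip a few common suffixes; a toy stand-in for the Porter stemmer."""
    for suffix in ("ions", "ing", "ion", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, drop punctuation and digits, tokenize, remove stop words, stem."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)
    tokens = text.split()  # whitespace tokenization
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


tweets = ["I hate you", "I like the weather", "I like you"]  # invented examples
docs = [preprocess(t) for t in tweets]
weights = tf_idf(docs)
```

In this toy corpus the token "i" appears in every document, so its weight is zero, while "hate" appears in only one document and receives the largest weight. This is the down-weighting behavior described in step 6.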
6.4 Data Resampling
Because the classes are imbalanced (Table 2), the training data must be resampled. We oversampled the minority class, since enough data was available to do so. For example, if the majority class has 1000 samples and the minority class has 100, oversampling duplicates minority samples until that class also has 1000. We used the RandomOverSampler function from the imblearn library, configured for the “nonminority” setting, to add the redundant samples. In our data, the minority class is label 1. After resampling, the training data contained 9750 instances in each of the cyberbullying and non-cyberbullying classes.
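The 1000-versus-100 example above can be reproduced with a few lines of standard-library Python. This is a minimal sketch of random oversampling in general, not the imblearn implementation; the function name `random_oversample` and the synthetic tweet identifiers are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all classes
    are as large as the largest one."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Synthetic data: 1000 majority samples (label 0) and 100 minority samples (label 1).
samples = [f"tweet_{i}" for i in range(1100)]
labels = [0] * 1000 + [1] * 100

balanced_samples, balanced_labels = random_oversample(samples, labels)
balanced_counts = Counter(balanced_labels)
```

Resampling is normally applied to the training split only, so that duplicated tweets do not leak into the test set.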
6.5 Machine Learning Algorithms
A. Gaussian Naïve Bayes
Naive Bayes classifiers are a family of classification algorithms based on Bayes’ theorem. In lay terms, Bayes’ theorem describes the likelihood of an event given prior knowledge of conditions that might be associated with it. These algorithms share the assumption that every pair of features being classified is independent of each other. Naive Bayes is therefore not a single algorithm but a group of algorithms built on a common principle. It can be used for both binary (two-class) and multi-class classification problems. The approach is named after Thomas Bayes, whose theorem it applies. The procedure is easiest to understand in its totality
when it is supplied with binary or categorical input values. The method is called naive because it simplifies the computation of the probabilities for each hypothesis, which makes the calculation tractable. Naive Bayes is most commonly extended to real-valued attributes by assuming a Gaussian distribution; this extension is called Gaussian Naive Bayes. Other variants include the Bernoulli Naive Bayes and Multinomial Naive Bayes models. We chose the Gaussian Naive Bayes model because it is the most common and one of the easiest to put into practice: to estimate its parameters, we only need to calculate the mean and standard deviation of the training data for each class. The classifier was developed with the sklearn.naive_bayes package.
B. Logistic Regression
Regression analysis is a predictive modeling method that examines the relationship between a dataset’s target (dependent) variable and its independent variables. It is used when the target variable takes continuous values and its relationship with the independent variables can be characterized as either linear or non-linear. One step in regression analysis is calculating the best-fit line, which runs through the data so as to minimize the distance between the line and the data points. Logistic regression is a form of regression analysis to use when the dependent variable is discrete, for example 0 or 1, or true or false.
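To make the Gaussian Naive Bayes description in subsection A concrete, the sketch below estimates a per-class prior, mean, and variance for each feature and classifies a new point by the highest log posterior. It is an illustrative stand-in, not the sklearn.naive_bayes implementation; the function names, the toy two-feature data, and the small variance floor of 1e-9 are assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior, feature means, and feature variances for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The small constant keeps every variance strictly positive (assumption).
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


# Toy data: class 0 clusters near (0.1, 0.1), class 1 near (0.9, 0.9).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
```

Summing log likelihoods across features is where the naive independence assumption enters: each feature contributes to the score on its own, regardless of the others.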
As a result, the target variable can take only one of two values, and its relationship with the independent variables is represented by a sigmoid curve, which maps any value to a number between 0 and 1. Because our dataset was fairly large and the values of the target variable were relatively evenly distributed, we concluded that logistic regression would be an appropriate method to apply. In addition, the independent variables in the dataset were not correlated with one another. The classifier was developed with the sklearn.linear_model package.
C. Decision Tree Classifier
A decision tree is built by generating a set of questions about the records in the dataset. Follow-up questions are asked until a class label can be assigned to the record. The questions and their possible answers are organized as a decision tree, a hierarchical structure of nodes and directed edges. Its nodes fall into three categories: root, internal, and leaf nodes.
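The sigmoid mapping used by logistic regression (subsection B) can be illustrated as follows. The weights, bias, and feature vectors are hypothetical values chosen for the example, not parameters learned in the paper, and the 0.5 decision threshold is a common default rather than a setting the authors report.

```python
import math


def sigmoid(z):
    """Map a real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # this branch avoids overflow for large negative z
    return exp_z / (1.0 + exp_z)


def predict_label(features, weights, bias, threshold=0.5):
    """Return (probability, label) for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


weights, bias = [2.0, -1.0], -0.5  # hypothetical model parameters
prob_pos, label_pos = predict_label([1.5, 0.2], weights, bias)
prob_neg, label_neg = predict_label([0.0, 1.0], weights, bias)
```

The linear score can be any real number, but the sigmoid squeezes it into a probability, which is then thresholded to obtain the binary label.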
Every leaf node of a decision tree has a class label associated with it. The root and the other internal nodes contain attribute test conditions that separate records with different characteristics. The decision procedure starts at the root of the tree and splits the data on the feature that produces the greatest information gain (IG), that is, the largest decrease in uncertainty. We then iteratively split each child node until the leaves are pure, meaning that all samples in a leaf node belong to the same class. The classifier was developed with the sklearn.tree package.
D. AdaBoost Classifier
AdaBoost is an iterative ensemble algorithm. The core idea behind boosting methods is to train predictors sequentially, with each succeeding predictor trying to improve on the one before it. AdaBoost builds a strong classifier with high accuracy by combining several weak classifiers. In each iteration, it sets the weights of the classifiers and re-weights the training samples so that unusual, hard-to-classify observations are predicted accurately. Any machine learning method that accepts weights on the training set can serve as the base classifier. AdaBoost resembles random forest in that both obtain the final classification by combining the predictions of the individual decision trees. However, a few characteristics distinguish the two.
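The information gain criterion described for the decision tree in subsection C can be worked through on a toy example. The sketch below uses Shannon entropy as the uncertainty measure; the two binary features (`contains_insult`, `contains_link`) and their values are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )


def information_gain(labels, feature_values):
    """Entropy of the parent node minus the weighted entropy of its children."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


labels = [1, 1, 1, 0, 0, 0]            # 1 = cyberbullying, 0 = normal
contains_insult = [1, 1, 1, 0, 0, 0]   # separates the classes perfectly
contains_link = [1, 0, 1, 0, 1, 0]     # carries almost no class information

gain_insult = information_gain(labels, contains_insult)
gain_link = information_gain(labels, contains_link)
```

A tree grown on this data would split on `contains_insult` first, because it removes all of the uncertainty and leaves two pure leaves.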
AdaBoost uses decision trees with a depth of one (that is, two leaves), known as stumps. In addition, each decision tree influences the final prediction to a different degree: rather than averaging the predictions of all trees in the forest, as random forest does, AdaBoost has each tree contribute a different amount to the final prediction. The classifier was developed with the sklearn.ensemble package.
E. Random Forest Classifier
The random forest classifier is a machine learning method that builds an ensemble from a large number of individual decision trees. The model predicts the class that receives the most votes from the individual trees. The crucial ingredient is low correlation between the trees, which allows the ensemble’s predictions to be more accurate than those of any individual tree, because the trees protect each other from their individual errors. Bagging, also called “tree bagging” in this setting, is used to diversify the trees: each tree is trained on a random sample drawn from the dataset with replacement. This is the method used here.
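The contrast between the two ensemble methods, equal votes in a random forest versus weighted votes in AdaBoost, can be sketched as below. The predictions and error rates are hypothetical, and `stump_weight` uses the standard discrete AdaBoost formula α = ½ ln((1 − ε)/ε) for binary labels in {−1, +1}; the paper does not state which AdaBoost variant was used.

```python
import math
from collections import Counter


def majority_vote(predictions):
    """Random-forest style: every tree counts equally."""
    return Counter(predictions).most_common(1)[0][0]


def stump_weight(error_rate):
    """AdaBoost 'amount of say' for a stump with the given weighted error."""
    return 0.5 * math.log((1 - error_rate) / error_rate)


def weighted_vote(predictions, error_rates):
    """AdaBoost style: labels in {-1, +1}, each stump weighted by its reliability."""
    score = sum(stump_weight(e) * p for p, e in zip(predictions, error_rates))
    return 1 if score >= 0 else -1


predictions = [1, -1, -1]          # hypothetical outputs of three trees/stumps
error_rates = [0.05, 0.40, 0.45]   # hypothetical weighted training errors

forest_style = majority_vote(predictions)
adaboost_style = weighted_vote(predictions, error_rates)
```

With these numbers the two schemes disagree: the majority vote follows the two weaker learners, while the weighted vote follows the single highly accurate one.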
7 Novelty of the Research
The novelty of this cyberbullying detection work lies in its use of several different machine learning algorithms: decision trees, logistic regression, Naïve Bayes, AdaBoost, and random forest. Each of these algorithms has advantages and disadvantages, and using them together leverages the strengths of each to improve overall detection accuracy. For example, decision trees can quickly identify critical features and make decisions based on them, while logistic regression can model the probability that a message is cyberbullying. In addition, ensemble methods such as AdaBoost and random forest can improve accuracy by combining several weak classifiers into a strong one, which can help reduce the risk of bias and improve the generalization performance of the model. In general, the novelty of this approach is the combination of different algorithms and methods to improve the accuracy of cyberbullying detection, making it a more effective tool for addressing this important problem.
8 Results and Analysis
Bullying has always existed; what has changed over the years is the manner in which it is carried out, moving from traditional forms to modern forms such as cyberbullying. The volume of hate speech keeps growing as a direct result of the exponential growth in the popularity of user-generated online content, particularly on social media networks. The spread of hate speech over the Internet has been linked to an increase in violence against members of minority groups across the world, including mass shootings, lynchings, and ethnic cleansing. We implemented our technique using two approaches: standard supervised machine learning techniques and ensemble machine learning techniques. For the supervised techniques, we used the Gaussian Naive Bayes, logistic regression, and J48 decision tree approaches. For the ensemble techniques, we used the AdaBoost and random forest classifiers. According to our findings, the Gaussian Naive Bayes classifier had the worst overall performance, while the random forest classifier had the best results on every metric. This was to be expected, because the random forest classifier is an extension of the decision tree classifier that aggregates the results of many runs of the same algorithm. This research showed that the random forest algorithm can produce a high score, superior to the results reported in earlier research. Random forests have also been reported to perform well on low-dimensional datasets, as
was demonstrated in another study. Several factors contribute to the tendency of random forest ensembles to give more accurate results than other machine learning algorithms:
1. Random forests significantly reduce overfitting.
2. Random forests are more robust to outliers and noise than other classification methods.
3. Random forests can process high-dimensional data.
4. Random forests are relatively easy to tune.
The random forest classifier achieves high accuracy because it uses numerous decision trees to generate the final prediction, which helps to reduce model variance and improve prediction accuracy. The dataset is taken from Kaggle and contains tweets for detecting cyber-trolls. Data cleaning, pre-processing, and resampling were done before any machine learning algorithm was applied. Given the severity of the problem we are trying to address, we needed a dataset that is exhaustive, trustworthy, pertinent, and concise. We considered a great number of other datasets, but the vast majority lacked important features, were of insufficient quality, or turned out to contain useless data on manual inspection. After evaluating and trying out several open-source datasets, we settled on [17], since it seemed consistent with all of the necessary characteristics. The dataset comprises 20,001 instances and two attributes (tweet and label) and is partially manually labeled. The following metrics are used to evaluate the performance of all classifiers:
1. Accuracy: the proportion of all predictions that the model gets right:
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]
2. Precision: the proportion of tweets predicted as bullying that are actually bullying:
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \]
3. Recall: the proportion of actual bullying tweets that the algorithm finds:
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \]
4. F1-score: takes both false positives and false negatives into account by computing the harmonic mean of precision and recall, which gives a balanced result across classes:
\[ \mathrm{F1\text{-}Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4} \]
5. ROC area: the area under the curve obtained by plotting the true-positive rate against the false-positive rate.
Here TP, TN, FP, and FN denote the total numbers of true positives, true negatives, false positives, and false negatives, respectively. In the confusion matrix, the first row and column represent cyberbullying, while the second row and column represent non-bullying.
We selected the Naïve Bayes, logistic regression, decision tree, AdaBoost, and random forest classifiers for the following reasons. Naive Bayes is a well-established text classification technique, which makes it a natural candidate for detecting cyberbullying in online text; it is simple and computationally efficient, and thus appropriate for large text datasets. Logistic regression, a prominent binary classification approach, can detect cyberbullying in online discussions and is interpretable, indicating which features are the most predictive. Decision trees can handle numerical and categorical data, capture non-linear relationships between features, and also reveal the most predictive attributes. AdaBoost uses ensemble learning to create a strong classifier from numerous weak ones; by combining weak classifiers that capture different aspects of the data, it can identify cyberbullying better than a single model. Random forest, another ensemble learning approach, combines several decision trees to improve accuracy; it can cope with noisy text data and high-dimensional feature spaces and can report the most predictive features. Figure 3 shows the result of the Gaussian Naïve Bayes classifier. Figure 4 shows the result of the logistic regression classifier. Figure 5 shows the result of the decision tree classifier. Figure 6 shows the result of the AdaBoost classifier. Figure 7 shows the result of the random forest classifier. The performance metrics of the supervised and ensemble methods are shown in Tables 3 and 4, respectively.
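The accuracy, precision, recall, and F1-score definitions in Eqs. (1) to (4) can be checked numerically with a short function. The confusion-matrix counts below are hypothetical and are not results from this study; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, not taken from the paper: 80 bullying tweets found,
# 20 missed, 10 normal tweets flagged by mistake, 90 normal tweets passed.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```

For these counts, precision (8/9) is higher than recall (0.8), and the F1-score (16/19) lies between the two, as expected for a harmonic mean.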
Figure 8 shows the performance of each algorithm used for detecting cyberbullying. Figure 9 shows the accuracy chart of each algorithm used.
Fig. 3 Naïve (Gaussian) Bayes classifier
Fig. 4 Logistic regression classifier
Fig. 5 Decision tree classifier
Fig. 6 AdaBoost classifier
Fig. 7 Random forest classifier
Table 3 Supervised methods

| Metric | Naïve Bayes (%) | Regression (%) | Decision tree (%) |
| --- | --- | --- | --- |
| Accuracy | 62 | 80 | 85 |
| Precision | 79 | 81 | 88 |
| Recall | 62 | 80 | 85 |
| F1-score | 59 | 81 | 85 |
| ROC area | 68 | 81 | 87 |
Table 4 Ensemble methods

| Metric | AdaBoost (%) | Random forest (%) |
| --- | --- | --- |
| Accuracy | 71 | 92 |
| Precision | 74 | 92 |
| Recall | 71 | 92 |
| F1-score | 72 | 92 |
| ROC area | 73 | 92 |
Fig. 8 Analysis chart: precision, recall, F1-score, and ROC area (0% to 100%) for the Naïve Bayes, regression, decision tree, AdaBoost, and random forest classifiers
Fig. 9 Performance metrics of accuracy: Naïve Bayes 62%, logistic regression 80%, decision tree 85%, AdaBoost 71%, random forest 92%
9 Algorithm of Proposed Work
Inputs: cyberbullying.csv: A CSV file containing a dataset of social media posts and their labels (either “0” for non-cyberbullying or “1” for cyberbullying).
Outputs: rfc_model.pkl: A file containing a trained random forest classifier model.
1. Load the dataset from cyberbullying.csv.
2. Pre-process the text data by:
3. Converting all text to lowercase.
4. Removing all punctuation.
5. Tokenizing the text into individual words.
6. Removing stop words (common words like “the” or “and”).
7. Lemmatizing each word (reducing it to its base form, e.g., “running” to “run”).
8. Use the Bag of Words (BoW) model and Term Frequency-Inverse Document Frequency (TF-IDF) to convert each pre-processed text post into a feature vector. The BoW model represents the frequency of each word in the post as a vector.
9. Split the feature vectors and their labels into a training set and a testing set.
10. Train Naïve Bayes, logistic regression, decision tree, AdaBoost, and random forest classifier models on the training set of feature vectors and their labels.
11. Evaluate each trained model on the testing set of feature vectors and their labels, calculating the accuracy, precision, recall, and F1-score.
12. Save all the trained classifier models to files called nbc_model.pkl, logreg_model.pkl, dectree_model.pkl, adboost_model.pkl, and rfc_model.pkl. These files can be used to classify new social media posts as either cyberbullying or non-cyberbullying.
10 Conclusion and Future Work
In this paper, we compared several standard supervised machine learning algorithms and analyzed supervised ensemble approaches. The random forest classifier performed best, with 92% accuracy. It outperformed all of the standalone supervised methods and had a high true-positive rate for cyberbullying, which is desirable. The other ensemble method, AdaBoost, reached 71% accuracy, below logistic regression (80%) and the decision tree (85%). The weakest performance came from the Naive Bayes algorithm, which achieved just 62% accuracy. Two of the most important challenges that cyberbullying detection research must address are that researchers often do not take a comprehensive approach to cyberbullying when creating detection algorithms and that labeled datasets are scarce. According to our review of the relevant literature, most previous cyberbullying detection studies are text-based, so as a next step we will develop an integrated multimedia (image, audio, and video) detection model. Deep learning algorithms such as CNNs and DNNs can handle multimedia content. In addition, our cyberbullying detection approach is binary (bullying or non-bullying), so future research could focus on multi-class classification.
References
1. Maros H, Juniar S (2016) Sentiment informed cyberbullying detection in social media
2. Chan TKH, Cheung CMK, Lee ZWY (2021) Cyberbullying on social networking sites: a literature review and future research directions. Inf Manag 58(2):103411. https://doi.org/10.1016/j.im.2020.103411
3. Wang Y, Zhang C, Zhao B, Xi X, Geng L, Cui C (2018) Sentiment analysis of Twitter data based on CNN. Shuju Caiji Yu Chuli/J Data Acquis Process 33(5):921–927. https://doi.org/10.16337/j.1004-9037.2018.05.017
4. Khan S, Khan S (2019) Journal pre-proof
5. Prabowo WA, Azizah F (2021) RESTI journal. IAII 10:11–12
6. Bastiaensens S, Vandebosch H, Poels K, Van Cleemput K, Desmet A, De Bourdeaudhuij I (2014) Cyberbullying on social network sites. An experimental study into bystanders’ behavioural intentions to help the victim or reinforce the bully. Comput Human Behav 31(1):259–271. https://doi.org/10.1016/j.chb.2013.10.036
7. Sintaha M, Zawad N (2016) Cyberbullying detection using sentiment analysis in social, 21(11)
8. Almutiry S, Abdel Fattah M (2021) Arabic cyberbullying detection using arabic sentiment analysis. Egypt J Lang Eng 8(1):39–50. https://doi.org/10.21608/ejle.2021.50240.1017
9. Coyne I, Gopaul AM, Campbell M, Pankász A, Garland R, Cousans F (2019) Bystander responses to bullying at work: the role of mode, type and relationship to target. J Bus Ethics 157(3):813–827. https://doi.org/10.1007/s10551-017-3692-2
10. Paul S, Saha S (2020) CyberBERT: BERT for cyberbullying identification. Multimed Syst. https://doi.org/10.1007/s00530-020-00710-4
11. Atoum JO (2020) Cyberbullying detection through sentiment analysis. In: Proceedings - 2020 international conference on computational science and computational intelligence, CSCI 2020, pp 292–297. https://doi.org/10.1109/CSCI51800.2020.00056
12. Li Q (2010) Cyberbullying in high schools: a study of students’ behaviors and beliefs about this new phenomenon. J Aggress Maltreatment Trauma 19(4):372–392. https://doi.org/10.1080/10926771003788979
13. Gini G, Pozzoli T, Borghi F, Franzoni L (2008) The role of bystanders in students’ perception of bullying and sense of safety. J Sch Psychol 46(6):617–638. https://doi.org/10.1016/j.jsp.2008.02.001
14. Nahar V, Unankard S, Li X, Pang C (2012) Sentiment analysis for effective detection of cyber bullying, pp 767–774
15. Khaira U, Johanda R, Utomo PEP, Suratno T (2020) Sentiment analysis of cyberbullying on Twitter using SENTISTRENGTH. Indones J Artif Intell Data Min 3(1):21. https://doi.org/10.24014/ijaidm.v3i1.9145
16. Salawu S, He Y, Lumsden J (2020) Approaches to automated detection of cyberbullying: a survey. IEEE Trans Affect Comput 11(1):3–24. https://doi.org/10.1109/TAFFC.2017.2761757
17. DataTurks (2018) Tweets dataset for detection of cyber-trolls. Retrieved (2023, Feb 20) (Online) https://www.kaggle.com/datasets/dataturks/dataset-for-detection-of-cybertrolls?select=Dataset+for+Detection+of+Cyber-Trolls.json
Machine Learning for Perinatal Complication Prediction: A Systematic Review
Dian Lestari, Fairuz Iqbal Maulana, Satria Fadil Persada, and Puput Dani Prasetyo Adi
Abstract The objective of this systematic review is to analyze the application of machine learning for the prediction of pregnancy complications through an extensive review of published literature. The data sources for this research are scientific journals listed in prominent databases such as PubMed and Scopus. The findings of this research suggest that machine learning has been effectively employed in predicting pregnancy complications in multiple studies. Decision tree, random forest, logistic regression, and neural network are among the various machine learning algorithms that were utilized in this investigation. However, there are limitations to using machine learning technology in predicting pregnancy complications, such as reliance on the quality of data and a lack of transparency in the prediction process. This study provides a comprehensive understanding of the application of machine learning in predicting pregnancy complications and establishes a firm basis for further research in this area.
Keywords Machine learning · Pregnancy complication · Systematic review
D. Lestari (✉)
Faculty of Medicine, Universitas Airlangga, Surabaya 60132, Indonesia
F. I. Maulana
Computer Science Department, Bina Nusantara University, Jakarta 11480, Indonesia
S. F. Persada
Entrepreneurship Department, BINUS Business School Undergraduate Program, Bina Nusantara University, Jakarta 11480, Indonesia
P. D. P. Adi
Telecommunication Research Center, National Research and Innovation Agency, Jakarta 11480, Indonesia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_53
1 Introduction
Pregnancy and childbirth are natural events for women, but they also carry inherent risks. Among these risks are complications that can adversely affect the health of the mother and the baby and can even lead to death [1]. According to WHO, around 287,000 women died during and after pregnancy and childbirth in 2020. About 95% of all maternal deaths occurred in low- and lower-middle-income countries [2, 3]. However, many of these deaths can be prevented [1]. Several risk factors influence pregnancy complications, and some can be identified as early as the first trimester or even before pregnancy occurs. Mothers with higher parity, lower socio-economic status, nutritional problems, or anemia before pregnancy are at greater risk of complications such as Intrauterine Growth Restriction (IUGR), prematurity, and postpartum hemorrhage [4, 5]. In developed countries, by contrast, birth rates tend to decline over time, resulting in older maternal age at pregnancy and an increased risk of poor pregnancy outcomes [6, 7]. Artificial intelligence (AI) technology, particularly machine learning (ML), may provide a solution to this problem. Researchers are currently developing machine learning models that use health data collected from pregnant women to predict the risk of complications [8, 9]. One example of a pregnancy complication that can be detected using machine learning is pre-eclampsia [10–13]. This condition is characterized by high blood pressure and can lead to organ damage in pregnant women, jeopardizing the health of both the mother and the unborn baby. Machine learning models have been developed to predict the risk of pre-eclampsia from health data of pregnant women, including age, medical history, and laboratory test results [13, 14].
Various machine learning models, such as logistic regression, random forest, support vector machine (SVM), and neural networks, have commonly been employed to predict pregnancy complications [15]. These models use health data from pregnant women to forecast the likelihood of complications such as pre-eclampsia, miscarriage, and preterm labor [14]. Research and development have focused on refining these models to improve both their accuracy and their interpretability. Applying them can help medical practitioners take the preventive measures needed to avoid pregnancy complications and improve the well-being of both mothers and infants [15, 16]. However, the use of machine learning for predicting pregnancy complications is not without limitations, which include reliance on data quality and a lack of transparency in the prediction process [17]. Consequently, a systematic review is necessary to critically evaluate the effectiveness and limitations of machine learning in predicting the likelihood of pregnancy complications. This study aims to provide a comprehensive understanding of the application of machine learning in predicting pregnancy complications and to establish a firm basis for further research in this area.
2 Method
The present study used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework as a guide for conducting a comprehensive and transparent systematic review. PRISMA is an established protocol that facilitates the assessment of systematic reviews and/or meta-analyses [18].
2.1 Search Strategy
Literature on the use of artificial intelligence to predict complications during pregnancy was searched in the PubMed and Scopus digital databases. The search was restricted to articles written in English and published between 2018 and 2022. The search query used a set of predetermined keywords: "artificial intelligence," "machine learning," "deep learning," "complications in pregnancy," and "perinatal complications." In PubMed, the search was refined using MeSH terms, which included synonyms related to the search terms. A total of 122 articles were initially retrieved, of which 35 were removed as duplicates.
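The keyword combination described above can be sketched as a Boolean query string. This is only an illustration: the review does not report the exact query, so the grouping of technique terms against outcome terms, the helper name `build_query`, and the PubMed-style field tags for language and publication date are assumptions. Scopus uses a different query syntax.

```python
def build_query(technique_terms, outcome_terms, start_year, end_year):
    """Combine two keyword groups into one Boolean search string.

    Assumption: a record must match at least one technique term AND at
    least one outcome term. The review does not state its exact query.
    """
    def any_of(terms):
        # Quote each phrase and join the group with OR.
        return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"

    # Language and date filters use PubMed-style field tags (assumed here).
    return (
        f"{any_of(technique_terms)} AND {any_of(outcome_terms)} "
        f"AND english[la] AND {start_year}:{end_year}[dp]"
    )


# Keywords taken from the search strategy; the grouping is hypothetical.
TECHNIQUES = ["artificial intelligence", "machine learning", "deep learning"]
OUTCOMES = ["complications in pregnancy", "perinatal complications"]

query = build_query(TECHNIQUES, OUTCOMES, 2018, 2022)
print(query)
```

Keeping the keyword groups in lists makes it easy to rerun the same search with added synonyms and to document the query alongside the review.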
2.2 Eligibility Criteria The present study employed a set of predefined inclusion and exclusion criteria to identify relevant literature for the investigation. Specifically, the inclusion criteria comprised original research articles written in English and published in journals with full-text accessibility. These articles focused on pregnant women as study samples and employed artificial intelligence, machine learning, or deep learning techniques to predict complications in pregnancy. The complications of interest pertained to both the mother and fetus. Conversely, the exclusion criteria entailed systematic reviews, meta-analyses, and bibliometric reviews, as well as articles that discussed, maternal-unrelated diseases, postpartum complications, that heightened the risk of complications during perinatal.
2.3 Article Screening
Screening began with retrieval of the relevant literature from the PubMed and Scopus databases; the records were exported to and stored in a Mendeley library. To ensure the quality and consistency of the literature pool, duplicates were identified and removed. Consequently, a total of 122 articles were deemed eligible for further screening. Subsequently, 36 articles were excluded based on their titles, and an additional 19 were removed because they did not satisfy the eligibility criteria. After these preliminary steps, the remaining 11 articles were selected for inclusion in the systematic review.
2.4 Bias Assessment
The 11 articles selected for inclusion in the systematic review underwent a bias assessment using the Critical Appraisal Skills Programme (CASP) checklist for clinical prediction rules, which comprises 11 questions. The quality of each study was evaluated with the CASP critical score, which awards 2 points for a fully met criterion, 1 point for a partially met criterion, and 0 points for a criterion that is not applicable, not met, or not mentioned. The studies were then ranked by quality: a total score of 22 indicates high quality, a score of 16 to 21 indicates moderate quality, and a score of ≤ 15 indicates low quality [19].
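The scoring rule above can be written down as a short sketch. The function names are hypothetical, and the number of scored items (11) is an assumption inferred from the stated maximum of 22 points at 2 points per item; the item scores in the example are made up for illustration and do not come from any reviewed study.

```python
def casp_total(item_scores):
    """Sum per-item CASP scores (2 = fully met, 1 = partially met, 0 = otherwise)."""
    for score in item_scores:
        if score not in (0, 1, 2):
            raise ValueError(f"invalid item score: {score!r}")
    return sum(item_scores)


def casp_quality(total):
    """Map a total score to the quality bands described in the text."""
    if total >= 22:
        return "high"
    if total >= 16:
        return "moderate"
    return "low"


# Hypothetical appraisal: 8 items fully met, 2 partially met, 1 not met.
example_scores = [2] * 8 + [1] * 2 + [0]
example_total = casp_total(example_scores)
print(example_total, casp_quality(example_total))
```

With these illustrative scores the total is 18, which falls in the moderate band.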
3 Results and Discussion
3.1 Study Characteristics
The PRISMA approach was used to screen articles by title, abstract, and full text against the predetermined criteria. A total of 122 articles were initially identified, and after the removal of duplicates and the application of the exclusion criteria, 11 articles were deemed appropriate for systematic review. The PRISMA flowchart for selecting articles is shown in Fig. 1. The analyzed manuscripts comprised a mixture of study types: cohort (45.5%), prospective (36.3%), experimental (9.1%), and case–control (9.1%). The populations under investigation were located primarily in Europe, America, and Asia. The bias evaluation using CASP showed that the articles were of moderate quality and that none was of poor quality (average total score of 18–19). Research on machine learning for predicting pregnancy complications has increased steadily over the years. This trend is evident from the Scopus database, which shows the growth of studies on this subject over the past decade (Fig. 2), with research output peaking in the last five years. The growth can be attributed to the adoption of machine learning in healthcare facilities in both developed and developing countries. The potential of machine learning to support early diagnosis, analyze symptoms, and expedite treatment recommendations is regarded as a substantial contribution toward better health care.
Fig. 1 PRISMA flowchart for selecting articles
3.2 Machine Learning Prediction of Perinatal Complications
Table 1 shows the perinatal complications predicted through machine learning models.
3.3 Perinatal Complication to Predict
The reviewed studies focused on predicting nine major outcomes: pre-eclampsia, gestational hypertension, prematurity, gestational diabetes mellitus, fetal death, neonatal death, placenta accreta, Intrauterine Growth Restriction (IUGR), and Vaginal Birth After Cesarean Section (VBAC). Among the perinatal complications targeted by the machine learning algorithms, maternal complications accounted for 54.6% of the cases and fetal complications for 45.4%.
Fig. 2 Machine learning research trends for pregnancy complications prediction over a decade
3.4 Validation Models
Model validation evaluates the performance of a model by estimating its prediction error on data that were not used to fit it. The reviewed studies used training–test splits and cross-validation, both of which produced reliable results.
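The two validation schemes named above can be sketched with index bookkeeping alone. This is a minimal illustration using only the Python standard library; the function names and the fixed seed are assumptions, and the reviewed studies may have used stratified or repeated variants that are not shown here.

```python
import random


def train_test_split_indices(n_samples, test_fraction, seed=0):
    """Shuffle sample indices once and hold out a fixed fraction for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# A 70/30 split and tenfold cross-validation on 100 hypothetical samples.
train_idx, test_idx = train_test_split_indices(100, 0.3)
print(len(train_idx), len(test_idx))
for fold_number, (train, test) in enumerate(k_fold_indices(100, k=10), start=1):
    print(fold_number, len(train), len(test))
```

A single split gives one performance estimate, whereas tenfold cross-validation averages ten estimates so that every sample contributes to testing once.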
3.5 Machine Learning Models and Performance Metrics
Within the scope of this review, the majority of studies (91%) used the Area Under the Curve (AUC) to evaluate predictive performance, while a single study (9%) relied solely on accuracy-type metrics. Sensitivity was assessed in only 61.3% of the studies. Although all studies used at least one performance metric, the reporting of predictive performance was often inadequate: 63.6% of the studies used no more than three performance metrics. It should be noted that the predictive models developed in these studies lacked direct clinical application and were aimed solely at establishing effective diagnostic systems for various perinatal complications. In total, 18 different machine learning techniques were employed across the nine perinatal complications investigated.

Table 1 Perinatal complications predicted through machine learning models (values in the performance columns are listed in the same order as the ML models; – = not reported)

| Document/author | Time of data collection | Number of samples and study type | Outcomes | Predictors | Validation technique | ML models | AUC | Sen. (%) | Spec. (%) | Acc. (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [20] | 20–42 weeks' gestation | 112,963; retrospective cohort study | Preterm birth versus term birth | Age (years); maternal age; height; pre-pregnancy BMI; previous abortions | Tenfold cross-validation | Logistic regression / RF / ANN / Decision tree | 0.795 / 0.771 / 0.752 / 0.798 | 62.2 / 45.2 / 62.7 / 58.1 | 87.0 / 94.1 / 84.6 / 90.1 | – / – / – / – |
| [21] | < 22 weeks' gestation | 100; prospective study | Severe neonatal mortality versus no severe neonatal mortality | Maternal age; parity; gestational age; antenatal steroid administration | Ten replicates of tenfold cross-validation and the one-standard-error rule | Decision tree / SVM / General additive model / Simple NN | 0.853 / 0.851 / 0.850 / 0.848 | 79.7 / 79.1 / 80.6 / 78.5 | 80.9 / 79.6 / 81.8 / 80.7 | 75.6 / 77.4 / 75.0 / 73.3 |
| [22] | During pregnancy (unspecified gestational age) | 56,993; retrospective cohort study | Gestational hypertension versus no gestational hypertension | Fundal height; systolic pressure; diastolic pressure; urine protein level; edema | Test and training (unspecified percentage) | Logistic regression / XGBoost / LightGBM / LSTM | 0.789 / 0.858 / 0.851 / 0.870 | – / – / – / – | – / – / – / – | 92.3 / 92.6 / 92.1 / 92.2 |
| [23] | ≥ 20 weeks of gestation | 954,813; retrospective cohort study | Stillbirth versus born alive | Sociodemographic; chronic conditions; obstetric complications; congenital anomalies; medical history | Tenfold cross-validation | Logistic regression / Decision tree / RF / XGBoost / Multi-layer perceptron | 0.830 / 0.819 / 0.831 / 0.840 / 0.836 | 55.2 / 54.1 / 55.5 / 58.1 / 56.5 | – / – / – / – / – | – / – / – / – / – |
| [24] | During pregnancy at term gestation | 989; prospective study | Successful VBAC versus TOLAC failure risk | Maternal age; previous vaginal delivery; weight and height; cervical effacement and dilation; fetal station; induction of labor; prior arrest of descent | Tenfold cross-validation | RF / GLM / XGBoost | 0.3544 / 0.3534 / 0.3384 | – / – / – | – / – / – | – / – / – |
| [25] | During pregnancy (unspecified gestational age) | 262; prospective study | Late IUGR versus healthy fetus | FHR/CTG record with an indication of signal quality: optimal, acceptable, insufficient/absent | 60% training and 40% testing | SVM | 0.78 | 0.78 | 0.68 | – |
| [26] | During pregnancy (unspecified gestational age) | 64; observational retrospective study | Placenta accreta spectrum: creta, increta, percreta | MRI imaging; histological sample; texture analysis | Tenfold cross-validation | RF / k-NN / Naïve Bayes / Multi-layer perceptron | – / – / – / – | 93.7 / 98.7 / 77.5 / 84.9 | 93.7 / 98.7 / 75.0 / 83.8 | 95.6 / 98.1 / 80.5 / 88.6 |
| [27] | 35–37 weeks of gestation | 727; prospective study | Placenta accreta spectrum: massive blood loss, prolonged hospitalization, admission to ICU | Placenta accreta risk-antepartum score | 80% training and 20% testing | MOGGE PAR-Antepartum score / MOGGE PAR-Peripartum score | 0.83 / 0.8733 | – / – | – / – | 78.6 / 81.7 |
| [28] | First trimester of pregnancy | 43; case–control study | Risk of GDM | RNA extraction; miRNA profiling | Leave-one-out cross-validation (LOOCV) | RF / AdaBoost / Logistic regression / LR: miR-223 / LR: miR-23a / LR: miR-223 + miR-23a | 0.81 / 0.77 / 0.74 / 0.94 / 0.89 / 0.91 | 0.94 / 0.94 / 0.88 / 0.94 / 1.00 / 0.94 | 0.40 / 0.60 / 0.40 / 0.80 / 0.60 / 0.80 | 0.81 / 0.86 / 0.76 / 0.90 / 0.90 / 0.90 |
| [13] | ≥ 20 weeks of gestation | 1647; retrospective cohort study | Pre-eclampsia versus no pre-eclampsia | Blood pressure; cutoff of 38 for the sFlt-1-to-PlGF ratio; combined blood pressure + proteinuria | Testing datasets | GBTree / RF | 0.83 / 0.86 | 71.0 / 94.0 | 97.2 / 97.1 | 80.3 / 86.0 |
| [15] | ≥ 20 weeks of gestation | 867; retrospective cohort study | Adverse maternal and neonatal outcomes of pre-eclampsia | Gestational age; hypoalbuminemia; the diagnostic criteria of "impaired liver function," "renal insufficiency," "thrombocytopenia," and "HELLP syndrome"; creatinine level | 70% training and 30% testing | k-NN / Decision tree / RF / SVM / Multi-layer perceptron / LDA / Logistic regression | 0.911 / 0.908 / 0.963 / 0.976 / 0.973 / 0.961 / 0.958 | 0.708 / 0.846 / 0.138 / 0.923 / 0.923 / 0.831 / 0.831 | 0.968 / 0.968 / 0.994 / 0.923 / 0.923 / 0.987 / 0.987 | – / – / – / – / – / – / – |
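The four performance metrics reported in Table 1 can be computed from predicted scores as in the following sketch. The labels and scores are made-up illustrative data, not results from any reviewed study, and the function names are hypothetical. AUC is computed here in its rank form, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc(y_true, scores):
    """Rank-based AUC: fraction of positive/negative pairs ordered correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data: 3 complicated and 5 uncomplicated pregnancies.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sens = tp / (tp + fn)                   # sensitivity: detected complications
spec = tn / (tn + fp)                   # specificity: correctly cleared cases
acc = (tp + tn) / (tp + fp + tn + fn)   # accuracy: all correct predictions
auc_value = auc(y_true, scores)
print(sens, spec, acc, auc_value)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a study that reports only a few of these metrics is hard to compare with the others.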
4 Discussion
4.1 Machine Learning as Input Variables
Machine learning has emerged as a significant technological advance with applications across many domains, ranging from data mining to natural language processing and image detection. In medical science, it has the potential to transform disease diagnosis by enabling the development of accurate predictive models [29]. This systematic review examines different machine learning methods for diagnosing a variety of perinatal complications and highlights their significance in promoting women's health. The review identified nine perinatal complications that were predicted using various machine learning models: pre-eclampsia, gestational hypertension, prematurity, gestational diabetes mellitus, fetal death, neonatal death, placenta accreta, Intrauterine Growth Restriction (IUGR), and Vaginal Birth After Cesarean Section (VBAC). While machine learning has the potential to enhance health care significantly, the challenges associated with using artificial intelligence in health care must be considered. Ethical considerations and the potential for human bias in developing computer algorithms need to be addressed [16]. Predictive models in health care can be influenced by race, genetics, gender, and other characteristics, which can lead to underestimation or overestimation of patient risk if not carefully considered. Clinicians must ensure that AI algorithms are developed and implemented appropriately to avoid such issues [30]. The studies in this systematic review relied primarily on electronic medical records for data collection. Machine learning techniques can identify patterns in datasets derived from electronic medical records, which can support prediction and decision-making for diagnosis and treatment planning.
Machine learning methods based on electronic medical records can be combined with other large medical data sources, such as genomics and medical imaging [31]. Predictive algorithms, when used as supplementary information, have the potential to enhance clinical diagnosis and treatment procedures. Electronic medical records typically contain demographic data, biological markers, vital signs, clinical records, diagnoses, prescriptions, and procedures. Obtaining them is generally straightforward, and their use can help to reduce errors during the transfer of large volumes of information [32]. Diagnosis prediction models for perinatal complications have been constructed predominantly from electronic medical record (EMR) data. The most commonly used features in these models are maternal sociodemographic characteristics, which have proven particularly useful in predicting perinatal complications that are prevalent in specific populations. The ability of such models to forecast these complications in advance has the potential to greatly improve perinatal health outcomes overall [33].
4.2 Perinatal Complications as Output Variables
The output variable in most machine learning models used for perinatal complication prediction is binary, indicating whether or not a complication occurs [10]. However, some studies classified patients by level of risk instead. For example, in a study of trial of labor after cesarean (TOLAC), the risk of complications was categorized as high, moderate, or low. Similarly, in a study of gestational diabetes, the risk was classified as high or low. Classifying by risk level allows a more nuanced approach to predicting complications and can help healthcare providers tailor their care to individual patients [28]. Two perinatal complications that are often predicted using machine learning models are prematurity and pre-eclampsia. Prematurity is a major public health issue because premature babies often suffer significant morbidity and mortality during the neonatal period, which can result in high treatment costs [20]. Pre-eclampsia is a pregnancy disorder characterized by hypertension after 20 weeks of pregnancy together with organ damage, and it is one of the leading causes of maternal and neonatal morbidity and mortality worldwide. The risk of developing pre-eclampsia can be predicted during the first half of pregnancy, allowing healthcare providers to monitor at-risk patients more closely and intervene if necessary [13].
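The difference between a binary output and a risk-level output can be illustrated with a predicted probability and two kinds of cut points. This is a sketch under stated assumptions: the function names and the threshold values (0.5 for the binary label, 0.2 and 0.6 for the risk bands) are hypothetical and are not taken from the reviewed studies.

```python
def binary_label(probability, threshold=0.5):
    """Binary output: 1 if a complication is predicted, else 0."""
    return int(probability >= threshold)


def risk_category(probability, low_cut=0.2, high_cut=0.6):
    """Three-level output: 'low', 'moderate', or 'high' predicted risk."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= high_cut:
        return "high"
    if probability >= low_cut:
        return "moderate"
    return "low"


# Hypothetical predicted probabilities for three patients.
for p in (0.05, 0.45, 0.85):
    print(p, binary_label(p), risk_category(p))
```

A patient with a predicted probability of 0.45 is labeled as "no complication" by the binary rule but as "moderate" by the banded rule, which is the kind of nuance the risk-level approach is meant to preserve.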
4.3 Machine Learning Method Performance
The studies in this systematic review employed various machine learning (ML) methods to predict preterm labor and gestational hypertension. Among these methods, the decision tree model demonstrated superior performance in predicting preterm labor owing to its ability to handle unbalanced data, in which the numbers of positive (preterm) and negative (not preterm) samples are uneven [20]. The Long Short-Term Memory (LSTM) model, in turn, performed well in predicting gestational hypertension owing to its suitability for time series data and its ability to address the vanishing gradient problem that occurs in conventional artificial neural network models [22]. Despite these findings, no particular ML model can be declared the best for predicting perinatal complications, because different models have their own advantages and limitations and the studies differ in input variables, record type, and sample size. Nonetheless, the best overall performance was observed in the support vector machine (SVM) model for pre-eclampsia prediction, with an accuracy of 92.3% in predicting adverse maternal and neonatal outcomes. SVM was found to be a simple and flexible model that can effectively address multiple classification problems, even when the sample size is limited [15].
5 Findings
The findings of this systematic review suggest that machine learning has been employed effectively to predict pregnancy complications in multiple studies. Decision tree, random forest, logistic regression, and neural network models are among the machine learning algorithms used in the reviewed studies. However, machine learning has limitations in predicting pregnancy complications, such as its reliance on data quality and the lack of transparency in the prediction process. This study provides a comprehensive understanding of the application of machine learning in predicting pregnancy complications and establishes a firm basis for further research in this area.
6 Conclusion
To conclude, the primary benefit of interpretable machine learning (ML) applications is the objective nature of their output, which is based on real-world data and identifies the most important variables for doctors. Further research in this area should be promoted to develop ML solutions that can be applied widely in clinical settings to reduce perinatal complications. AI technology has the potential to transform women's health care by improving diagnostic accuracy, reducing the workload of medical professionals, lowering healthcare costs, and providing benchmark analyses for tests whose interpretation varies among specialists. This systematic review contributes to the existing literature on artificial intelligence and maternal health. It is hoped that future studies will examine the forecasting of perinatal complications not only during pregnancy but also prior to conception, using a diverse range of tools beyond machine learning, including deep learning and other artificial intelligence methods.
References
1. Padilla CR, Shamshirsaz A (2022) Critical care in obstetrics. Best Pract Res Clin Anaesthesiol 36:209–225
2. WHO (2017) Recommendations on newborn health: approved by the WHO Guidelines Review Committee. WHO, 1–28
3. World Health Organization (2020) Maternal mortality evidence brief, 1–4
4. Leonard SA, Main EK, Lyell DJ (2022) Obstetric comorbidity scores and disparities in severe maternal morbidity across marginalized groups. Am J Obstet Gynecol MFM 4
5. Diana S, Wahyuni CU, Prasetyo B (2020) Maternal complications and risk factors for mortality. J Public Health Res 9:1842
6. Mehari MA, Maeruf H, Robles CC, Woldemariam S, Adhena T, Mulugeta M, Haftu A, Hagose H, Kumsa H (2020) Advanced maternal age pregnancy and its adverse obstetrical and perinatal outcomes in Ayder comprehensive specialized hospital, Northern Ethiopia, 2017: a comparative cross-sectional study. BMC Pregnancy Childbirth 20:1–10
7. Bhandari TR (2013) Maternal and child health situation in South East Asia. Nepal J Obstet Gynaecol 7:5–10
8. Sadovsky Y, Mesiano S, Burton GJ, Lampl M, Murray JC, Freathy RM, Mahadevan-Jansen A, Moffett A (2020) Advancing human health in the decade ahead: pregnancy as a key window for discovery: A Burroughs Wellcome Fund Pregnancy Think Tank. Am J Obstet Gynecol 223:312–321
9. Bertini A, Salas R, Chabert S, Sobrevia L, Pardo F (2022) Using machine learning to predict complications in pregnancy: a systematic review. Front Bioeng Biotechnol 9
10. Feduniw S, Golik D, Kajdy A, Pruc M, Modzelewski J (2022) Application of artificial intelligence in screening for adverse perinatal outcomes: a systematic review. Healthc 10
11. Attwaters M (2022) Detecting pregnancy complications from blood. Nat Rev Genet 23:136
12. Pietsch M, Ho A, Bardanzellu A, Zeidan AMA, Chappell LC, Hajnal JV, Rutherford M, Hutter J (2021) APPLAUSE: automatic prediction of PLAcental health via U-net segmentation and statistical evaluation. Med Image Anal 72
13. Schmidt LJ, Rieger O, Neznansky M, Hackelöer M, Dröge LA, Henrich W, Higgins D, Verlohren S (2022) A machine-learning-based algorithm improves prediction of preeclampsia-associated adverse outcomes. Am J Obstet Gynecol 227:77.e1–77.e30
14. Espinosa C, Becker M, Marić I, Wong RJ, Shaw GM, Gaudilliere B, Aghaeepour N, Stevenson DK (2021) Data-driven modeling of pregnancy-related complications. Trends Mol Med 27:762–776
15. Zheng D, Hao X, Khan M, Wang L, Li F, Xiang N, Kang F, Hamalainen T, Cong F, Song K, Qiao C (2022) Comparison of machine learning and logistic regression as predictive models for adverse maternal and neonatal outcomes of preeclampsia: a retrospective study. Front Cardiovasc Med 9
16. Cecula P (2021) Artificial intelligence: the current state of affairs for AI in pregnancy and labour. J Gynecol Obstet Hum Reprod 50
17. Sarno L, Neola D, Carbone L, Saccone G, Carlea A, Miceli M, Iorio GG, Mappa I, Rizzo G, Girolamo RD, D'Antonio F, Guida M, Maruotti GM (2020) Use of artificial intelligence in obstetrics: not quite ready for prime time. Am J Obstet Gynecol MFM 5
18. Tricco AC, Lillie E, Zarin W, O'Brien KK, Colquhoun H, PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med
19. Zeng X, Zhang Y, Kwong JSW, Zhang C, Li S, Sun F, Niu Y, Du L (2015) The methodological quality assessment tools for preclinical and clinical studies, systematic review and meta-analysis, and clinical practice guideline: a systematic review. J Evid Based Med 8:2–10
20. Belaghi RA, Beyene J, McDonald SD (2021) Prediction of preterm birth in nulliparous women using logistic regression and machine learning. PLoS One 16
21. Hamilton EF, Dyachenko A, Ciampi A, Maurel K, Warrick A, Garite TJ (2018) Estimating risk of severe neonatal morbidity in preterm births under 32 weeks of gestation. J Matern Neonatal Med 7058
22. Lu X, Wang J, Cai J, Xing Z, Huang J (2022) Prediction of gestational diabetes and hypertension based on pregnancy examination data. J Mech Med Biol 22
23. Malacova E, Tippaya S, Bailey HD, Chai K, Farrant BM, Gebremedhin AT, Leonard H, Marinovich ML (2020) Stillbirth risk prediction using machine learning for a large cohort of births from Western Australia, 1980–2015. Sci Rep 10
24. Meyer R, Hendin N, Zamir M, Mor N, Levin G, Sivan E (2020) Implementation of machine learning models for the prediction of vaginal birth after cesarean delivery. J Matern Neonatal Med, 1–7
25. Pini N, Lucchini M, Esposito G, Tagliaferri S (2021) A machine learning approach to monitor the emergence of late intrauterine growth restriction. Front Artif Intell 4:1–11
26. Romeo V, Ricciardi C, Cuocolo R, Stanzione A, Verde F, Sarno L, Improta G, Paolo P, Armiento MD, Brunetti A, Maurea S, Machine learning analysis of MRI-derived texture features to predict placenta accreta spectrum in patients with placenta previa. Magn Reson Imag 64:71–76
27. Shazly SA, Hortu I, Shih J-C, Melekoglu R, Fan S, Middle East Obstetrics and Gynecology Graduate Education (MOGGE) Foundation Artificial Intelligence (AI) Unit (2022) Prediction of clinical outcomes in women with placenta accreta spectrum using machine learning models: an international multicenter study. J Matern Neonatal Med 35:6644–6653
28. Yoffe L, Polsky A, Gilam A, Raff C, Mecacci F, Ognibene A, Crispi F, Gratacós E (2019) Early diagnosis of gestational diabetes mellitus using circulating microRNAs. Eur J Endocrinol 181:565–577
29. Feduniw S, Sys D, Kwiatkowski S, Kajdy A (2020) Application of artificial intelligence in screening for adverse perinatal outcomes: a protocol for systematic review. Med (United States) 99:E23681
30. Cho H, Lee EH, Lee K-S, Heo JS (2022) Machine learning-based risk factor analysis of adverse birth outcomes in very low birth weight infants. Sci Rep 12
31. Rani S, Kumar M (2021) Prediction of the mortality rate and framework for remote monitoring of pregnant women based on IoT. Multimed Tools Appl 80:24555–24571
32. Belciug S (2022) Learning deep neural networks' architectures using differential evolution. Case study: medical imaging processing. Comput Biol Med 146
33. Hang Y, Zhang Y, Lv Y, Yu W, Lin Y (2021) Electronic medical record based machine learning methods for adverse pregnancy outcome prediction. In: T, J, D, Q, Y, L, K, M (eds) 12th International conference on signal processing systems. SPIE, Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
Multi-modal Biometrics’ Template Preservation and Individual Identification B. Nithya and P. Sripriya
Abstract
We introduce a system that provides multi-modal template security in response to the rising vulnerability of biometric templates. The proposed work aims to provide a multi-modal biometric identification system with protected templates that does not degrade overall recognition performance. To verify this, the protected technique was compared against an unprotected multi-modal biometric recognition system. Several criteria are used to assess the proposed system, including training time, testing time, Equal Error Rate (EER), accuracy, and classifier performance. Distinctive features are extracted using Speeded-Up Robust Features (SURF) and Histogram of Oriented Gradients (HoG) from three biometric modalities (fingerprint, face, and signature). With the aid of the bio-secure template security method, the extracted features are fused and the templates are transformed into new templates. The generated template can be changed simply by changing the seed of the random matrix. A virtual database is developed to evaluate the proposed approach. The hybrid feature extraction method is assessed in addition to the single feature extraction strategy. Finally, a classifier and a deep neural network are trained to identify the individual. The findings show that the protected approach improves the system's overall recognition performance and that the EER remains low at different feature counts. The highest accuracy achieved is 96%, and the lowest EER is 0.07%, on the 20 vital hybrid feature points.
Keywords Template security · Revocable · Cancelable · Multi-modal biometrics · Neural network classification
B. Nithya, Department of Computer Science, New Prince Shri Bhavani Arts and Science College, Medavakkam, Chennai, Tamil Nadu, India
P. Sripriya, Department of Computer Applications, VISTAS, Chennai, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 757, https://doi.org/10.1007/978-981-99-5166-6_54
1 Introduction
Compared to a single-modal biometric system, a multi-modal biometric system improves security, adaptability, and accuracy by combining several biometric modalities. Biometric security systems face several challenges, including variation between the biometric images captured at enrollment and at verification, noisy inputs, and non-universality [1]. These problems are principally responsible for raising the False Acceptance Rate (FAR) and the False Rejection Rate (FRR). When these two rates increase, overall recognition performance degrades. Employing several modalities improves reliability. There are two main kinds of attack on biometric template databases [2]: hacking, that is, unauthorized access to the database, and leakage. These are severe issues because, once the original templates are hacked or leaked, the user's template cannot be replaced, owing to the uniqueness of biometric characteristics. Storing the original templates is therefore a considerable risk, since they cannot be updated or regenerated once they have been compromised. A protection scheme can be implemented in various ways, but it must satisfy the properties of diversity, renewability, and performance. Standard encryption, feature transformation, and biometric cryptosystems are all options for safeguarding templates. Traditional encryption can be carried out with cryptographic algorithms such as AES and RSA. However, template theft has been reported against cryptographic methods, and these are insufficient to safeguard biometric templates [3]. The feature transformation technique applies a function f(.) to the original template T to create the transformed template. The database then stores the transformed template f(T, k) instead of the original template T. Here, k may be a random number or the user's password [4]. Transformations are of two types: invertible and non-invertible.
The random multi-space quantization approach [5] is a well-known invertible transformation. This method maps the biometric feature vector into a new feature space by multiplying the feature vector with a seed-based random matrix. The resulting values are then quantized into binary values. Inversion to the original templates is prevented using the hashing method [6]; this sort of template security is known as the non-invertible transformation technique. Another type of template security is the biometric cryptosystem, which uses cryptographic keys derived from biometric features. The cryptographic keys utilized in the cryptosystem are called "helper data," and such systems are classified into key binding and key generation techniques [7]. Biometric cryptosystems are being used in various identity verification systems [8], since they are now required on high-security platforms such as banking, hospitals, and different government and corporate sectors. The cryptosystem approach has been applied to various finger traits, including the vein, outline, fingerprint, and knuckle. Feature-level fusion, decision-level fusion, and score-level fusion are used to test these samples. The proposed research uses neural networks and demonstrates that they classify biometric feature vectors effectively. The performance of the Hamming distance matching method and of the neural network classification methodology was examined in this study. The contributions of the proposed work are as follows:
• Multiple modalities (fingerprint, face, and signature) were employed to make the identification system reliable, and biometric attributes were extracted using the Speeded-up Robust Features (SURF) and Histogram of Oriented Gradients (HoG) methods.
• Instead of traditional biometric matching approaches, deep neural network classification is applied.
• To determine the performance of secured template recognition, both transformed and untransformed templates were provided as inputs for training the neural networks.
• The suggested bio-hash is used in conjunction with both hybrid and single feature extraction approaches.
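The seed-based random projection and quantization described in the introduction can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `biohash`, the Gaussian random matrix, the zero threshold, and the 64-bit code length are all assumptions made for illustration, and the orthonormalization step that published variants of random multi-space quantization usually apply is omitted for brevity.

```python
import random


def biohash(features, seed, n_bits=64, threshold=0.0):
    """Project a feature vector with a seed-dependent random matrix and binarize.

    Hypothetical helper: `seed` plays the role of the user-specific key k, so
    issuing a new seed yields a new (revoked and replaced) template.
    """
    rng = random.Random(seed)  # the seed fully determines the random matrix
    code = []
    for _ in range(n_bits):
        # One row of the random matrix, drawn from a standard normal distribution.
        row = [rng.gauss(0.0, 1.0) for _ in features]
        projection = sum(r * f for r, f in zip(row, features))
        # Quantize the projected value to a single bit.
        code.append(1 if projection > threshold else 0)
    return code


# Toy fused feature vector standing in for SURF/HoG features (synthetic values).
feature_rng = random.Random(0)
fused_features = [feature_rng.uniform(-1.0, 1.0) for _ in range(128)]

template_v1 = biohash(fused_features, seed=1234)
template_v2 = biohash(fused_features, seed=5678)  # re-issued after a compromise
```

The same features and seed always reproduce the same binary template, while a different seed gives an unrelated one, which is the renewability property the text requires.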
2 Related Works
The authors of [9] discuss attacks and the strength of the privacy protection technique while presenting a texture-based bio-secure scheme for fingerprint images. They compared the EER of the raw finger code with that of the Biocode and found that the cancelable Biocode yielded 0% EER, whereas the raw finger code yielded 19% EER. This type of attack was named a "zero-effect assault." At verification, the findings were obtained using the Hamming distance matching method. The authors of [10] used the bio-secure approach to achieve fingerprint cancelability, and compared the codes with and without bio-secure protection using the Hamming distance matching methodology. Fingerprint images were collected from several sensors, and feature extraction was performed using the local binary pattern approach. The researchers in [11] constructed a biometric-hash framework using external data, random multi-space quantization, and the Fisher discriminant analysis (FDA) feature extraction approach. They compared the findings with other state-of-the-art algorithms, using facial images and the Hamming distance to validate the results. By reducing the EER to a near-zero level, the approach demonstrates non-invertibility and cancelability. The study in [12] discussed security and privacy attacks on two-factor authentication, in which one factor is a password unique to the person and the other is a biometric feature. That research developed a compressed sensing recovery approach to produce better results even at higher threat levels. Principal component analysis was used to extract features, and the authors produced bio-hash vectors from facial images using 1-bit compressive sensing. A pre-trained CNN model is used in [13] to demonstrate cancelability and unlinkability on face and iris images. Hybrid template security is the combination of two ways of safeguarding biometric templates.
With the aid of the VGG-16 pre-trained model, deep hashing and secure sketch were integrated to produce a secure hybrid environment. Features are extracted internally in the deep network layers, and there are several functional layers, including a domain-specific layer, joint representation layer, fusion layer, hashing layer, and Softmax classifier. The genuine acceptance rate is computed for various security bit lengths. The authors concluded that as the securing key grows longer, the error-correcting capability decreases. Cancelable Face Hashing [14] was used to create a non-invertible transformation that includes RSA encryption and decryption to improve the security of the templates. This approach enhances recognition performance while preserving the original feature vectors. The key set [d, α] is exploited in [15] to produce a secured version of the original fingerprint template. PCA-based alignment was presented to fix the orientation along a single axis and so prevent orientation issues during enrollment and verification. According to the authors, even if both the keys and the secured templates are compromised, the original templates cannot be recovered. The work also compares the performance of three approaches, PCA, ICA, and DWT, by grouping them in different ways. An attempt at template protection using DNA encoding has been described to achieve uniqueness [16]. That study focuses on Z-pattern creation, 4-bit binary code formation, Quad-Bin formation, Dec-code formation, and DNA codec formation. This type of feature codec procedure recognizes the precise location of security bits within the operating time. According to the literature, bio-secure template protection has mostly been applied to unimodal biometrics, with the traditional Hamming distance matching approach employed for authentication. In contrast, this study applies the bio-hash technique to three biometric modalities with hybrid feature extraction and classifies them using neural networks. The suggested system and its outcomes are described in the following sections.
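Most of the works surveyed above match binary templates by Hamming distance and report the EER. The sketch below shows one common way to compute both from lists of genuine and impostor distances. The function names and the toy distance values are hypothetical, and the EER is approximated at the observed threshold where FAR and FRR are closest, which is an assumption about the evaluation protocol and not a detail taken from the cited papers.

```python
def hamming_distance(code_a, code_b):
    """Fraction of bit positions in which two equal-length binary codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)


def far_frr(genuine, impostor, threshold):
    """FAR and FRR when a comparison is accepted if distance <= threshold."""
    far = sum(d <= threshold for d in impostor) / len(impostor)
    frr = sum(d > threshold for d in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER at the candidate threshold where FAR and FRR are closest."""
    candidates = sorted(set(genuine) | set(impostor))
    best = min(candidates,
               key=lambda t: abs(far_frr(genuine, impostor, t)[0]
                                 - far_frr(genuine, impostor, t)[1]))
    far, frr = far_frr(genuine, impostor, best)
    return (far + frr) / 2, best


# Toy distances (synthetic): same-person comparisons vs. different-person comparisons.
genuine_distances = [0.1, 0.2, 0.3, 0.4]
impostor_distances = [0.35, 0.6, 0.7, 0.8]
eer, eer_threshold = equal_error_rate(genuine_distances, impostor_distances)
```

With these toy values one genuine comparison is rejected and one impostor comparison is accepted at the chosen threshold, so FAR and FRR coincide there.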
3 Proposed Technique
3.1 Feature Extraction
Speeded-up Robust Features (SURF)
The main goal of the SURF algorithm [17] is to detect interest points, which are the most distinctive locations in the image. The neighborhood of each interest point is also characterized, yielding a feature vector known as a descriptor. Because interest points are measured with the Hessian matrix, SURF is considered a fast detector. Figure 1a shows the SURF feature points extracted from multiple modalities. This strategy is made possible by the integral image, denoted $I_{\Sigma}(X)$ at a location $X = (x, y)$ and obtained with Formula (1):
$$I_{\Sigma}(X) = \sum_{i=0}^{i \le x} \sum_{j=0}^{j \le y} I(i, j) \quad (1)$$
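The integral image that SURF relies on can be sketched in a few lines. This is a generic illustration of the standard summed-area table, not code from the paper; the function names `integral_image` and `box_sum` and the 3×3 toy image are assumptions made for the example.

```python
def integral_image(img):
    """Return the summed-area table: entry (y, x) holds the sum of img[0..y][0..x]."""
    height, width = len(img), len(img[0])
    table = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            # Sum of the current row so far plus everything in the rows above.
            table[y][x] = row_sum + (table[y - 1][x] if y > 0 else 0)
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of the pixels in the inclusive rectangle, using at most four lookups."""
    total = table[bottom][right]
    if top > 0:
        total -= table[top - 1][right]
    if left > 0:
        total -= table[bottom][left - 1]
    if top > 0 and left > 0:
        total += table[top - 1][left - 1]
    return total


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
lower_right_block = box_sum(table, 1, 1, 2, 2)  # pixels 5, 6, 8, 9
```

Because any rectangular sum costs a constant number of lookups regardless of its size, box filters of different scales can be evaluated at the same cost, which is what makes the Hessian-based detector fast.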
j>6, the total complexity tends to O(N).
3 Experiments
We conduct our experimental studies in two stages: first, we assess the effectiveness of the proposed STR mining algorithm 3S, and then we assess the usefulness of the STR-derived repeat sequences for genome comparison. Accuracy, efficiency, and scalability are used to compare the performance of 3S in STR mining tasks with Kmer-SSR and PERF, using their implementations available at https://github.com/ridgelab/Kmer-SSR and https://github.com/RKMlab/perf, respectively. Unlike Kmer-SSR, PERF reports STRs together with partial motifs. We compare the outcomes accordingly, since 3S can extract STRs in two ways, with and without partial motifs. After achieving efficient and accurate mining of STRs, we proceed to STR-based phylogeny reconstruction and taxa identification of eukaryotic species. The ultimate goal of the second stage of experiments is to determine whether a genome's STRs represent the genomic sequence accurately enough that the STRs alone can be used to infer the right phylogeny and correctly identify genomic taxa. We perform the experiments on DNA sequences of several chromosomes and on whole genome sequences (both reference and assembled) of different organisms, publicly available at https://www.ncbi.nlm.nih.gov/assembly/organism/. All experiments are conducted on a modest computer with an Intel(R) Core(TM) i5-5200U CPU @ 2.20 GHz, 4 GB RAM, and a 1 TB HDD.
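To make the two reporting modes concrete, the sketch below scans a sequence for perfect STRs with and without partial motifs. It is a naive illustrative scanner and not the 3S, PERF, or Kmer-SSR algorithm; the function names, the tuple layout of a hit, and the toy sequence are assumptions, while the cutoff of 12 nt and motif sizes 1 to 6 follow the setting of Table 1.

```python
def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter motif."""
    size = len(motif)
    return not any(size % d == 0 and motif == motif[:d] * (size // d)
                   for d in range(1, size))


def find_strs(seq, cutoff=12, max_motif=6, partial=False):
    """Return (start, length, motif) for perfect repeats of at least `cutoff` nt.

    With partial=False only whole motif copies count toward the length;
    with partial=True a trailing incomplete motif copy is counted as well.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for m in range(1, max_motif + 1):
        i = 0
        while i + m <= n:
            # Extend the run while each base equals the base one period earlier.
            j = i + m
            while j < n and seq[j] == seq[j - m]:
                j += 1
            length = j - i
            if not partial:
                length -= length % m  # drop the incomplete trailing copy
            motif = seq[i:i + m]
            if length >= cutoff and length >= 2 * m and is_primitive(motif):
                hits.append((i, length, motif))
                i += length
            else:
                # Later starts inside this run end at the same place and are
                # shorter, so jump to the first position that can begin a new run.
                i = max(i + 1, j - m + 1)
    return hits


# Two full copies of ACGTG plus a partial copy "AC": 12 nt in total.
toy = "ACGTGACGTGAC"
with_partial = find_strs(toy, partial=True)
without_partial = find_strs(toy, partial=False)
```

In this toy case the pentamer repeat reaches the 12 nt cutoff only when the partial copy is counted, which is the situation in which the two modes are expected to disagree: 5 is the only motif size from 1 to 6 that does not divide 12.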
3.1 Comparative Performance of STR Mining
Accuracy Table 1 shows that 3S, with and without partial motifs, extracts the same numbers of STRs from human chromosome 1 as PERF and Kmer-SSR, respectively. For an in-depth comparison of the accuracy of the mined STRs, we provide the start positions (Supplementary_Table_S1.xlsx), the repeat counts (Supplementary_Table_S2.xlsx), and the motif combinations themselves (Supplementary_Table_S3.xlsx). Supplementary_Table_S4.xlsx contains the total STR counts for all other human chromosomes. An interesting observation is that human chromosome 1 has the most STRs and chromosome Y the fewest. The results obtained by 3S in all the experiments match exactly the respective results of Kmer-SSR and PERF, which confirms the correctness of the proposed seed selection approach. Supplementary tables are available at https://github.com/STRHunter/3S.

Table 1 STR counts from human chromosome 1 with cutoff value 12
Motif size   Kmer-SSR   3S with p=0   PERF      3S with p=1
6            146,655    146,665       146,665   146,665
5            13,116     13,116        59,284    59,284
4            46,636     46,636        46,636    46,636
3            15,817     15,817        15,817    15,817
2            25,207     25,207        25,207    25,207
1            57,335     57,335        57,335    57,335
All          304,766    304,766       350,932   350,932

Efficiency We evaluate the computing efficiency and memory requirement of 3S on all the human chromosomes and compare them with those of Kmer-SSR and PERF (Supplementary Table 5). 3S is found to be significantly more efficient than PERF and Kmer-SSR. Considering overall performance on all human chromosomes, 3S takes about one-fifth of the CPU time of PERF and one twenty-ninth of that of Kmer-SSR, while requiring less than 1% of their average processing memory (excluding the storage for the sequence). Notably, CPU time is invariant to cutoff length for both Kmer-SSR and 3S (Supplementary Table 6). Supplementary tables are available at https://github.com/STRHunter/3S.
Scalability A method may be efficient on a moderately small amount of data, but the real test of its proficiency is how well it performs on large volumes of data or on long sequences. We therefore study how computation scales with sequence length. For this experiment, we concatenated human chromosomes to generate long sequences of increasing length, up to a maximum of 1.6 Gbp (gigabase pairs). PERF, Kmer-SSR, and 3S were applied to this set of sequences to mine STRs, and their CPU time and processing memory were compared. A separate experiment was then conducted on the 1.6 Gbp sequence with increasing cutoff values (Fig. 2b). Figs. 1 and 2 show that 3S is the best performer in both CPU time and processing memory, and in mining STRs of any cutoff length.
Fig. 1 Scalability of CPU time (sec) against sequence length (nt, ×10^6): a 3S with p=1 compared with PERF, b 3S with p=0 compared with Kmer-SSR
Fig. 2 a Scalability of processing memory (MB) against sequence length (nt, ×10^6) for PERF, Kmer-SSR, and 3S, b scalability of CPU time (sec) against cutoff value (nt) for PERF and 3S with p=1
3.2 Mining STRs on Whole Genomes
Its excellent scalability makes STR mining by 3S from whole genome sequences very fast. We consider a total of eight whole genome reference sequences of evolutionarily higher organisms and five assembled human genomes for the study. The complete results are presented in Table 2. In all cases, CPU time is within 200 s, and processing memory is limited to only 4 MB. 3S is thus very useful for extracting long STRs from whole genome sequences.
3.3 STR-Based Phylogeny Reconstruction
The development of whole genome sequencing technology has made it possible to sequence larger genomes swiftly and affordably, but this has created a computational challenge in efficiently comparing such massive and numerous data. The sequence alignment techniques used in the past have become inappropriate and impractical, which has spurred the development of numerous alignment-free sequence analysis tools and techniques [18, 19]. K-mer statistics is the main alternative among these techniques, but picking the best k is essential for effective feature extraction. Additionally, the optimal k becomes larger for complete genome sequences of evolutionarily higher organisms, making it extremely difficult to compute feature vectors using k-mer frequency statistics. Thus, the need persists for a method that is suitable and scalable for comparing the whole genome sequences of evolutionarily higher species. In [17], the authors devised a method, pattern extraction through entropy retrieval (PEER), that employs the entropy of successive intervals (or waits) of optimal-length k-mers of the sequence for feature extraction. It transforms a sequence into a vector of wait entropies of optimal k-mers. The distance between a pair of sequences is the Euclidean distance between their wait vectors. The optimal value of k ($K_{opt}$) is determined from the length $N$ of the given sequence and the cardinality $\beta$ of its alphabet (a, c, g, and t) as
$$K_{opt} = \left[ \frac{\ln(N - 1)}{\ln \beta} \right] \quad (6)$$
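As a quick numerical check of Eq. (6), the sketch below evaluates the optimal k-mer length for a genome of the size listed for Homo sapiens in Table 2 and shows the resulting feature dimension. It assumes that the square brackets in Eq. (6) denote rounding up, which the text does not state. The `wait_entropy_vector` function is only one plausible reading of the wait-entropy description above (Shannon entropy of the gaps between successive occurrences of each k-mer) and is not the PEER implementation; both function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def k_opt(n, beta=4):
    """Eq. (6), assuming the brackets mean rounding up to the next integer."""
    return math.ceil(math.log(n - 1) / math.log(beta))


def wait_entropy_vector(seq, k, alphabet="acgt"):
    """Entropy of the gaps between successive occurrences of every k-mer.

    Illustrative interpretation only: k-mers that occur fewer than twice
    have no gaps and are given an entropy of 0.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    vector = []
    for kmer in map("".join, product(alphabet, repeat=k)):
        pos = positions.get(kmer, [])
        waits = Counter(b - a for a, b in zip(pos, pos[1:]))
        total = sum(waits.values())
        entropy = 0.0
        for count in waits.values():
            p = count / total
            entropy -= p * math.log2(p)
        vector.append(entropy)
    return vector


human_length = 2_937_630_000          # 2937.63 MB, as listed in Table 2
k_human = k_opt(human_length)         # optimal k-mer length for this genome
dimension = 4 ** k_human              # number of possible k-mers of that length
toy_vector = wait_entropy_vector("acaacaaac", k=1)
```

The vector has one entry per possible k-mer, so its length grows as $\beta^k$; this is why a whole-genome value of k quickly becomes impractical on a machine with a few gigabytes of memory.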
Although PEER proves more effective at reconstructing phylogeny than seven other state-of-the-art alignment-free methods, $K_{opt}$ becomes 16 nt for whole genomes of higher organisms. This produces a high-dimensional ($4^{16}$) feature vector, which is computationally demanding for machines with limited resources and causes the feature vector to become sparse. Interestingly, 3S can mine the STRs of the lengthy genomes of higher organisms and concatenate them into an STR repeat sequence within 200 s, while using only 4 MB of memory for the STR mining calculations (Table 2). We note that the repeat sequence lengths for all the genomes examined in this analysis range from 15.6 MB to 17.7 MB of nucleotides. The repeat sequence
558
2
57
2
1
8
Bos taurus (2640.16 MB)
Mus musculus (2647.52 MB)
Macaca mulatta (2763.46 MB)
Pan troglodytes (2803.62 MB)
Homo sapiens (2937.63 MB)
2
8
6
1
1
NA12878 (2798.04 MB)
HG00514 (2805.67 MB)
NA19240 (2815.57 MB)
CHM1.1 (2827.65 MB)
HG00733 (2864.15 MB)
Assembled human genomes
443
2
552
465
578
489
368
545
592
411
418
619
Canis familiaris (2317.59 MB)
496
2
12
Drosophila (137.05 MB)
192
188
188
187
191
196
180
187
178
179
158
8
5
72
30
47
75
39
41
33
29
1058
14
55
26
33
221
197
201
210
195
206
205
205
186
197
196
240
187
190
191
191
190
190
199
179
187
177
180
158
8
5
CPU time (sec)
13
12
14
14
3,240,565 14
3,191,074 14
3221,358
3192,268
3,213,613 14
3,211,310 14
3,190,500 14
3,267,234 14
4,062,928 17
2,850,953 13
3,877,732 15
223,123
94,697
No. of STR Avg. STR length(nt)
191
196
195
195
193
196
181
187
178
178
157
8
5
CPU time (sec)
No. of STR Avg. STR length(nt)
No. of STR Avg. STR length(nt)
CPU time (sec)
Medium STR (150nt.